Page MenuHomePhabricator

There seems to be a fundamental problem in SimplifyCFG: Dead code removal can result in uninitialized variables. The impact is an “endless” loop which can be considered the consequence of searching for the initialization. More details are...
Needs ReviewPublic

Authored by Gerolf on Jan 15 2016, 7:01 PM.

Details

Reviewers
Gerolf
Summary

...below.

The proposed patch fixes the problem by eliminating *all* blocks dominated by a
block that has become unreachable. This requires dominance information. The
upside is that using dominance will also allow cleaning up code in SimplifyCFG
that computes “local” dominance eg. DominatesMergePoint(). The potential
downside is an increase in compile time, which I’m still collecting data on.
Anecdotally I hear the argument that computing dominance is expensive. I’m
curious if anyone has specifics about this Theoretically dominance is computed
by a linear run O(CFG), except when the CFG is irreducible (which should be
rare). In that case Tarjan gives an almost linear algorithm. So I would not
expect a material compile-time impact from it.

The patch also externs SimplifyCFG to indicate when more than one block is
delated, so the caller can take care of the stale iterator problem when
necessary.

More details:

The actual endless loop is in the constant compare gather() routine in
Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the
queue:
%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i

Here is what happens at the IR level:

for.cond.i: ; preds = %if.end6.i,
%if.end.i54
%ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ]
%ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<<
%cmp2.i = icmp ult i32 %ix.0.i, %11
br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit

if.end6.i: ; preds = %for.body.i
%cmp10.i = icmp ugt i32 %conv.i, %add9.i
%.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<<

When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i.
The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
(Note the first ‘or’ operand is now %.ret.0.off0.i, and *NOT* %ret.0.off0.i). And
now there is use of .ret.0.off0.i before a definition which triggers the
“endless” loop in gather():

while(!DFT.empty()) {

V = DFT.pop_back_val();   // V is .ret.0.off0.i

if (Instruction *I = dyn_cast<Instruction>(V)) {
  // If it is a || (or && depending on isEQ), process the operands.
  if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) {
    DFT.push_back(I->getOperand(1));  // This is now .ret.0.off0.i also
    DFT.push_back(I->getOperand(0));

    continue; // “endless loop” for .ret.0.off0.i
  }

[SimplifyCFG] Fix for "endless" loop after dead code removal

Diff Detail

Event Timeline

Gerolf updated this revision to Diff 45063.Jan 15 2016, 7:01 PM
Gerolf retitled this revision from to There seems to be a fundamental problem in SimplifyCFG: Dead code removal can result in uninitialized variables. The impact is an “endless” loop which can be considered the consequence of searching for the initialization. More details are....
Gerolf updated this object.
Gerolf added a reviewer: Gerolf.
Gerolf added a subscriber: llvm-commits.
reames added a subscriber: reames.Jan 19 2016, 11:58 AM

I strongly suspect this patch is incorrect as written. I don't have a particular counter example, but there are many places in SimplifyCFG that modify the CFG in ways that effect dominance information and I don't see enough updates in SimplifyCFG to account for that.

I would suggest focusing on the infinite loop you addressed rather than trying to make SimplifyCFG preserve dominator tree. I haven't studied the code you mentioned is at fault, but the standard visited set idiom or another approach for stopping infinite recursion seems likely to be much easier to implement.

If you want to go down that route, I'd suggest separating a separate patch series which teaches SimplifyCFG to preserve dominator tree info through all of the updates. I'll warn you this is going to be a good amount of work though! You'll probably want a on demand mechanism to force recalc within the pass and then go through each transform one at a time to reduce recalcs.

Gerolf edited edge metadata.Jan 27 2016, 8:51 PM

Philip, the code does not claim it preserves the dominator tree. Nor does it need to.

The reason why I think the code is correct although SImplifyCFG may modify the cfg is that
a) blocks only get removed when dominance has been computed for them
b) any block that is dominated by an unreachable block can be removed also (by definition)
c) if there is an new block introduce that is dominated by an unreachable block but dominance is missing the unreachable block will still be removed. Then the new block will be removed later in the SimplifyCFGPass loop.. So there is no change from the behavior of the existing code.

The only problematic scenario I can think of is that a) a new block is inserted and b) a block with an initialization becomes unreachable and is removed. Clearly there is no dominance info and thus a block with the PHI node (for.cond.i in the example) would not be removed which could then result in that endless loop. But in this scenario there needs to be a definition from the new block which then avoids the issue.

I was mostly concerned about compile-time, but my data from the LLVM test suite and benchmark (SPEC 2000, 2006 etc) also shows a few small improvements (likely in the noise) on x86 O3:
Performance Improvements - Execution Time Δ Previous Current σ
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk -2.69% 2.1296 2.0724 0.0179
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -2.04% 2.9327 2.8728 0.0222
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k -1.75% 3.4523 3.3920 0.0173
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt -1.64% 2.1694 2.1339 0.0083
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -1.56% 2.9577 2.9115 0.0175
SingleSource/Benchmarks/Misc-C++/Large/ray -1.18% 1.7727 1.7517 0.0009

Could you give this patch a second thought? Thanks!

Is the code still correct when the pass removes an edge entering a loop with two entries?

For example, if we initially have a CFG like this which has a loop (BB1,BB2),

BB0-->BB1<-->BB2<--BB3

BB0:
v0 = a
BB1:
v1 = phi(v0, v3)
...
BB2:
v2 = phi(v1, v4)
v3 = v2 + 1
...
BB3:
v4 = b

BB1 doesn't dominate BB2 because the loop has two entries from BB0 and BB3.

However, if simplifycfg is able to remove edge (BB3->BB2), (BB1,BB2) becomes a single entry loop with header BB1.

BB0:
v0 = a
BB1:
v1 = phi(v0, v3)
...
BB2:
v3 = v1 + 1
...

If simplifycfg is able to remove edge BB0->BB1, then you'll get

BB1:
...
BB2:
v3 = v3 + 1
...

Regardless of whether this is a valid counter example, I feel that using a stale dominance information to remove unreachable blocks is a little fragile. Is it possible to incrementally update the dominance information when the CFG is transformed? I guess we have to make sure it doesn't have a huge impact on compile time, but I think there are efficient ways to do it if we can identify the dominator tree nodes that are affected and don't have to update the dominator tree for the whole CFG?

Gerolf added a comment.Feb 2 2016, 8:45 PM

Thank you Akira and Philip. I agree with your concerns that using a stale dominator information could at some end up in a situation where the current implementation might not prevent a similar bug. For now I provided a simpler fix for the endless loop problem in http://reviews.llvm.org/D16839. Please take a look at that. In case there is a more systemic problem in SimplifyCFG that requires dominance in this scenario I think the best approach is to compute all blocks dominated by a block B on the fly. This can be done for example in 3 passes over all blocks reachable from B.