This is an archive of the discontinued LLVM Phabricator instance.

[Jump-Threading] Fixed jump threading hang issues (PR15386, PR15851)
ClosedPublic

Authored by dinesh.d on Jun 2 2014, 7:45 AM.

Details

Summary

For specific IR patterns, jump threading pass goes in to infinite loop transforming one pattern to other and then back to previous one.
For these kind of pattern just checking if BB changed is not enough. I have added a set of BBs to maintain how many BBs are updated
and code to terminate pass if count is not increased to previous iteration.

If required, we can tweak this to check if last few iterations have not updated any new BB to terminate pass.

I am not very familiar with Jump Threading pass so this patch might not correct. Please let me know if I should look in any other direction
to fix these issues.

Diff Detail

Event Timeline

dinesh.d updated this revision to Diff 10017.Jun 2 2014, 7:45 AM
dinesh.d retitled this revision from to [Jump-Threading] Fixed jump threading hang issues (PR15386, PR15851).
dinesh.d updated this object.
dinesh.d edited the test plan for this revision. (Show Details)
dinesh.d added a subscriber: Unknown Object (MLST).
dinesh.d updated this revision to Diff 10018.Jun 2 2014, 7:50 AM

Forgot to add RUN line in test case

rafael edited edge metadata.Jun 9 2014, 6:24 PM

It is not clear why this check is the correct one.One issue is

  • If in the first pass we update 3 BBs, and in the second pass we update 2, this will stop, right? How do you know that there is no further changes to be done?

The more important one is that it doesn't address the root of the problem. From the description it looks like two parts of the pass disagree as to what the canonical form is. Shouldn't we change one of them avoid the transformation?

What are the two forms that cause us to bounce back and forth?

The testcase should also use FileCheck to see that the output is in the canonical form (whatever we decide that to be).

Each pass iterates over all BBs of the functions and all updated BBs are added to
'UpdatedBB' so if first pass adds 3 BBs and second pass adds 2, there will be
5 BBs in the list and we will still continue with next pass.

Every change in BBs actually adds atleast one new block XXX.thread and till it finds
some new block added or updated, it will do next iteration on all BBs of the function.
and the condition, that will break this is when we are not adding any new block or not
updatind any new BB.

In attached test case, jump threading stuck between 'for.cond1' and 'for.body' block.
Here is sample output for few iterations.
After first iteration:

for.cond1:                                        ; preds = %for.body.thread, %for.body, %land.rhs
  %div5 = phi i32 [ 0, %for.body.thread ], [ %div, %for.body ], [ %div, %land.rhs ]
  %inc = add nsw i32 %div5, 1
  %cmp = icmp slt i32 %inc, 2
  br i1 %cmp, label %for.body, label %for.cond1.thread

for.cond1.thread:                                 ; preds = %for.cond1
  br label %for.body.thread

for.body.thread:                                  ; preds = %for.cond1.thread
  br label %for.cond1

for.body:                                         ; preds = %for.cond1
  %i.02 = phi i32 [ %inc, %for.cond1 ]
  %div = sdiv i32 %i.02, 2
  %i.0.off = add i32 %i.02, 1
  %0 = icmp ugt i32 %i.0.off, 2
  br i1 %0, label %land.rhs, label %for.cond1

2nd:

for.cond1:                                        ; preds = %for.body.thread, %for.body, %land.rhs
  %div5 = phi i32 [ %div, %for.body ], [ %div, %land.rhs ], [ 0, %for.body.thread ]
  %inc = add nsw i32 %div5, 1
  %cmp = icmp slt i32 %inc, 2
  br i1 %cmp, label %for.body, label %for.cond1.thread6

for.cond1.thread6:                                ; preds = %for.cond1
  br label %for.body.thread

for.body.thread:                                  ; preds = %for.cond1.thread6
  br label %for.cond1

for.body:                                         ; preds = %for.cond1
  %i.02 = phi i32 [ %inc, %for.cond1 ]
  %div = sdiv i32 %i.02, 2
  %i.0.off = add i32 %i.02, 1
  %0 = icmp ugt i32 %i.0.off, 2
  br i1 %0, label %land.rhs, label %for.cond1

and 3rd:

for.cond1:                                        ; preds = %for.body.thread, %for.body, %land.rhs
  %div5 = phi i32 [ %div, %for.body ], [ %div, %land.rhs ], [ 0, %for.body.thread ]
  %inc = add nsw i32 %div5, 1
  %cmp = icmp slt i32 %inc, 2
  br i1 %cmp, label %for.body, label %for.cond1.thread

for.cond1.thread:                                 ; preds = %for.cond1
  br label %for.body.thread

for.body.thread:                                  ; preds = %for.cond1.thread
  br label %for.cond1

for.body:                                         ; preds = %for.cond1
  %i.02 = phi i32 [ %inc, %for.cond1 ]
  %div = sdiv i32 %i.02, 2
  %i.0.off = add i32 %i.02, 1
  %0 = icmp ugt i32 %i.0.off, 2
  br i1 %0, label %land.rhs, label %for.cond1

I will update test cases as you suggested. Actually I am not sure about test as
it hangs with trunk and check-all never completes.

Here are few observations I came across during debugging:
If any of the existing BB in the function is updated, it will be added in UpdatedBB,
if not already exist. For this new XXX.thread blocks are not considered for this
iteration.

For next iteration, all existing BB and new XXX.thread blocks are considered and if
any of the block is updated, it will get added to UpdatedBB.

Condition I have added to break, will be true if none of the existing BBs are updated,
which were not updated in previous passes.

So until, passes are threading some new path, iterations will go on and will break only
if already threaded paths are re-threaded and no new threading is introduced.

dinesh.d updated this revision to Diff 10272.Jun 10 2014, 5:57 AM
dinesh.d edited edge metadata.

updated test case

I looked a bit in a debugger and the issue is not the Changed or EverChanged variables. That part of the logic is correct. It is simply "while we found something profitable to do, do it".

What is wrong is that we can find two conflicting actions and we think they are both profitable. The testcase can be reduced to just

define void @f() {
entry:
ret void

for.cond1:
%i.025 = phi i32 [ %inc, %for.body ], [ %inc, %for.body ], [ 1, %for.cond1 ]
%cmp = icmp slt i32 %i.025, 2
br i1 %cmp, label %for.body, label %for.cond1

for.body:
%inc = add nsw i32 %i.025, 0
%a = icmp ugt i32 %inc, 2
br i1 %a, label %for.cond1, label %for.cond1
}

What happens is that we jump thread the jump from for.cond1 to itself, since we can see that the result is just a jump to for.body:

define void @f() {
entry:
ret void

for.cond1:
%i.025 = phi i32 [ %inc, %for.body ], [ %inc, %for.body ]
%cmp = icmp slt i32 %i.025, 2
br i1 %cmp, label %for.body, label %for.cond1.thread

for.cond1.thread:
br label %for.body

for.body:
%i.0252 = phi i32 [ 1, %for.cond1.thread ], [ %i.025, %for.cond1 ]
%inc = add nsw i32 %i.0252, 0
%a = icmp ugt i32 %inc, 2
br i1 %a, label %for.cond1, label %for.cond1
}

We then jump thread the jump from for.cond1.thread to for.body:

define void @f() {
entry:
ret void

for.cond1:
%i.025 = phi i32 [ %inc, %for.body ], [ %inc, %for.body ], [ 1, %for.body.thread ]
%cmp = icmp slt i32 %i.025, 2
br i1 %cmp, label %for.body, label %for.cond1.thread

for.cond1.thread:
br label %for.body.thread

for.body.thread:
br label %for.cond1

for.body:
%i.0252 = phi i32 [ %i.025, %for.cond1 ]
%inc = add nsw i32 %i.0252, 0
%a = icmp ugt i32 %inc, 2
br i1 %a, label %for.cond1, label %for.cond1
}

And we are back to a situation where for.cond1 can be jump threaded. This will loop forever, with for.cond1 and for.body alternating at giving each other another phi edge.

Now, why doesn't this happen all the time? The answer is that we don't jump thread over destinations of back edges. For example, a small modification to the test:

define void @f() {
entry:
br label %for.cond1

for.cond1:
%i.025 = phi i32 [ %inc, %for.body ], [ %inc, %for.body ], [ 1, %for.cond1 ], [ 1, %entry ]
%cmp = icmp slt i32 %i.025, 2
br i1 %cmp, label %for.body, label %for.cond1

for.body:
%inc = add nsw i32 %i.025, 0
%a = icmp ugt i32 %inc, 2
br i1 %a, label %for.cond1, label %for.cond1
}

will prevent it from going in an infinite loop since now for.cond1 is a destination of a back edge.

The logic is sound for reachable code: If we are going over the basic blocks in a loop, at some point we get to one that is pointed by a back edge and stop.

The problem is that in unreachable code there can be cycles with no back edges and we loop forever.

It seems that the best solution is to just delete dead code upfront, or at least don't try to jump thread it.

Cheers,
Rafael

dinesh.d updated this revision to Diff 10316.Jun 10 2014, 11:10 PM

Wow, I couldn't have come up with that. I have updated code to remove
unreachable blocks at the start of pass.

I had to update one test case in select.ll where

switch i32 %0, label %L2 [
  i32 5061, label %L1
  i32 0, label %L2
]

is now getting converted to

%cond = icmp eq i32 %0, 5061
br i1 %cond, label %L2.thread, label %L2
rafael accepted this revision.Jun 16 2014, 1:10 PM
rafael edited edge metadata.

LGTM with two nits.

lib/Transforms/Scalar/JumpThreading.cpp
161

Please include a comment saying why this is not just an optimization.

test/Transforms/JumpThreading/pr15851_hang.ll
1

I don't think timeout is portable. Just run opt directly.

This revision is now accepted and ready to land.Jun 16 2014, 1:10 PM
dinesh.d updated this revision to Diff 10501.Jun 17 2014, 7:37 AM
dinesh.d edited edge metadata.

Thanks. Updated patch as per comments.

dinesh.d closed this revision.Jun 17 2014, 7:42 AM
dinesh.d updated this revision to Diff 10502.

Closed by commit rL211103 (authored by dinesh).