Page MenuHomePhabricator

[InstCombine] Fix infinite loop due to bitcast <-> phi transforms
ClosedPublic

Authored by nikic on Dec 7 2019, 3:26 AM.

Details

Summary

Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by adding the old phis to the IC worklist, so they actually get removed (instead of this happening on the next InstCombine iteration). I'm not very familiar with the overall InstCombine design, so not sure if this is a proper fix or just a workaround.

Diff Detail

Event Timeline

nikic created this revision.Dec 7 2019, 3:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2019, 3:26 AM
lkail added a subscriber: lkail.Dec 7 2019, 3:34 AM
spatel accepted this revision.Dec 9 2019, 5:24 AM

I don't know if this is enough to solve the bug completely, but since it's a 1-liner and works on this case, LGTM.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
2383

Can we assert that OldPN has no users at this point?

llvm/test/Transforms/InstCombine/pr44245.ll
5

Add a comment line here that this used to infinite loop.

I know that reducers like to put 'undef' into instructions, but we tend to make those tests meaningless as we improve/change our understanding of undef. So I'd change this test to take an 'i1' argument and replace all of the 'br i1 undef' to use that argument instead.

This revision is now accepted and ready to land.Dec 9 2019, 5:24 AM
cwabbott added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
2383

This isn't true until D71209 lands, and therefore the problem could still happen because OldPN won't actually get cleaned up. Even afterwards, there could be a PHI web hanging around. In my patch I assumed that some other dead-code pass would come along and delete it eventually, but I don't know if InstCombine is smart enough to do that before folding the load. Maybe you can just remove all the old phi nodes manually once my patch lands?

nikic planned changes to this revision.Dec 9 2019, 12:15 PM
nikic marked an inline comment as done.
nikic added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
2383

Yes, we can't assert that that there are no users here, as there may still be phi users.

I generally agree that it may be safer to explicitly remove the old phi nodes -- InstCombine can clean up dead phi cycles, but I'm not sure whether there are any limitations to that. Doing a final replaceInstUsesWith UndefValue on all the old phis would be the safest option.

I'll hold off on this until your patch lands, as doing an undef replacement wouldn't be safe without it.

tmandry added a subscriber: tmandry.Dec 9 2019, 2:06 PM
nikic updated this revision to Diff 235701.Dec 31 2019, 3:41 AM

Directly remove the instruction, to make sure potential phi cycles also get immediately removed.

This revision is now accepted and ready to land.Dec 31 2019, 3:41 AM
nikic requested review of this revision.Dec 31 2019, 3:45 AM
nikic marked an inline comment as done.

I've now switched this to directly remove the instruction rather than only adding it to the worklist, please take another look.

spatel accepted this revision.Dec 31 2019, 6:18 AM

LGTM - see inline for test nit.

llvm/test/Transforms/InstCombine/pr44245.ll
3

The standard RUN line uses:

; RUN: opt -S -instcombine < %s | FileCheck %s

I don't know if the extra "-o -" makes any difference, but we prefer the redirect of the input file because it can prevent unintended matching of strings from a file path that managed to escape into the checked output.

This revision is now accepted and ready to land.Dec 31 2019, 6:18 AM
This revision was automatically updated to reflect the committed changes.
nikic reopened this revision.Dec 31 2019, 9:53 AM

Seems to have broken test-suite and bootstrap, the latter possibly with an infinite loop, based on timeout messages. Not immediately clear to me why...

This revision is now accepted and ready to land.Dec 31 2019, 9:53 AM
lebedev.ri requested changes to this revision.Jan 8 2020, 11:04 AM

Seems to have broken test-suite and bootstrap, the latter possibly with an infinite loop, based on timeout messages. Not immediately clear to me why...

This revision now requires changes to proceed.Jan 8 2020, 11:04 AM
nikic added a comment.Jan 11 2020, 3:56 AM

Mostly reduced testcase triggering an infinite loop with this patch:

%type_1 = type {}
%type_2 = type {}
%type_3 = type {}

define dso_local void @SearchGalley() local_unnamed_addr {
entry:
  br label %while.cond

while.cond:                                       ; preds = %cond.end144, %entry
  %link.0 = phi %type_2* [ undef, %entry ], [ %cond145, %cond.end144 ]
  %os115 = bitcast %type_2* %link.0 to %type_3*
  %ou116 = getelementptr inbounds %type_3, %type_3* %os115, i32 0
  %os1117 = bitcast %type_3* %ou116 to %type_1*
  br label %for.cond

for.cond:                                         ; preds = %while.cond
  switch i32 undef, label %sw.epilog [
    i32 120, label %sw.bb
    i32 122, label %sw.bb
    i32 121, label %sw.bb101
  ]

sw.bb:                                            ; preds = %for.cond, %for.cond
  unreachable

sw.bb101:                                         ; preds = %for.cond
  unreachable

sw.epilog:                                        ; preds = %for.cond
  br i1 undef, label %cond.true133, label %cond.false138

cond.true133:                                     ; preds = %sw.epilog
  %0 = load %type_2*, %type_2** undef, align 8
  br label %cond.end144

cond.false138:                                    ; preds = %sw.epilog
  %1 = load %type_2*, %type_2** undef, align 8
  br label %cond.end144

cond.end144:                                      ; preds = %cond.false138, %cond.true133
  %cond145 = phi %type_2* [ %0, %cond.true133 ], [ %1, %cond.false138 ]
  br label %while.cond
}
nikic added a comment.EditedJan 11 2020, 4:15 AM

As far as I can see this is basically the same issue as what this patch tries to solve. Just this time it's not the extra use in the old phi causing the loop, but a change in worklist order caused by this change.

I think to really fix this issue, the transformation of the load and store need to be included as part of this transform, otherwise we cannot reliably guarantee that things will be processed in an order that does not result in loops.

nikic updated this revision to Diff 237493.Jan 11 2020, 5:25 AM

Explicitly perform load combine.

nikic added a comment.Jan 11 2020, 5:30 AM

I've updated the patch to directly perform the load combine now. I believe the store case shouldn't exhibit the same issue, as the bitcast is cannot be a phi operand there. I've also added the new looping testcase and -instcombine-infinite-loop-threshold=2 to make sure that the combine runs through in one iteration.

lebedev.ri requested changes to this revision.Jan 13 2020, 1:43 AM

Did you verify that test-suite now passes?

(marking as reviewed, thanks)

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
2325–2327

I'm confused. Why are we replacing old LI with undef instead of NewV?
This should be explained in the comment.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
452 ↗(On Diff #237493)

Can you please precommit this?

llvm/test/Transforms/InstCombine/pr44245.ll
2

Precommit

This revision now requires changes to proceed.Jan 13 2020, 1:43 AM
nikic updated this revision to Diff 237718.Jan 13 2020, 9:58 AM

Add comment, split out NFC change.

nikic marked 3 inline comments as done.Jan 13 2020, 10:03 AM
nikic added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
2325–2327

Added a comment. Keep in mind that the new load is going to have a different type, so it can't be replaced. The whole transform replaces one phi web with another, so we're free to stick undefs in the original one.

llvm/test/Transforms/InstCombine/pr44245.ll
2

This is an infinite loop test. It's theoretically possible to precommit this because infinite loops are no longer infinite nowdays, but that means we have to mess with the exit code etc. I don't think there is any value in that, as we're not testing output anyway.

nikic planned changes to this revision.Jan 13 2020, 11:42 AM
nikic marked an inline comment as done.

Did you verify that test-suite now passes?

Gah. No infinite loops now, but MultiSource/Benchmarks/Olden/power/power.test may have a miscompile.

nikic added a comment.Jan 13 2020, 1:09 PM

I'm a bit stumped here. From the trace I see that previously instcombine produced

; Function Attrs: nounwind uwtable
define dso_local { double, double } @Compute_Leaf(%struct.leaf* nocapture %l, double %pi_R, double %pi_I) local_unnamed_addr #0 {
entry:
  %0 = bitcast %struct.leaf* %l to i64*
  %1 = load i64, i64* %0, align 8, !tbaa !32
  store i64 %1, i64* bitcast (double* @P to i64*), align 8, !tbaa !23
  %Q = getelementptr inbounds %struct.leaf, %struct.leaf* %l, i64 0, i32 0, i32 1
  %2 = bitcast double* %Q to i64*
  %3 = load i64, i64* %2, align 8, !tbaa !34
  store i64 %3, i64* bitcast (double* @Q to i64*), align 8, !tbaa !23
  tail call void @optimize_node(double %pi_R, double %pi_I)
  %4 = load double, double* @P, align 8, !tbaa !23
  %cmp = fcmp olt double %4, 0.000000e+00
  br i1 %cmp, label %if.then, label %entry.if.end_crit_edge

entry.if.end_crit_edge:                           ; preds = %entry
  %.pre = load i64, i64* bitcast (double* @Q to i64*), align 8, !tbaa !23
  %5 = bitcast i64 %.pre to double
  %phitmp = bitcast i64 %.pre to double
  br label %if.end

if.then:                                          ; preds = %entry
  store double 0.000000e+00, double* @P, align 8, !tbaa !23
  store double 0.000000e+00, double* @Q, align 8, !tbaa !23
  br label %if.end

if.end:                                           ; preds = %entry.if.end_crit_edge, %if.then
  %6 = phi double [ 0.000000e+00, %if.then ], [ %5, %entry.if.end_crit_edge ]
  %7 = phi double [ 0.000000e+00, %if.then ], [ %phitmp, %entry.if.end_crit_edge ]
  %8 = phi double [ 0.000000e+00, %if.then ], [ %4, %entry.if.end_crit_edge ]
  %9 = phi double [ 0.000000e+00, %if.then ], [ %4, %entry.if.end_crit_edge ]
  %10 = getelementptr %struct.leaf, %struct.leaf* %l, i64 0, i32 0, i32 0
  store double %8, double* %10, align 8, !tbaa !32
  store double %6, double* %Q, align 8, !tbaa !34
  %.fca.0.insert = insertvalue { double, double } undef, double %9, 0
  %.fca.1.insert = insertvalue { double, double } %.fca.0.insert, double %7, 1
  ret { double, double } %.fca.1.insert
}

while with this patch:

; Function Attrs: nounwind uwtable
define dso_local { double, double } @Compute_Leaf(%struct.leaf* nocapture %l, double %pi_R, double %pi_I) local_unnamed_addr #0 {
entry:
  %0 = bitcast %struct.leaf* %l to i64*
  %1 = load i64, i64* %0, align 8, !tbaa !32
  store i64 %1, i64* bitcast (double* @P to i64*), align 8, !tbaa !23
  %Q = getelementptr inbounds %struct.leaf, %struct.leaf* %l, i64 0, i32 0, i32 1
  %2 = bitcast double* %Q to i64*
  %3 = load i64, i64* %2, align 8, !tbaa !34
  store i64 %3, i64* bitcast (double* @Q to i64*), align 8, !tbaa !23
  tail call void @optimize_node(double %pi_R, double %pi_I)
  %4 = load double, double* @P, align 8, !tbaa !23
  %cmp = fcmp olt double %4, 0.000000e+00
  br i1 %cmp, label %if.then, label %entry.if.end_crit_edge

entry.if.end_crit_edge:                           ; preds = %entry
  %.pre12 = load double, double* @Q, align 8, !tbaa !23
  br label %if.end

if.then:                                          ; preds = %entry
  store double 0.000000e+00, double* @P, align 8, !tbaa !23
  store double 0.000000e+00, double* @Q, align 8, !tbaa !23
  br label %if.end

if.end:                                           ; preds = %entry.if.end_crit_edge, %if.then
  %5 = phi double [ 0.000000e+00, %if.then ], [ %.pre12, %entry.if.end_crit_edge ]
  %6 = phi double [ 0.000000e+00, %if.then ], [ %4, %entry.if.end_crit_edge ]
  %7 = phi double [ 0.000000e+00, %if.then ], [ %4, %entry.if.end_crit_edge ]
  %8 = getelementptr %struct.leaf, %struct.leaf* %l, i64 0, i32 0, i32 0
  store double %6, double* %8, align 8, !tbaa !32
  store double %5, double* %Q, align 8, !tbaa !34
  %.fca.0.insert = insertvalue { double, double } undef, double %7, 0
  %.fca.1.insert = insertvalue { double, double } %.fca.0.insert, double 0.000000e+00, 1
  ret { double, double } %.fca.1.insert
}

Note the double 0.0 argument on the second insertvalue.

Unfortunately I've not been able to reproduce this when running -instcombine standalone. I've also tried adding -tbaa -loops -expensive-combines but that didn't work either.

Am I missing some option that makes opt different from clang O3 instcombine?

nikic added a comment.Jan 13 2020, 2:22 PM

Okay, I think I got it. I don't have a standalone reproducer for this, but what seems to be happening is that we're iterating over OldPN->users() in https://github.com/llvm/llvm-project/blob/a506f7f9105eec4baac296d21c922457d6f4b52a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp#L2347, but https://github.com/llvm/llvm-project/blob/a506f7f9105eec4baac296d21c922457d6f4b52a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp#L2353 is going to modify the users. That makes us skip over a user (specifically, the original bitcast), which then refers to the old phi node.

That's already rather bad in itself (because it defeats the purpose of the transform and smells of memory unsafety), but with this change the undef replacement makes uses of the old phi actively incorrect.

nikic added a comment.Jan 13 2020, 2:39 PM

I was able to create a reproducer with uselistorder:

@Q = internal unnamed_addr global double 1.000000e+00, align 8
 
define double @test(i1 %c, i64* %p) {
entry:
  br i1 %c, label %if, label %end 

if:                           
  %load = load i64, i64* bitcast (double* @Q to i64*), align 8
  br label %end 

end:                                           
  %phi = phi i64 [ 0, %entry ], [ %load, %if ]
  store i64 %phi, i64* %p, align 8
  %cast = bitcast i64 %phi to double
  ret double %cast 

  uselistorder i64 %phi, { 1, 0 }
}

This also demonstrates the currently suboptimal (but not incorrect) transform without this patch: https://llvm.godbolt.org/z/KVqP2P

As such, we can fix this issue independently first :)

nikic requested review of this revision.Jan 14 2020, 12:55 AM

I've now verified that this passes test-suite when applied together with D72657.

lebedev.ri accepted this revision.Jan 14 2020, 8:43 AM

Okay, LG if there are no further comments from @spatel.

This revision is now accepted and ready to land.Jan 14 2020, 8:43 AM
spatel accepted this revision.Jan 14 2020, 9:12 AM

I wish we didn't have such complicated transforms in instcombine, but that's out-of-bounds for this patch. :)
LGTM

This revision was automatically updated to reflect the committed changes.