This is an archive of the discontinued LLVM Phabricator instance.

Differential D20847

BitCast infinite loop fix
AbandonedPublic

Authored by evstupac on May 31 2016, 6:23 PM.

Download Raw Diff

Details

Reviewers

majnemer
Carrot

Summary

The patch helps to avoid infinite loop while BitCast instruction combining for the case in D14596.

A -> B cast
PHI
B -> A cast

While combining (iterating through worklist) LLVM compiler creates one new PHI and one new BitCast for load and store instructions.
InstCombine builder implicitly add the instructions to worklist. This potentially creates infinite loop.

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 59163.May 31 2016, 6:23 PM

evstupac retitled this revision from to BitCast infinite loop fix.

evstupac updated this object.

evstupac added reviewers: majnemer, Carrot.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, davidxl, echristo, kbarton.

evstupac mentioned this in D14596: [SROA] Choose more profitable type in findCommonType.May 31 2016, 6:43 PM

This needs a testcase.

Also, I find removing things from the worklist incredibly, deeply suspicious. Otherwise the rest of InstCombine must be careful not to add the item to the list, no?

In D20847#445263, @majnemer wrote:

This needs a testcase.

Also, I find removing things from the worklist incredibly, deeply suspicious. Otherwise the rest of InstCombine must be careful not to add the item to the list, no?

Could you please explain how this ends up looping? Are we recreating the same input over again?

Also, I find removing things from the worklist incredibly, deeply suspicious. Otherwise the rest of InstCombine must be careful not to add the item to the list, no?

Here InstCombine creates exactly the thing it is searching for: new BitCast (lines 1876, 1898) and new PHI (line 1861).

Could you please explain how this ends up looping? Are we recreating the same input over again?

I'll try to narrow my case and add a test case.
Basically what can occur is:

xB = load (type B);
xA = load (type A);
yB = (B)xA;           // A -> B
zB = PHI[xB, yB];     // PHI
zA = (A)zB;           // B -> A
store zA;
store zB;

The transformation will create (new instruction marked with +):

xB = load (type B);
xA = load (type A);
+yA = (A)xB;             // B -> A
+zAn = PHI[yA, xA];      // PHI
+zBn = (B)zAn;           // A -> B
store zAn;
store zBn;

So why does it loop more than once?
Does it not fixpoint at the second form?
If not that seems like the issue.

A test case added.
An early exit added for the described case instead of removing instructions from a Worklist.

fix a typo in the test case options

Added PR27996 with "C" test case that hangs as well as test in the patch.

majnemer added inline comments.Jun 3 2016, 12:34 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1816–1825	Could a problem arise if this phi was used by a phi?

Yeah, i was about to say, you can make this cycle as large as you want.
It may be that instcombine doesn't happen to match that pattern now, but in
general, it has to have a fixpoint where it says "yeah, this is the right
form" or else you are always prone to infinite loops with phi cycles.

evstupac added inline comments.Jun 3 2016, 12:45 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1816–1825	No. We add new BitCast at the end only if PN->uses() has store (line 1913). Here I'm looking for "store" in the same PN->uses().

it has to have a fixpoint where it says "yeah, this is the right
form" or else you are always prone to infinite loops with phi cycles.

We can cycle only if InstCombine here adds instructions it is looking for. Otherwise patterns from Worklist will end at some point.
Unless some other InstCombine optimization adds the same pattern. However, this is generally unpredictable. To avoid such cases we can end InstCombine at some point, say if for some time Worklist only grows or stay unchanged. But that is for another patch.

In D20847#448754, @evstupac wrote:

it has to have a fixpoint where it says "yeah, this is the right
form" or else you are always prone to infinite loops with phi cycles.

We can cycle only if InstCombine here adds instructions it is looking for. Otherwise patterns from Worklist will end at some point.

This can, of course, happen.

Unless some other InstCombine optimization adds the same pattern. However, this is generally unpredictable. To avoid such cases we can end InstCombine at some point, say if for some time Worklist only grows or stay unchanged. But that is for another patch.

InstCombine is supposed to run to fix-point, it is considered a feature because it's goal is to canonicalize the IR.

I'm starting to wonder if the original transform is phrased incorrectly.

This can, of course, happen.

The patch is trying to avoid this at least for BitCastOptimization.

Anyway infinite loop on compilation is not ok. And we should revert r270135 and r270479 or apply some fix for this.

PING.
The test in PR27996 still hangs as well as the test in the patch.
Can we apply the fix? Temporary? Is someone working on another solution?

PING 2.
Tests still hang.

I would be in favor of reverting r263734+r270135 (i.e. delete optimizeBitCastFromPhi); r263734 clearly slipped through without sufficient review.

Sorry if I am missing something.
But is 2 votes here enough to revert the revisions?
Should author do this? Or I can do this as well?
There are still no vote from author and reviewer.

In situations like this, where it is causing demonstrable issues and
appears broken, two "votes" are enough to revert the change.

You should ask the author to revert the change, explicitly, in a separate
thread (a lot of people filter review mail).
If they don't do so expediently, you should revert it for them, and ask
them to take a look at fixing when they get a chance.

Hi All,

Please go ahead and revert these. I've just noticed this thread, but Carrot
is on vacation for a few weeks and so likely isn't reading this.

Do follow up on the original commits and let him know what you've done.

Thanks!

-eric

I've done it here:

Committing to https://llvm.org/svn/llvm-project/llvm/trunk ...
D test/Transforms/InstCombine/pr25342.ll
D test/Transforms/InstCombine/pr27703.ll
M lib/Transforms/InstCombine/InstCombineCasts.cpp
M lib/Transforms/InstCombine/InstCombineInternal.h
Committed r274094

I'll follow up to the latter of the original commits.

-eric

xB = load (type B);
xA = load (type A);
+yA = (A)xB;             // B -> A
+zAn = PHI[yA, xA];      // PHI
+zBn = (B)zAn;           // A -> B
store zAn;
store zBn;

optimizeBitCastFromPhi generates

+zBn = (B)zAn;           // A -> B

and expects it will be combined with the following store instruction to another

store zAn

But unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.

optimizeBitCastFromPhi only generates BitCast for load/store instructions, and BitCast with load/store instructions can easily be handled by InstCombineLoadStoreAlloca.cpp. So one solution to the problem is if all users of a CI are load/store instructions, we should not do optimizeBitCastFromPhi on it.

evstupac abandoned this revision.May 2 2018, 12:25 PM

Revision Contents

Path

Size

lib/

Transforms/

InstCombine/

InstCombineCasts.cpp

25 lines

test/

Transforms/

InstCombine/

bitcast-cycle.ll

43 lines

Diff 59446

lib/Transforms/InstCombine/InstCombineCasts.cpp

	Show First 20 Lines • Show All 1,793 Lines • ▼ Show 20 Lines
	///			///
	/// A -> B cast			/// A -> B cast
	/// PHI			/// PHI
	/// B -> A cast			/// B -> A cast
	///			///
	/// All the related PHI nodes can be replaced by new PHI nodes with type A.			/// All the related PHI nodes can be replaced by new PHI nodes with type A.
	/// The uses of \p CI can be changed to the new PHI node corresponding to \p PN.			/// The uses of \p CI can be changed to the new PHI node corresponding to \p PN.
	Instruction InstCombiner::optimizeBitCastFromPhi(CastInst &CI, PHINode PN) {			Instruction InstCombiner::optimizeBitCastFromPhi(CastInst &CI, PHINode PN) {
				// Do not try to optimize the following case:
				// ...
				// Z = PHI [X, ...
				// Y = BitCast (Z)
				// store Y
				// store Z
				// As the optimization converts the case to:
				// ...
				// Zn = PHI [Xn, ...
				// X = BitCast (Xn)
				// store Zn
				// store X
				// Potentially leading to infinite loop.
				bool Store = false;
				for (User *U : PN->users()) {
				auto *SI = dyn_cast<StoreInst>(U);
				if (SI && SI->isSimple() && SI->getOperand(0) == PN)
				Store = true;
				}
				for (User *U : CI.users()) {
				auto *SI = dyn_cast<StoreInst>(U);
				if (SI && SI->isSimple() && SI->getOperand(0) == &CI && Store)
				return nullptr;
				}
				majnemerUnsubmitted Not Done Reply Inline Actions Could a problem arise if this phi was used by a phi? majnemer: Could a problem arise if this phi was used by a phi?
				evstupacAuthorUnsubmitted Not Done Reply Inline Actions No. We add new BitCast at the end only if PN->uses() has store (line 1913). Here I'm looking for "store" in the same PN->uses(). evstupac: No. We add new BitCast at the end only if PN->uses() has store (line 1913). Here I'm looking…

	Value *Src = CI.getOperand(0);			Value *Src = CI.getOperand(0);
	Type *SrcTy = Src->getType(); // Type B			Type *SrcTy = Src->getType(); // Type B
	Type *DestTy = CI.getType(); // Type A			Type *DestTy = CI.getType(); // Type A

	SmallVector<PHINode *, 4> PhiWorklist;			SmallVector<PHINode *, 4> PhiWorklist;
	SmallSetVector<PHINode *, 4> OldPhiNodes;			SmallSetVector<PHINode *, 4> OldPhiNodes;

	// Find all of the A->B casts and PHI nodes.			// Find all of the A->B casts and PHI nodes.
	▲ Show 20 Lines • Show All 248 Lines • Show Last 20 Lines

test/Transforms/InstCombine/bitcast-cycle.ll

				; RUN: opt < %s -S -instcombine \| FileCheck %s
				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

				; This test case checks that BitCast optimization do not cycle.

				; CHECK: define i32 @foo

				@i = constant i32 1, align 4
				@f = constant float 0x3FF19999A0000000, align 4
				@cmp = common global i32 0, align 4
				@resf = common global float* null, align 8
				@resi = common global i32* null, align 8

				; Function Attrs: norecurse nounwind uwtable
				define i32 @foo() #0 {
				entry:
				br label %while.cond

				while.cond: ; preds = %if.then, %if.else, %entry
				%res.0 = phi i32* [ null, %entry ], [ @i, %if.then ], [ bitcast (float* @f to i32*), %if.else ]
				%0 = load i32, i32* @cmp, align 4
				%shr = ashr i32 %0, 1
				store i32 %shr, i32* @cmp, align 4
				%tobool = icmp ne i32 %shr, 0
				br i1 %tobool, label %while.body, label %while.end

				while.body: ; preds = %while.cond
				%and = and i32 %shr, 1
				%tobool1 = icmp ne i32 %and, 0
				br i1 %tobool1, label %if.then, label %if.else

				if.then: ; preds = %while.body
				br label %while.cond

				if.else: ; preds = %while.body
				br label %while.cond

				while.end: ; preds = %while.cond
				%1 = bitcast i32* %res.0 to float*
				store float* %1, float** @resf, align 8
				store i32* %res.0, i32** @resi, align 8
				ret i32 0
				}