This is an archive of the discontinued LLVM Phabricator instance.

[LICM] sink through non-trivially replicable PHI
ClosedPublic

Authored by junbuml on Aug 25 2017, 1:44 PM.


Details

Reviewers

hfinkel
majnemer
davidxl
bmakam
mcrosier
danielcdh
efriedma
jtony

Commits

rGf5fb3d745d9b: [LICM] sink through non-trivially replicable PHI
rL317335: [LICM] sink through non-trivially replicable PHI

Summary

The current LICM allows sinking an instruction only when it is exposed to exit
blocks through a trivially replaceable PHI, that is, a PHI whose incoming values
are all the same instruction. This change enhances LICM to sink a sinkable
instruction through non-trivially replaceable PHIs by splitting predecessors of
loop exits.
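
As a minimal illustration of the distinction drawn in the summary, the sketch below models a PHI as a list of (value, block) pairs. `ToyPhi` and `isTriviallyReplaceable` are hypothetical names invented here, not LLVM's `PHINode` or the helper in LICM.cpp; the only assumption is the summary's own definition, that a PHI is trivially replaceable when every incoming value is the same instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PHI node: a list of (incoming value, incoming
// block) pairs. This is not LLVM's PHINode.
struct ToyPhi {
  std::vector<std::pair<std::string, std::string>> incoming;
};

// A PHI is trivially replaceable by `inst` when every incoming value is that
// same instruction, so the PHI can simply be replaced by a sunk copy of it.
bool isTriviallyReplaceable(const ToyPhi &phi, const std::string &inst) {
  assert(!inst.empty() && "expected an instruction name");
  return std::all_of(phi.incoming.begin(), phi.incoming.end(),
                     [&](const std::pair<std::string, std::string> &in) {
                       return in.first == inst;
                     });
}
```

A PHI with incoming values `%v1` from both `LB1` and `LB2` passes this check for `%v1`; a PHI with `%v1` from `LB1` and `%v2` from `LB2` does not, which is the case this revision handles by splitting predecessors.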

Diff Detail

Repository: rL LLVM

Event Timeline

junbuml created this revision. Aug 25 2017, 1:44 PM

efriedma added inline comments. Aug 25 2017, 2:43 PM

lib/Transforms/Scalar/LICM.cpp
803 ↗	(On Diff #112738)	Do you need to do something to keep SunkCopies up to date?
858 ↗	(On Diff #112738)	The indexing here is a little weird; can you first compute all the blocks to split, then split them?

Addressed Eli's comment.

lib/Transforms/Scalar/LICM.cpp
803 ↗	(On Diff #112738)	This is to clone only one instruction per exit block and I couldn't think of a case where the same instruction is cloned several times in the same exit block. Please let me know if you can see any case where we need to update SunkCopies.

efriedma added inline comments. Aug 29 2017, 11:59 AM

lib/Transforms/Scalar/LICM.cpp
803 ↗	(On Diff #112738)	The reason the SunkCopies map exists is that there could be multiple PHI nodes which use the same instruction in a loop exit block. Maybe it would be more straightforward if you would make all the necessary calls to splitPredecessorsOfLoopExit before you actually start sinking instructions?

Addressed Eli's comment.

junbuml added inline comments. Aug 30 2017, 8:33 AM

lib/Transforms/Scalar/LICM.cpp
803 ↗	(On Diff #112738)	Now, I split non-trivially replaceable PHIs before sinking in sink(). Please take a look and let me know any comment. Thanks Eli for the review.

I'd like more eyes on this to figure out if there are potential problems with this transform (maybe there's a performance cost for splitting edges in some cases?).

A few more testcases would be nice; maybe a loop with multiple exit blocks, a loop with more than two exits pointing to the same exit block, and a loop where multiple exit PHI nodes use the same sinkable instruction.

lib/Transforms/Scalar/LICM.cpp
850 ↗	(On Diff #113266)	Iterating over a SmallPtrSet is non-deterministic.
879 ↗	(On Diff #113266)	Maybe put a comment here describing what the first loop is doing?
925 ↗	(On Diff #113266)	Unnecessary cast.

Overall, i'm unclear on why you can't compute which ones will actually be sunk, and then split the preds.
Also, the examples do not even require splitting preds to effect the transform.

These are simply phi of ops form, and there is an equivalent op of phis form you could use directly in the exit block.
For example, you have, in the last test (i converted the checks to the code):

Out12.split.loop.exit:
lcssaphi1 = phi i32 [ %N_addr.0.pn, %ContLoop ]
mulres = mul i32 %N, %lcssaphi1
subres1 = sub i32 %mulres, %N2
br label %Out12

Out12.split.loop.exit1:
lcssaphi2 = phi i32 [ %N_addr.0.pn, %Loop ]
mulres2 = mul i32 %N, %lcssaphi2
subres2 = sub i32 %mulres2, %N
br label %Out12

Out12:
something = phi i32 [ %subres1, %Out12.split.loop.exit ], [ %subres2, %Out12.split.loop.exit1 ]

This is the same as

Out12: 
phires = phi i32 [%N, %Loop], [%N2, %ContLoop]
mulres = mul i32 %N, %N_addr.0.pn
; something here has the same value as the phi named something above
something = sub i32 %mulres, %phires

It seems to me if you can compute which can be sunk before choosing to split the preds, you can do it without splitting the preds.
What am i missing?

lib/Transforms/Scalar/LICM.cpp
911 ↗	(On Diff #113266)	Is there a reason not to just put all the users in a smallvector of WeakVH first? While it's true splitPred may invalidate the iterators, none of the answers to what this loop does can change (that i can see) as a result of that (nor will this loop do anything for any new values).

I agree that the test doesn't really require splitting, since the operations of the incoming values are the same (sub/mul) and all operands used in the sinkable chain (%N_addr.0.pn, %N, and %N2) dominate the PHI, so we can directly add PHIs for the operands and share the same operation in the exit block, as in your comment. To me, that test case looks like a special case in which splitting is not required. In most cases, I think splitting is required to sink through a non-trivially replicable PHI. For example, if the operations of the incoming values are different, or any operand in the sinkable chain does not dominate the PHI, we should split preds.

I think SimplifyCFG should clean up such foldable cases. The test case seems to be handled properly by SimplifyCFG:

Out12:
  %N_addr.0.pn.lcssa.sink = phi i32 [ %N_addr.0.pn, %Loop ], [ %N_addr.0.pn, %ContLoop ]
  %N.sink = phi i32 [ %N, %Loop ], [ %N2, %ContLoop ]
  %tmp.6.le = mul i32 %N, %N_addr.0.pn.lcssa.sink
  %tmp.7.le = sub i32 %tmp.6.le, %N.sink

Would it make sense for LICM to split preds for non-trivially replicable PHIs in general and let SimplifyCFG fold such cases? If there is any case that SimplifyCFG cannot fold, we can improve SimplifyCFG.

In D37163#857853, @junbuml wrote:

I agree that the test doesn't really require splitting, since the operations of the incoming values are the same (sub/mul) and all operands used in the sinkable chain (%N_addr.0.pn, %N, and %N2) dominate the PHI, so we can directly add PHIs for the operands and share the same operation in the exit block, as in your comment. To me, that test case looks like a special case in which splitting is not required. In most cases, I think splitting is required to sink through a non-trivially replicable PHI. For example, if the operations of the incoming values are different, or any operand in the sinkable chain does not dominate the PHI, we should split preds.

The case where you can't place it somewhere safe should be the case where the edge is critical.
Otherwise, there should always be somewhere to place the computation safely and correctly.
Now, i admit it may be tricky to come up with such an algorithm, but i'd also be surprised if it doesn't exist in literature already.

I think SimplifyCFG should clean up such foldable cases. The test case seems to be handled properly by SimplifyCFG:
Out12:
  %N_addr.0.pn.lcssa.sink = phi i32 [ %N_addr.0.pn, %Loop ], [ %N_addr.0.pn, %ContLoop ]
  %N.sink = phi i32 [ %N, %Loop ], [ %N2, %ContLoop ]
  %tmp.6.le = mul i32 %N, %N_addr.0.pn.lcssa.sink
  %tmp.7.le = sub i32 %tmp.6.le, %N.sink
Would it make sense for LICM to split preds for non-trivially replicable PHIs in general and let SimplifyCFG fold such cases? If there is any case that SimplifyCFG cannot fold, we can improve SimplifyCFG.

If we can avoid creating the mess in the first place, i'd rather see us do that.
Changing the CFG like this costs something (incremental dominator updates, etc).
If we don't need to do it, we generally shouldn't do it.

If the algorithm turns out to be wildly complex, sure, let's do the easy thing and split preds and let something clean it up.
But i'd at least like to see us *try*.

Of course, it would be good to avoid unnecessary splitting in the first place if it's not very costly. Let me try to find a reasonable approach for this.

The case where you can't place it somewhere safe should be the case where the edge is critical.
Otherwise, there should always be somewhere to place the computation safely and correctly.

I'm not entirely clear about your comment above. Did you mean that we cannot safely sink a sinkable instruction through a critical edge, so in that case we should split the critical edge in order to sink?

In D37163#858781, @junbuml wrote:

Of course, it would be good to avoid unnecessary splitting in the first place if it's not very costly. Let me try to find a reasonable approach for this.

If you need help, let me know.
If you can't find one, awesome, let's go with splitting :)

The case where you can't place it somewhere safe should be the case where the edge is critical.
Otherwise, there should always be somewhere to place the computation safely and correctly.

I'm not entirely clear about your comment above. Did you mean that we cannot safely sink a sinkable instruction through a critical edge, so in that case we should split the critical edge in order to sink?

Yes.
Instructions are really placed on edges in most cases.
That is, you are hoisting or sinking it out of a block, and want it to appear first/last in another block.
This is really edge placement.
But most compilers don't allow code on edges, so you actually have to put it in the block.
In the normal, non-critical edge case, this is safe to do. That is, it's possible to do it and have the instruction execute under only the same conditions it did before.

In the critical edge case, you may need to split the edge in order to have a place to put it.

Let me give you a hoisting example just because people seem to find it easier when i explain it that direction:

a      b
| \  /
c  d

We have the same computation in B and D.
So you want to do PRE on it.
To eliminate the extra computation, you need to place a computation on the edge between A and D. That is the only way to ensure that the computation is still available on all paths to D.

If you could place code on that edge, it would be easy, no muss, no fuss!
You'd be guaranteed it only executes on the a->d path.
LLVM (and most other compilers) can't place code on edges, only in the blocks.

But if you place the computation in block A, it may also be computed on the path to C, and when D is never executed.
That's no good, now your computation may execute when it didn't before (think of a load, for example, or any trapping instruction).

This happens because a->d is a critical edge. It's an edge from a block with multiple successors to a block with multiple predecessors.

The only safe way to place the computation on this edge is to split this edge so it looks like:

a
| \         b
c   t     /
      \  /
        d

and then place the computation in t.

Sinking has the same problem (imagine you want to sink something from a to d. if you place it at the beginning of d, now it also occurs on the b->d path, where it may never have occurred before. The only way to make it occur just on the a->d path is to split the edge as we did above)
The edges to loop exits may often be critical edges, so you may (but not always) have to split them to place computations safely.
You can actually compute which edges are "blocking" your transformation and split them.
(you could also pre-split all critical edges, at some cost. There is a lot of literature on this, and the TL;DR is that you can prove that a number of dataflow problems do not get optimal answers in the presence of critical edges. LLVM has not chosen this path so far, however, because the cost has outweighed the practical benefit).
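
The definition above (an edge from a block with multiple successors to a block with multiple predecessors) can be sketched on a toy CFG. This is an LLVM-free illustration under stated assumptions: `Edge`, `criticalEdges`, and `splitEdge` are hypothetical names made up for this sketch, blocks are plain strings, and the edge list is assumed to contain no duplicate edges (so switch-style multi-edges are not modeled).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (from, to)

// Returns the edges whose source has more than one successor and whose
// destination has more than one predecessor, i.e. the critical edges.
std::vector<Edge> criticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, std::size_t> numSuccs, numPreds;
  for (const Edge &e : edges) {
    ++numSuccs[e.first];
    ++numPreds[e.second];
  }
  std::vector<Edge> result;
  for (const Edge &e : edges)
    if (numSuccs[e.first] > 1 && numPreds[e.second] > 1)
      result.push_back(e);
  return result;
}

// Returns a copy of the CFG in which `edge` (from -> to) is routed through a
// new block `mid`: from -> mid -> to. The CFG is unchanged if the edge is
// absent.
std::vector<Edge> splitEdge(std::vector<Edge> edges, const Edge &edge,
                            const std::string &mid) {
  assert(!mid.empty() && "the new block needs a name");
  auto it = std::find(edges.begin(), edges.end(), edge);
  if (it == edges.end())
    return edges;
  it->second = mid;                    // from -> mid
  edges.push_back({mid, edge.second}); // mid -> to
  return edges;
}
```

For the a/b/c/d example, the edge list `{a->c, a->d, b->d}` has exactly one critical edge, `a->d`; after splitting it through `t`, no critical edge remains, and `t` executes only on the original a->d path.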

I looked at whether we can add an extra check to avoid unnecessary splitting, but to me this seems to be a special case for which I don't want to increase complexity in this change. Since all incoming edges of a non-trivially replaceable PHI in LCSSA form are critical, splitting them by default doesn't seem unreasonable to me.

Splitting is unnecessary only when all incoming values of the non-trivially replicable PHI are sinkable and their operations are the same. In that case, we can sink a single shared operation and modify the PHI to pass the operands to it, as in the IR you mentioned above. Handling this is pretty much a special case, which I believe we can leave to another pass, or we can extend LICM in a separate patch if we need to handle it here.

Added more test cases and made minor updates.

junbuml added inline comments. Sep 7 2017, 10:22 AM

lib/Transforms/Scalar/LICM.cpp
911 ↗	(On Diff #113266)	After splitPredecessorsOfLoopExit(), user PHIs will have been modified: they may no longer be users, and may no longer be in the exit block. We also need to get the Use from the user_iterator for the unreachability check on the PHI's incoming block in this loop. Instead of holding the original users and iterating over them, I think restarting the iterators over the valid users might be easier to read. To avoid performing the same check on the same users, I cache visited users. In test16() in the test case, the same PHI appears multiple times as a user. After splitPredecessorsOfLoopExit(), that PHI will no longer be a user of the sinkable instruction, so we do not want to revisit it.
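
The restart-plus-visited-set pattern described in this comment can be sketched without LLVM. This is only an illustration of the iteration scheme, not the code in the patch: `visitWithRestart` is a hypothetical helper, users are modeled as plain integers in a `std::vector`, and the callback is assumed to return true exactly when it has mutated the container.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Calls `process` on each distinct element of `users` at most once and returns
// the number of calls made. `process` may mutate `users`; it must return true
// when it does, in which case every iterator is treated as invalid and the
// scan restarts from the beginning. The visited set keeps a restart from
// re-processing elements that were already handled.
int visitWithRestart(
    std::vector<int> &users,
    const std::function<bool(std::vector<int> &, int)> &process) {
  assert(process && "callback must be callable");
  std::set<int> visited;
  int calls = 0;
  for (auto it = users.begin(); it != users.end();) {
    int user = *it;
    ++it; // advance before the container can be mutated
    if (!visited.insert(user).second)
      continue; // already handled before a restart
    ++calls;
    if (process(users, user))
      it = users.begin(); // iterators may be invalid, so rebuild them
  }
  return calls;
}
```

Unlike the patch, which records a PHI as visited only after its reachability checks, this sketch marks an element as visited as soon as it is seen; that simplification is an assumption of the example.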

Okay.
One reason we don't split critical edges by default in most passes in LLVM is that edges coming out of switches that lead to the same block are pretty much always critical.
Doing so by default for large switches leads to *huge* code blowup.

IE it goes from 1 block to N blocks, where N is the number of switch cases.

If there are multiple layers of switches, it's obviously even worse.

Have you tested this on any switch-heavy code to ensure it does not blow up code size and compile time?

Okay.
One reason we don't split critical edges by default in most passes in LLVM is that edges coming out of switches that lead to the same block are pretty much always critical.
Doing so by default for large switches leads to *huge* code blowup.
IE it goes from 1 block to N blocks, where N is the number of switch cases.
If there are multiple layers of switches, it's obviously even worse.
Have you tested this on any switch-heavy code to ensure it does not blow up code size and compile time?

I tried building spec2000/spec2006/spec2017 with and without this change for aarch64. The change was applied widely, and as far as I know perlbench uses lots of big switches, but I didn't see any significant changes in code size or compile time across the benchmarks.
However, as you point out, it could increase code size even in some cases where splitting is required for sinking. Just to be safe, what about checking whether the number of predecessors exceeds some threshold set via cl::opt before splitting?

It's possible to get quadratic code growth from LICM sinking. For example:

int foo();
int unconditional_break(int n, int *p) {
  int r = 0;
  for (int i = 0; i < n; ++i) {
    int x = foo();
    int a = *p;
    int b = a * a;
    int c = b * b;
    int d = c * c;
    int e = d * d;
    int f = e * e;
    if (x == 1) { r = a; break; }
    if (x == 2) { r = b; break; }
    if (x == 3) { r = c; break; }
    if (x == 4) { r = d; break; }
    if (x == 5) { r = e; break; }
    if (x == 6) { r = f; break; }
  }
  return r;
}

int foo();
int conditional_break(int n, int *p) {
  int r = 0;
  for (int i = 0; i < n; ++i) {
    int x = foo();
    int a = *p;
    int b = a * a;
    int c = b * b;
    int d = c * c;
    int e = d * d;
    int f = e * e;
    if (x == 1) { r = a; if (foo()) break; }
    if (x == 2) { r = b; if (foo()) break; }
    if (x == 3) { r = c; if (foo()) break; }
    if (x == 4) { r = d; if (foo()) break; }
    if (x == 5) { r = e; if (foo()) break; }
    if (x == 6) { r = f; if (foo()) break; }
  }
  return r;
}

The source contains 5 multiplies, but the optimized IR has 15 due to sinking. For the first function, sinking happens on trunk. For the second function, this patch will allow sinking, I think.

That said, code size bloat from sinking hasn't ever shown up as a problem in practice; I doubt it will cause problems here.
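
One way to arrive at the count of 15 is sketched below. It assumes, as a reading of the example rather than anything measured, that each of the six exits re-materializes the whole multiply chain feeding the value it returns: the exit returning `a` needs 0 multiplies, `b` needs 1, and so on up to 5 for `f`. The function name `multipliesAfterSinking` is hypothetical.

```cpp
#include <cassert>

// Total multiplies after sinking when exit k (0-based) must recompute a chain
// of k multiplies in its own exit block. The growth is quadratic in the
// number of exits: 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
unsigned multipliesAfterSinking(unsigned numExits) {
  unsigned total = 0;
  for (unsigned k = 0; k < numExits; ++k)
    total += k;
  // Cross-check the loop against the closed form.
  assert(total == numExits * (numExits - 1) / 2);
  return total;
}
```

With six exits this gives 0 + 1 + 2 + 3 + 4 + 5 = 15, against 5 multiplies in the source loop body.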

In D37163#863648, @junbuml wrote:

Okay.
One reason we don't split critical edges by default in most passes in LLVM is that edges coming out of switches that lead to the same block are pretty much always critical.
Doing so by default for large switches leads to *huge* code blowup.
IE it goes from 1 block to N blocks, where N is the number of switch cases.
If there are multiple layers of switches, it's obviously even worse.
Have you tested this on any switch-heavy code to ensure it does not blow up code size and compile time?

I tried building spec2000/spec2006/spec2017 with and without this change for aarch64. The change was applied widely, and as far as I know perlbench uses lots of big switches, but I didn't see any significant changes in code size or compile time across the benchmarks.
However, as you point out, it could increase code size even in some cases where splitting is required for sinking. Just to be safe, what about checking whether the number of predecessors exceeds some threshold set via cl::opt before splitting?

If you haven't seen any real code size regressions, i think you should just be prepared to address it in the future, and let's leave it at that for now :)
Thanks for working through all this.

junbuml mentioned this in D37076: [LICM] Allow sinking when foldable in loop. Sep 13 2017, 2:06 PM

Kindly ping. Please let me know any comment.

I ran spec2000/2006/2017 on aarch64, and there was no clear change in performance or size. However, I observed a 4% performance gain in spec2006/xalancbmk when this was applied together with another LICM patch of mine (https://reviews.llvm.org/D37076).

Minor update in comments and fixed a test case failure in subreg-postra-2.ll by preventing the select instruction from being sunk.

Herald added a subscriber: nemanjai. Sep 26 2017, 9:44 AM

Kindly ping.

Kindly ping one more time. Please let me know any comment.

LGTM

This revision is now accepted and ready to land. Oct 20 2017, 1:41 PM

Tony,
Can you please confirm if the change in subreg-postra-2.ll doesn't break the original intention of the test?
Thanks,
Jun

I'm still not clear whether the change in subreg-postra-2.ll is okay. Can anyone who knows this PowerPC test please review the change in the test?
Thanks,
Jun

hfinkel added inline comments. Oct 31 2017, 4:15 PM

test/CodeGen/PowerPC/subreg-postra-2.ll
7 ↗	(On Diff #120267)	Do you need to change this test at all? Would using the flag -ppc-gep-opt=0 enable you to leave the test as is?

Added "-ppc-gep-opt=0". No change in the test with/without this change. The change in the second RUN is because of using -ppc-gep-opt=0, but this doesn't seem to break the original intention of this test.

Rebased. I will commit this if the change in PowerPC test (subreg-postra-2.ll) is okay.
Thanks,
Jun

In D37163#914400, @junbuml wrote:

Rebased. I will commit this if the change in PowerPC test (subreg-postra-2.ll) is okay.

Yes, looks fine to me.

Thanks,
Jun

Closed by commit rL317335: [LICM] sink through non-trivially replicable PHI (authored by junbuml). Nov 3 2017, 9:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Scalar/LICM.cpp	196 lines
llvm/trunk/test/CodeGen/PowerPC/subreg-postra-2.ll	8 lines
llvm/trunk/test/Transforms/LICM/sinking.ll	284 lines

Diff 121492

llvm/trunk/lib/Transforms/Scalar/LICM.cpp

@@ 56 unchanged lines hidden @@
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/PredIteratorCache.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
 #include "llvm/Transforms/Utils/SSAUpdater.h"
 #include <algorithm>
 #include <utility>
 using namespace llvm;

 #define DEBUG_TYPE "licm"
@@ 15 unchanged lines hidden @@ cl::desc("Max num uses visited for identifying load "
              "invariance in loop using invariant start (default = 8)"));

 static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);
 static bool isNotUsedInLoop(const Instruction &I, const Loop *CurLoop,
                             const LoopSafetyInfo *SafetyInfo);
 static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
                   const LoopSafetyInfo *SafetyInfo,
                   OptimizationRemarkEmitter *ORE);
-static bool sink(Instruction &I, const LoopInfo *LI, const DominatorTree *DT,
-                 const Loop *CurLoop, AliasSetTracker *CurAST,
-                 const LoopSafetyInfo *SafetyInfo,
+static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,
+                 const Loop *CurLoop, const LoopSafetyInfo *SafetyInfo,
                  OptimizationRemarkEmitter *ORE);
 static bool isSafeToExecuteUnconditionally(Instruction &Inst,
                                            const DominatorTree *DT,
                                            const Loop *CurLoop,
                                            const LoopSafetyInfo *SafetyInfo,
                                            OptimizationRemarkEmitter *ORE,
                                            const Instruction *CtxI = nullptr);
 static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,
@@ 282 unchanged lines hidden @@ for (BasicBlock::iterator II = BB->end(); II != BB->begin();) {

       // Check to see if we can sink this instruction to the exit blocks
       // of the loop. We can do this if the all users of the instruction are
       // outside of the loop. In this case, it doesn't even matter if the
       // operands of the instruction are loop invariant.
       //
       if (isNotUsedInLoop(I, CurLoop, SafetyInfo) &&
           canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, SafetyInfo, ORE)) {
-        ++II;
-        Changed |= sink(I, LI, DT, CurLoop, CurAST, SafetyInfo, ORE);
+        if (sink(I, LI, DT, CurLoop, SafetyInfo, ORE)) {
+          ++II;
+          CurAST->deleteValue(&I);
+          I.eraseFromParent();
+          Changed = true;
+        }
       }
     }
   }
   return Changed;
 }

 /// Walk the specified region of the CFG (defined by all blocks dominated by
 /// the specified block, and that are in the current loop) in depth first
@@ 305 unchanged lines hidden @@ if (const PHINode *PN = dyn_cast<PHINode>(UI)) {
         return false;

       // We need to sink a callsite to a unique funclet. Avoid sinking if the
       // phi use is too muddled.
       if (isa<CallInst>(I))
         if (!BlockColors.empty() &&
             BlockColors.find(const_cast<BasicBlock *>(BB))->second.size() != 1)
           return false;
-
-      // A PHI node where all of the incoming values are this instruction are
-      // special -- they can just be RAUW'ed with the instruction and thus
-      // don't require a use in the predecessor. This is a particular important
-      // special case because it is the pattern found in LCSSA form.
-      if (isTriviallyReplacablePHI(*PN, I)) {
-        if (CurLoop->contains(PN))
-          return false;
-        else
-          continue;
-      }
-
-      // Otherwise, PHI node uses occur in predecessor blocks if the incoming
-      // values. Check for such a use being inside the loop.
-      for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
-        if (PN->getIncomingValue(i) == &I)
-          if (CurLoop->contains(PN->getIncomingBlock(i)))
-            return false;
-
-      continue;
     }

     if (CurLoop->contains(UI))
       return false;
   }
   return true;
 }

@@ 53 unchanged lines hidden @@ if (Instruction *OInst = dyn_cast<Instruction>(*OI))
                                 OInst->getName() + ".lcssa", &ExitBlock.front());
             for (unsigned i = 0, e = PN.getNumIncomingValues(); i != e; ++i)
               OpPN->addIncoming(OInst, PN.getIncomingBlock(i));
             *OI = OpPN;
           }
   return New;
 }

+static Instruction *sinkThroughTriviallyReplacablePHI(
+    PHINode *TPN, Instruction *I, LoopInfo *LI,
+    SmallDenseMap<BasicBlock *, Instruction *, 32> &SunkCopies,
+    const LoopSafetyInfo *SafetyInfo, const Loop *CurLoop) {
+  assert(isTriviallyReplacablePHI(*TPN, *I) &&
+         "Expect only trivially replacalbe PHI");
+  BasicBlock *ExitBlock = TPN->getParent();
+  Instruction *New;
+  auto It = SunkCopies.find(ExitBlock);
+  if (It != SunkCopies.end())
+    New = It->second;
+  else
+    New = SunkCopies[ExitBlock] =
+        CloneInstructionInExitBlock(*I, *ExitBlock, *TPN, LI, SafetyInfo);
+  return New;
+}
+
+static bool canSplitPredecessors(PHINode *PN) {
+  BasicBlock *BB = PN->getParent();
+  if (!BB->canSplitPredecessors())
+    return false;
+  for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
+    BasicBlock *BBPred = *PI;
+    if (isa<IndirectBrInst>(BBPred->getTerminator()))
+      return false;
+  }
+  return true;
+}
+
+static void splitPredecessorsOfLoopExit(PHINode *PN, DominatorTree *DT,
+                                        LoopInfo *LI, const Loop *CurLoop) {
+#ifndef NDEBUG
+  SmallVector<BasicBlock *, 32> ExitBlocks;
+  CurLoop->getUniqueExitBlocks(ExitBlocks);
+  SmallPtrSet<BasicBlock *, 32> ExitBlockSet(ExitBlocks.begin(),
+                                             ExitBlocks.end());
+#endif
+  BasicBlock *ExitBB = PN->getParent();
+  assert(ExitBlockSet.count(ExitBB) && "Expect the PHI is in an exit block.");
+
+  // Split predecessors of the loop exit to make instructions in the loop are
+  // exposed to exit blocks through trivially replacable PHIs while keeping the
+  // loop in the canonical form where each predecessor of each exit block should
+  // be contained within the loop. For example, this will convert the loop below
+  // from
+  //
+  // LB1:
+  //   %v1 =
+  //   br %LE, %LB2
+  // LB2:
+  //   %v2 =
+  //   br %LE, %LB1
+  // LE:
+  //   %p = phi [%v1, %LB1], [%v2, %LB2] <-- non-trivially replacable
+  //
+  // to
+  //
+  // LB1:
+  //   %v1 =
+  //   br %LE.split, %LB2
+  // LB2:
+  //   %v2 =
+  //   br %LE.split2, %LB1
+  // LE.split:
+  //   %p1 = phi [%v1, %LB1] <-- trivially replacable
+  //   br %LE
+  // LE.split2:
+  //   %p2 = phi [%v2, %LB2] <-- trivially replacable
+  //   br %LE
+  // LE:
+  //   %p = phi [%p1, %LE.split], [%p2, %LE.split2]
+  //
+  SmallSetVector<BasicBlock *, 8> PredBBs(pred_begin(ExitBB), pred_end(ExitBB));
+  while (!PredBBs.empty()) {
+    BasicBlock *PredBB = *PredBBs.begin();
+    assert(CurLoop->contains(PredBB) &&
+           "Expect all predecessors are in the loop");
+    if (PN->getBasicBlockIndex(PredBB) >= 0)
+      SplitBlockPredecessors(ExitBB, PredBB, ".split.loop.exit", DT, LI, true);
+    PredBBs.remove(PredBB);
+  }
+}

/// When an instruction is found to only be used outside of the loop, this		/// When an instruction is found to only be used outside of the loop, this
/// function moves it to the exit blocks and patches up SSA form as needed.		/// function moves it to the exit blocks and patches up SSA form as needed.
/// This method is guaranteed to remove the original instruction from its		/// This method is guaranteed to remove the original instruction from its
/// position, and may either delete it or move it to outside of the loop.		/// position, and may either delete it or move it to outside of the loop.
///		///
static bool sink(Instruction &I, const LoopInfo LI, const DominatorTree DT,		static bool sink(Instruction &I, LoopInfo LI, DominatorTree DT,
const Loop CurLoop, AliasSetTracker CurAST,		const Loop CurLoop, const LoopSafetyInfo SafetyInfo,
const LoopSafetyInfo *SafetyInfo,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
DEBUG(dbgs() << "LICM sinking instruction: " << I << "\n");		DEBUG(dbgs() << "LICM sinking instruction: " << I << "\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "InstSunk", &I)		return OptimizationRemark(DEBUG_TYPE, "InstSunk", &I)
<< "sinking " << ore::NV("Inst", &I);		<< "sinking " << ore::NV("Inst", &I);
});		});
bool Changed = false;		bool Changed = false;
if (isa<LoadInst>(I))		if (isa<LoadInst>(I))
++NumMovedLoads;		++NumMovedLoads;
else if (isa<CallInst>(I))		else if (isa<CallInst>(I))
++NumMovedCalls;		++NumMovedCalls;
++NumSunk;		++NumSunk;
Changed = true;		Changed = true;

#ifndef NDEBUG		// Iterate over users to be ready for actual sinking. Replace users via
SmallVector<BasicBlock *, 32> ExitBlocks;		// unrechable blocks with undef and make all user PHIs trivially replcable.
CurLoop->getUniqueExitBlocks(ExitBlocks);		SmallPtrSet<Instruction *, 8> VisitedUsers;
SmallPtrSet<BasicBlock *, 32> ExitBlockSet(ExitBlocks.begin(),		for (Value::user_iterator UI = I.user_begin(), UE = I.user_end(); UI != UE;) {
ExitBlocks.end());		auto User = cast<Instruction>(UI);
#endif		Use &U = UI.getUse();
		++UI;

// Clones of this instruction. Don't create more than one per exit block!		if (VisitedUsers.count(User))
SmallDenseMap<BasicBlock , Instruction , 32> SunkCopies;		continue;

// If this instruction is only used outside of the loop, then all users are
// PHI nodes in exit blocks due to LCSSA form. Just RAUW them with clones of
// the instruction.
while (!I.use_empty()) {
Value::user_iterator UI = I.user_begin();
auto User = cast<Instruction>(UI);
if (!DT->isReachableFromEntry(User->getParent())) {		if (!DT->isReachableFromEntry(User->getParent())) {
User->replaceUsesOfWith(&I, UndefValue::get(I.getType()));		User->replaceUsesOfWith(&I, UndefValue::get(I.getType()));
continue;		continue;
}		}

// The user must be a PHI node.		// The user must be a PHI node.
PHINode *PN = cast<PHINode>(User);		PHINode *PN = cast<PHINode>(User);

// Surprisingly, instructions can be used outside of loops without any		// Surprisingly, instructions can be used outside of loops without any
// exits. This can only happen in PHI nodes if the incoming block is		// exits. This can only happen in PHI nodes if the incoming block is
// unreachable.		// unreachable.
Use &U = UI.getUse();
BasicBlock *BB = PN->getIncomingBlock(U);		BasicBlock *BB = PN->getIncomingBlock(U);
if (!DT->isReachableFromEntry(BB)) {		if (!DT->isReachableFromEntry(BB)) {
U = UndefValue::get(I.getType());		U = UndefValue::get(I.getType());
continue;		continue;
}		}

BasicBlock *ExitBlock = PN->getParent();		VisitedUsers.insert(PN);
assert(ExitBlockSet.count(ExitBlock) &&		if (isTriviallyReplacablePHI(*PN, I))
"The LCSSA PHI is not in an exit block!");		continue;

Instruction *New;		if (!canSplitPredecessors(PN))
auto It = SunkCopies.find(ExitBlock);		return false;
if (It != SunkCopies.end())
New = It->second;		// Split predecessors of the PHI so that we can make users trivially
else		// replacable.
New = SunkCopies[ExitBlock] =		splitPredecessorsOfLoopExit(PN, DT, LI, CurLoop);
CloneInstructionInExitBlock(I, ExitBlock, PN, LI, SafetyInfo);
		// Should rebuild the iterators, as they may be invalidated by
		// splitPredecessorsOfLoopExit().
		UI = I.user_begin();
		UE = I.user_end();
		}

		#ifndef NDEBUG
		SmallVector<BasicBlock *, 32> ExitBlocks;
		CurLoop->getUniqueExitBlocks(ExitBlocks);
		SmallPtrSet<BasicBlock *, 32> ExitBlockSet(ExitBlocks.begin(),
		ExitBlocks.end());
		#endif

		// Clones of this instruction. Don't create more than one per exit block!
		SmallDenseMap<BasicBlock *, Instruction *, 32> SunkCopies;

		// If this instruction is only used outside of the loop, then all users are
		// PHI nodes in exit blocks due to LCSSA form. Just RAUW them with clones of
		// the instruction.
		while (!I.use_empty()) {
		Value::user_iterator UI = I.user_begin();
		PHINode *PN = cast<PHINode>(*UI);
		assert(ExitBlockSet.count(PN->getParent()) &&
		"The LCSSA PHI is not in an exit block!");
		// The PHI must be trivially replaceable.
		Instruction *New = sinkThroughTriviallyReplacablePHI(PN, &I, LI, SunkCopies,
		SafetyInfo, CurLoop);
PN->replaceAllUsesWith(New);		PN->replaceAllUsesWith(New);
PN->eraseFromParent();		PN->eraseFromParent();
}		}

CurAST->deleteValue(&I);
I.eraseFromParent();
return Changed;		return Changed;
}		}

/// When an instruction is found to only use loop invariant operands that		/// When an instruction is found to only use loop invariant operands that
/// is safe to hoist, this instruction is called to do the dirty work.		/// is safe to hoist, this instruction is called to do the dirty work.
///		///
static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,		static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
const LoopSafetyInfo *SafetyInfo,		const LoopSafetyInfo *SafetyInfo,
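The bookkeeping in sink() above can be pictured with a small toy model. This is only a sketch and not LLVM code: `ToyPhi`, `isTriviallyReplaceable`, and `getOrCreateClone` are hypothetical names, blocks and values are plain strings, and the clone naming scheme is invented for illustration. It shows the two ideas the diff relies on: a PHI is trivially replaceable when every incoming value is the instruction being sunk, and at most one clone is created per exit block (the role played by `SunkCopies`).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an LCSSA PHI: the block holding it and the names of its
// incoming values (one per incoming edge). Hypothetical, not an LLVM type.
struct ToyPhi {
  std::string exitBlock;
  std::vector<std::string> incoming;
};

// True when every incoming value of the PHI is the instruction being sunk.
bool isTriviallyReplaceable(const ToyPhi &phi, const std::string &inst) {
  for (const std::string &v : phi.incoming)
    if (v != inst)
      return false;
  return true;
}

// Returns the clone recorded for exitBlock, creating it on first request so
// that no exit block ever receives more than one copy.
std::string getOrCreateClone(std::map<std::string, std::string> &sunkCopies,
                             const std::string &inst,
                             const std::string &exitBlock) {
  auto it = sunkCopies.find(exitBlock);
  if (it != sunkCopies.end())
    return it->second;
  // Invented naming scheme, for illustration only.
  std::string clone = inst + ".clone." + exitBlock;
  sunkCopies[exitBlock] = clone;
  return clone;
}
```

With two PHIs in the same exit block that both use the sunk instruction, the second lookup returns the clone created by the first, which is the situation raised in the review discussion about keeping SunkCopies up to date.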

llvm/trunk/test/CodeGen/PowerPC/subreg-postra-2.ll

	; RUN: llc -verify-machineinstrs -mcpu=pwr7 < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-gep-opt=0 < %s | FileCheck %s
	; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-gen-isel=false < %s | FileCheck --check-prefix=CHECK-NO-ISEL %s			; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-gen-isel=false -ppc-gep-opt=0 < %s | FileCheck --check-prefix=CHECK-NO-ISEL %s
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define void @jbd2_journal_commit_transaction(i32 %input1, i32* %input2, i32* %input3, i8** %input4) #0 {			define void @jbd2_journal_commit_transaction(i32 %input1, i32* %input2, i32* %input3, i8** %input4) #0 {
	entry:			entry:
	br label %while.body392			br label %while.body392


	; CHECK-LABEL: @jbd2_journal_commit_transaction			; CHECK-LABEL: @jbd2_journal_commit_transaction
	; CHECK-NO-ISEL-LABEL: @jbd2_journal_commit_transaction			; CHECK-NO-ISEL-LABEL: @jbd2_journal_commit_transaction
	; CHECK: andi.			; CHECK: andi.
	; CHECK: crmove [[REG:[0-9]+]], 1			; CHECK: crmove [[REG:[0-9]+]], 1
	; CHECK: stdcx.			; CHECK: stdcx.
	; CHECK: isel {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}, [[REG]]			; CHECK: isel {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}, [[REG]]
	; CHECK-NO-ISEL: bc 12, 20, [[TRUE:.LBB[0-9]+]]			; CHECK-NO-ISEL: bc 12, 20, [[TRUE:.LBB[0-9]+]]
	; CHECK-NO-ISEL: ori 4, 7, 0			; CHECK-NO-ISEL: ori 7, 8, 0
	; CHECK-NO-ISEL-NEXT: b [[SUCCESSOR:.LBB[0-9]+]]			; CHECK-NO-ISEL-NEXT: b [[SUCCESSOR:.LBB[0-9]+]]
	; CHECK-NO-ISEL: [[TRUE]]			; CHECK-NO-ISEL: [[TRUE]]
	; CHECK-NO-ISEL-NEXT: addi 4, 3, 0			; CHECK-NO-ISEL: addi 7, 3, 0

	if.then420: ; preds = %while.end418			if.then420: ; preds = %while.end418
	unreachable			unreachable

	if.end421: ; preds = %while.end418			if.end421: ; preds = %while.end418
	unreachable			unreachable

	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/trunk/test/Transforms/LICM/sinking.ll

	lab60:			lab60:
	; CHECK: lab60:			; CHECK: lab60:
	; CHECK: store			; CHECK: store
	; CHECK-NEXT: indirectbr			; CHECK-NEXT: indirectbr
	store i32 2145244101, i32* undef, align 4			store i32 2145244101, i32* undef, align 4
	indirectbr i8* undef, [label %lab21, label %lab19]			indirectbr i8* undef, [label %lab21, label %lab19]
	}			}

	declare void @f(i32*)			; Check if LICM can sink a sinkable instruction to the exit blocks through
				; a non-trivially replaceable PHI node.
				;
				; CHECK-LABEL: @test14
				; CHECK-LABEL: Loop:
				; CHECK-NOT: mul
				; CHECK-NOT: sub
				;
				; CHECK-LABEL: Out12.split.loop.exit:
				; CHECK: %[[LCSSAPHI:.*]] = phi i32 [ %N_addr.0.pn, %ContLoop ]
				; CHECK: %[[MUL:.*]] = mul i32 %N, %[[LCSSAPHI]]
				; CHECK: br label %Out12
				;
				; CHECK-LABEL: Out12.split.loop.exit1:
				; CHECK: %[[LCSSAPHI2:.*]] = phi i32 [ %N_addr.0.pn, %Loop ]
				; CHECK: %[[MUL2:.*]] = mul i32 %N, %[[LCSSAPHI2]]
				; CHECK: %[[SUB:.*]] = sub i32 %[[MUL2]], %N
				; CHECK: br label %Out12
				;
				; CHECK-LABEL: Out12:
				; CHECK: phi i32 [ %[[MUL]], %Out12.split.loop.exit ], [ %[[SUB]], %Out12.split.loop.exit1 ]
				define i32 @test14(i32 %N, i32 %N2, i1 %C) {
				Entry:
				br label %Loop
				Loop:
				%N_addr.0.pn = phi i32 [ %dec, %ContLoop ], [ %N, %Entry ]
				%sink.mul = mul i32 %N, %N_addr.0.pn
				%sink.sub = sub i32 %sink.mul, %N
				%dec = add i32 %N_addr.0.pn, -1
				br i1 %C, label %ContLoop, label %Out12
				ContLoop:
				%tmp.1 = icmp ne i32 %N_addr.0.pn, 1
				br i1 %tmp.1, label %Loop, label %Out12
				Out12:
				%tmp = phi i32 [%sink.mul, %ContLoop], [%sink.sub, %Loop]
				ret i32 %tmp
				}

				; In this test, splitting predecessors is not really required because the
				; operations of the sinkable instructions (sub and mul) are the same. In this
				; case, we can sink the same sinkable operations and modify the PHI to pass the
				; operands to the shared operations. As of now, we split predecessors of
				; non-trivially replaceable PHIs by default in LICM because all incoming edges
				; of a non-trivially replaceable PHI in LCSSA are critical.
				;
				; CHECK-LABEL: @test15
				; CHECK-LABEL: Loop:
				; CHECK-NOT: mul
				; CHECK-NOT: sub
				;
				; CHECK-LABEL: Out12.split.loop.exit:
				; CHECK: %[[LCSSAPHI:.*]] = phi i32 [ %N_addr.0.pn, %ContLoop ]
				; CHECK: %[[MUL:.*]] = mul i32 %N, %[[LCSSAPHI]]
				; CHECK: %[[SUB:.*]] = sub i32 %[[MUL]], %N2
				; CHECK: br label %Out12
				;
				; CHECK-LABEL: Out12.split.loop.exit1:
				; CHECK: %[[LCSSAPHI2:.*]] = phi i32 [ %N_addr.0.pn, %Loop ]
				; CHECK: %[[MUL2:.*]] = mul i32 %N, %[[LCSSAPHI2]]
				; CHECK: %[[SUB2:.*]] = sub i32 %[[MUL2]], %N
				; CHECK: br label %Out12
				;
				; CHECK-LABEL: Out12:
				; CHECK: phi i32 [ %[[SUB]], %Out12.split.loop.exit ], [ %[[SUB2]], %Out12.split.loop.exit1 ]
				define i32 @test15(i32 %N, i32 %N2, i1 %C) {
				Entry:
				br label %Loop
				Loop:
				%N_addr.0.pn = phi i32 [ %dec, %ContLoop ], [ %N, %Entry ]
				%sink.mul = mul i32 %N, %N_addr.0.pn
				%sink.sub = sub i32 %sink.mul, %N
				%sink.sub2 = sub i32 %sink.mul, %N2
				%dec = add i32 %N_addr.0.pn, -1
				br i1 %C, label %ContLoop, label %Out12
				ContLoop:
				%tmp.1 = icmp ne i32 %N_addr.0.pn, 1
				br i1 %tmp.1, label %Loop, label %Out12
				Out12:
				%tmp = phi i32 [%sink.sub2, %ContLoop], [%sink.sub, %Loop]
				ret i32 %tmp
				}

				; Sink through a non-trivially replaceable PHI node that uses the same sinkable
				; instruction multiple times.
				;
				; CHECK-LABEL: @test16
				; CHECK-LABEL: Loop:
				; CHECK-NOT: mul
				;
				; CHECK-LABEL: Out.split.loop.exit:
				; CHECK: %[[PHI:.*]] = phi i32 [ %l2, %ContLoop ]
				; CHECK: br label %Out
				;
				; CHECK-LABEL: Out.split.loop.exit1:
				; CHECK: %[[SINKABLE:.*]] = mul i32 %l2.lcssa, %t.le
				; CHECK: br label %Out
				;
				; CHECK-LABEL: Out:
				; CHECK: %idx = phi i32 [ %[[PHI]], %Out.split.loop.exit ], [ %[[SINKABLE]], %Out.split.loop.exit1 ]
				define i32 @test16(i1 %c, i8** %P, i32* %P2, i64 %V) {
				entry:
				br label %loop.ph
				loop.ph:
				br label %Loop
				Loop:
				%iv = phi i64 [ 0, %loop.ph ], [ %next, %ContLoop ]
				%l2 = call i32 @getv()
				%t = trunc i64 %iv to i32
				%sinkable = mul i32 %l2, %t
				switch i32 %l2, label %ContLoop [
				i32 32, label %Out
				i32 46, label %Out
				i32 95, label %Out
				]
				ContLoop:
				%next = add nuw i64 %iv, 1
				%c1 = call i1 @getc()
				br i1 %c1, label %Loop, label %Out
				Out:
				%idx = phi i32 [ %l2, %ContLoop ], [ %sinkable, %Loop ], [ %sinkable, %Loop ], [ %sinkable, %Loop ]
				ret i32 %idx
				}

				; Sink a sinkable instruction through multiple non-trivially replaceable PHIs in
				; different exit blocks.
				;
				; CHECK-LABEL: @test17
				; CHECK-LABEL: Loop:
				; CHECK-NOT: mul
				;
				; CHECK-LABEL:OutA.split.loop.exit{{.*}}:
				; CHECK: %[[OP1:.*]] = phi i32 [ %N_addr.0.pn, %ContLoop1 ]
				; CHECK: %[[SINKABLE:.*]] = mul i32 %N, %[[OP1]]
				; CHECK: br label %OutA
				;
				; CHECK-LABEL:OutA:
				; CHECK: phi i32{{.*}}[ %[[SINKABLE]], %OutA.split.loop.exit{{.*}} ]
				;
				; CHECK-LABEL:OutB.split.loop.exit{{.*}}:
				; CHECK: %[[OP2:.*]] = phi i32 [ %N_addr.0.pn, %ContLoop2 ]
				; CHECK: %[[SINKABLE2:.*]] = mul i32 %N, %[[OP2]]
				; CHECK: br label %OutB
				;
				; CHECK-LABEL:OutB:
				; CHECK: phi i32 {{.*}}[ %[[SINKABLE2]], %OutB.split.loop.exit{{.*}} ]
				define i32 @test17(i32 %N, i32 %N2) {
				Entry:
				br label %Loop
				Loop:
				%N_addr.0.pn = phi i32 [ %dec, %ContLoop3 ], [ %N, %Entry ]
				%sink.mul = mul i32 %N, %N_addr.0.pn
				%c0 = call i1 @getc()
				br i1 %c0 , label %ContLoop1, label %OutA
				ContLoop1:
				%c1 = call i1 @getc()
				br i1 %c1, label %ContLoop2, label %OutA

				ContLoop2:
				%c2 = call i1 @getc()
				br i1 %c2, label %ContLoop3, label %OutB
				ContLoop3:
				%c3 = call i1 @getc()
				%dec = add i32 %N_addr.0.pn, -1
				br i1 %c3, label %Loop, label %OutB
				OutA:
				%tmp1 = phi i32 [%sink.mul, %ContLoop1], [%N2, %Loop]
				br label %Out12
				OutB:
				%tmp2 = phi i32 [%sink.mul, %ContLoop2], [%dec, %ContLoop3]
				br label %Out12
				Out12:
				%tmp = phi i32 [%tmp1, %OutA], [%tmp2, %OutB]
				ret i32 %tmp
				}


				; Sink a sinkable instruction through both trivially and non-trivially replaceable PHIs.
				;
				; CHECK-LABEL: @test18
				; CHECK-LABEL: Loop:
				; CHECK-NOT: mul
				; CHECK-NOT: sub
				;
				; CHECK-LABEL:Out12.split.loop.exit:
				; CHECK: %[[OP:.*]] = phi i32 [ %iv, %ContLoop ]
				; CHECK: %[[DEC:.*]] = phi i32 [ %dec, %ContLoop ]
				; CHECK: %[[SINKMUL:.*]] = mul i32 %N, %[[OP]]
				; CHECK: %[[SINKSUB:.*]] = sub i32 %[[SINKMUL]], %N2
				; CHECK: br label %Out12
				;
				; CHECK-LABEL:Out12.split.loop.exit1:
				; CHECK: %[[OP2:.*]] = phi i32 [ %iv, %Loop ]
				; CHECK: %[[SINKMUL2:.*]] = mul i32 %N, %[[OP2]]
				; CHECK: %[[SINKSUB2:.*]] = sub i32 %[[SINKMUL2]], %N2
				; CHECK: br label %Out12
				;
				; CHECK-LABEL:Out12:
				; CHECK: %tmp1 = phi i32 [ %[[SINKSUB]], %Out12.split.loop.exit ], [ %[[SINKSUB2]], %Out12.split.loop.exit1 ]
				; CHECK: %tmp2 = phi i32 [ %[[DEC]], %Out12.split.loop.exit ], [ %[[SINKSUB2]], %Out12.split.loop.exit1 ]
				; CHECK: %add = add i32 %tmp1, %tmp2
				define i32 @test18(i32 %N, i32 %N2) {
				Entry:
				br label %Loop
				Loop:
				%iv = phi i32 [ %dec, %ContLoop ], [ %N, %Entry ]
				%sink.mul = mul i32 %N, %iv
				%sink.sub = sub i32 %sink.mul, %N2
				%c0 = call i1 @getc()
				br i1 %c0, label %ContLoop, label %Out12
				ContLoop:
				%dec = add i32 %iv, -1
				%c1 = call i1 @getc()
				br i1 %c1, label %Loop, label %Out12
				Out12:
				%tmp1 = phi i32 [%sink.sub, %ContLoop], [%sink.sub, %Loop]
				%tmp2 = phi i32 [%dec, %ContLoop], [%sink.sub, %Loop]
				%add = add i32 %tmp1, %tmp2
				ret i32 %add
				}

				; Do not sink an instruction through a non-trivially replaceable PHI, to avoid
				; assert while splitting predecessors, if the terminator of predecessor is an
				; indirectbr.
				; CHECK-LABEL: @test19
				; CHECK-LABEL: L0:
				; CHECK: %sinkable = mul
				; CHECK: %sinkable2 = add

				define i32 @test19(i1 %cond, i1 %cond2, i8* %address, i32 %v1) nounwind {
				entry:
				br label %L0
				L0:
				%indirect.goto.dest = select i1 %cond, i8* blockaddress(@test19, %exit), i8* %address
				%v2 = call i32 @getv()
				%sinkable = mul i32 %v1, %v2
				%sinkable2 = add i32 %v1, %v2
				indirectbr i8* %indirect.goto.dest, [label %L1, label %exit]

				L1:
				%indirect.goto.dest2 = select i1 %cond2, i8* blockaddress(@test19, %exit), i8* %address
				indirectbr i8* %indirect.goto.dest2, [label %L0, label %exit]

				exit:
				%r = phi i32 [%sinkable, %L0], [%sinkable2, %L1]
				ret i32 %r
				}


				; Do not sink through a non-trivially replaceable PHI if splitting predecessors
				; is not allowed in SplitBlockPredecessors().
				;
				; CHECK-LABEL: @test20
				; CHECK-LABEL: while.cond
				; CHECK: %sinkable = mul
				; CHECK: %sinkable2 = add
				define void @test20(i32* %s, i1 %b, i32 %v1, i32 %v2) personality i32 (...)* @__CxxFrameHandler3 {
				entry:
				br label %while.cond
				while.cond:
				%v = call i32 @getv()
				%sinkable = mul i32 %v, %v2
				%sinkable2 = add i32 %v, %v2
				br i1 %b, label %try.cont, label %while.body
				while.body:
				invoke void @may_throw()
				to label %while.body2 unwind label %catch.dispatch
				while.body2:
				invoke void @may_throw2()
				to label %while.cond unwind label %catch.dispatch
				catch.dispatch:
				%.lcssa1 = phi i32 [ %sinkable, %while.body ], [ %sinkable2, %while.body2 ]
				%cp = cleanuppad within none []
				store i32 %.lcssa1, i32* %s
				cleanupret from %cp unwind to caller
				try.cont:
				ret void
				}

				declare void @may_throw()
				declare void @may_throw2()
				declare i32 @__CxxFrameHandler3(...)
				declare i32 @getv()
				declare i1 @getc()
				declare void @f(i32*)
	declare void @g()			declare void @g()
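
Why splitting predecessors makes a PHI such as `%idx` in test16 sinkable can be sketched with a toy model. This is an illustration under stated assumptions, not the LICM implementation: `Incoming`, `groupByPredecessor`, and `allSame` are hypothetical names, and the model simply assumes one split block per loop-exiting predecessor, which matches what the CHECK lines for test16 expect (the three edges from `%Loop` share one split block).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (value name, predecessor block name) for one incoming edge of a PHI.
using Incoming = std::pair<std::string, std::string>;

// Hypothetical helper: collect the incoming values of a PHI per predecessor,
// modelling one split block for each loop-exiting predecessor.
std::map<std::string, std::vector<std::string>>
groupByPredecessor(const std::vector<Incoming> &phi) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const Incoming &in : phi)
    groups[in.second].push_back(in.first);
  return groups;
}

// True when all values in a group are identical, i.e. the PHI that would sit
// in the corresponding split block has a single distinct incoming value.
bool allSame(const std::vector<std::string> &values) {
  for (const std::string &v : values)
    if (v != values.front())
      return false;
  return true;
}
```

For the `%idx` PHI of test16, the groups are `ContLoop -> {l2}` and `Loop -> {sinkable, sinkable, sinkable}`. Each group is uniform, so in this model the PHI in each split block is trivially replaceable even though the original PHI is not.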