This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Improve profitability check for folding PHI args
Needs Review (Public)

Authored by uweigand on Aug 1 2017, 1:23 PM.

Details

Reviewers

craig.topper
spatel
• dberlin

Summary

FoldPHIArgOpIntoPHI and related routines move operations from PHI
arguments to the result of the PHI node. To avoid creating more
new operations than are removed by the transformation, this is
currently only done if the original operations each have a single
use only (i.e. the PHI node). This is overly pessimistic for
two reasons:

1. A single operation can be used multiple times as input to the same PHI node. This is called out in a FIXME in the code: "FIXME: The hasOneUse check will fail for PHIs that use the value more than themselves more than once."

2. Sometimes there are two (or more) PHI nodes using the same set of arguments (e.g. if a loop variable is also used after the end of the loop). If *all* these PHIs are transformed, the original argument operations will still be dead.

This patch implements an IsFoldPHIArgProfitable routine that
performs a more precise check to determine whether after all
PHI nodes are transformed, there will be no more operations
in total than we had before. This check is then used instead
of the hasOneUse checks throughout FoldPHIArgOpIntoPHI and
related routines.
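
As a rough illustration of the condition described above, here is a toy model in plain C++. It is a sketch, not the patch's code: `ToyFunction` and `isFoldProfitable` are made-up names, values are plain integer ids, and constants (whose uses the real routine does not inspect) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model (hypothetical): a value is an int id, a PHI is the list of its
// incoming value ids.
struct ToyFunction {
  std::vector<std::vector<int>> Phis; // incoming values of each PHI
  std::set<int> HasNonPhiUse;         // values with at least one non-PHI user
};

// True if folding the operations feeding Phis[Idx] leaves no more operations
// than before: every user of every incoming value is a PHI with exactly the
// same set of incoming values, and there are no more such PHIs than there
// are distinct incoming operations.
bool isFoldProfitable(const ToyFunction &F, std::size_t Idx) {
  std::set<int> Values(F.Phis[Idx].begin(), F.Phis[Idx].end());
  for (int V : Values)
    if (F.HasNonPhiUse.count(V))
      return false; // the operation would stay alive after the fold

  std::size_t NumPhis = 0;
  for (const auto &P : F.Phis) {
    std::set<int> Other(P.begin(), P.end());
    bool SharesValue = false;
    for (int V : Other)
      if (Values.count(V))
        SharesValue = true;
    if (!SharesValue)
      continue; // unrelated PHI
    if (Other != Values)
      return false; // a user PHI with a different argument set
    ++NumPhis; // each such PHI gets one new operation
  }
  return NumPhis <= Values.size();
}
```

In this model a PHI with incoming values {1, 1, 2} is accepted (case 1), and so are two PHIs over {1, 2} (case 2), while three PHIs over {1, 2} are rejected because three new operations would replace two.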

The existing Transforms/LoopUnroll/runtime-loop-multiple-exits.ll
test case now shows further improvement (it was using a single
operation multiple times in the same PHI). A new test added to
Transforms/InstCombine/phi.ll demonstrates the multiple-PHI case.

Event Timeline

uweigand created this revision. Aug 1 2017, 1:23 PM

Please please don't :)
Doing it the way this codepath does it can take exponential time/space.
NewGVN will do the same thing and, in fact, catch all possible cases that can exist here in polynomial time.

I think the original check is a good compromise, but you are changing it to spend O(N^2) time per phi node, which, imho, is not good.
I strongly think we should just leave this one alone and be happy for now.
If you have pressing testcases, i'm happy to try to put someone on finishing the small amount of work on NewGVN bugs to get it on by default :)

FWIW, I'm currently working on squashing out the remaining problems in NewGVN (see updates in bugzilla for details).
I expect realistically that the pass will be enabled by default in a couple of months, given the insane amount of fuzzing we did on it.

Well, I'd be fine with NewGVN doing this transform. But at least right now, it doesn't appear to be doing so -- running the new test case (@phi_multi in phi.ll) through "opt -O2 --enable-newgvn" does not move the zext/or out of the loop. Is this supposed to happen?

As to compile time, I didn't notice any increase in real-world tests. While it is true that the new check can be quadratic in the worst case, there are several early exits that are actually taken most of the time, so *usually* the check has the same complexity as the hasOneUse test it removed. Only in the case where a value is used in more than one PHI node do we even start the more expensive check ...
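
To make the early-exit argument concrete, here is a small instrumented sketch that walks use lists the way the new check does and counts how often the expensive argument-set comparison runs. The names (`ToyUses`, `checkWithCost`, `SetComparisons`) are hypothetical, and a user index of -1 stands in for any non-PHI user.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy use lists (hypothetical model, not LLVM's Use/User classes).
struct ToyUses {
  std::vector<std::vector<int>> PhiArgs;  // PhiArgs[p] = incoming values of PHI p
  std::map<int, std::vector<int>> Users;  // Users[v] = PHI indices using v, -1 = non-PHI
};

struct CheckResult {
  bool Profitable;
  unsigned SetComparisons; // how often the expensive path was entered
};

CheckResult checkWithCost(const ToyUses &F, int Phi) {
  std::set<int> Phis{Phi};
  std::set<int> Values(F.PhiArgs[Phi].begin(), F.PhiArgs[Phi].end());
  unsigned Comparisons = 0;
  for (int V : Values) {
    auto It = F.Users.find(V);
    if (It == F.Users.end())
      continue;
    for (int U : It->second) {
      if (U < 0)
        return {false, Comparisons}; // early exit: non-PHI user
      if (!Phis.insert(U).second)
        continue;                    // early exit: PHI already known
      if (Phis.size() > Values.size())
        return {false, Comparisons}; // early exit: too many PHIs
      ++Comparisons;                 // expensive path: compare argument sets
      std::set<int> Other(F.PhiArgs[U].begin(), F.PhiArgs[U].end());
      if (Other != Values)
        return {false, Comparisons};
    }
  }
  return {true, Comparisons};
}
```

With a single PHI user the comparison count stays at zero, which matches the claim that the common case costs about as much as the old hasOneUse test. A second PHI over the same values triggers exactly one comparison.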

NewGVN currently has a limitation that davide is fixing around recursive translation, so it won't get it quite yet.
Note also that the usual transform it performs is to remove the redundancy, not to just sink (which was the original usecase for the transform).
If you mainly care about the sinking, it may require a bit of thought. It's also worth noting our dividing line for instcombine was one expressed to me as "local transforms", and this is most certainly not :)
(IE if this is a local transform, i can express PRE in InstCombine, and have it do global PRE).

Note that the check is not just quadratic in the worst case because instcombine iterates. This mechanism of transforming phi of ops into op of phis can be exponential time worst case even without instcombine's iteration :)
But i'll agree that i would expect the early exits to be hit mostly.

In any case, is this testcase the main testcase you care about? Is it a real performance win for your platform in general (IE how much is it buying us)?

The testcase is extracted from a real-world program. On that program, the transformation (moving some of those operations out of a hot loop) is a significant overall win (about 10% improved performance of the whole program). I agree that this application is a quite special case -- this patch doesn't make much difference to the overall performance of the platform in general.

Agreed that the overall InstCombine algorithm can exhibit performance issues, and should probably be replaced in the long term. But I don't think this particular patch makes those issues significantly worse :-)

In D36172#829179, @uweigand wrote:

The testcase is extracted from a real-world program. On that program, the transformation (moving some of those operations out of a hot loop) is a significant overall win (about 10% improved performance of the whole program). I agree that this application is a quite special case -- this patch doesn't make much difference to the overall performance of the platform in general.

I'd be against doing it for this reason, but i'll leave it to others.

Agreed that the overall InstCombine algorithm can exhibit performance issues, and should probably be replaced in the long term. But I don't think this particular patch makes those issues significantly worse :-)

Let me be very clear: I'm actually talking about *this specific transformation*, not instcombine.

The general transformation between
a = x op y
b = x op y
result = phi(a, b)
and
1 = phi(x, y)
2 = phi(x, y)
result = 1 op 2

is exponential when applied repeatedly.

The original version you replaced limited this so that it would not be exponential, because of the single-use restriction.
I'm pretty positive yours, combined with instcombine's iteration, can be shown to be exponential in the worst case.
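
One way to picture the space side of this claim is a toy model of the general, unrestricted phi-of-ops to op-of-phis rewrite. It is a sketch only: `Expr`, `leaf`, `bin` and `phisAfterFolding` are hypothetical names, and it does not model what InstCombine itself does. It counts how many PHIs remain once the rewrite has been pushed down through two expression trees as far as possible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (hypothetical): a leaf is a named value, an inner node
// is a binary operation whose Name is its opcode.
struct Expr {
  std::string Name;
  std::shared_ptr<Expr> L, R; // both null for a leaf
  bool isLeaf() const { return !L; }
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(std::string N) {
  return std::make_shared<Expr>(Expr{std::move(N), nullptr, nullptr});
}
ExprPtr bin(std::string Op, ExprPtr L, ExprPtr R) {
  return std::make_shared<Expr>(Expr{std::move(Op), std::move(L), std::move(R)});
}

// Number of PHIs left after rewriting phi(A, B) into op-of-phis repeatedly.
unsigned phisAfterFolding(const ExprPtr &A, const ExprPtr &B) {
  if (A->isLeaf() && B->isLeaf())
    return A->Name == B->Name ? 0 : 1; // same value on both paths: no PHI
  if (A->isLeaf() || B->isLeaf() || A->Name != B->Name)
    return 1;                          // shapes differ: keep one PHI here
  // phi(l1 op r1, l2 op r2) -> phi(l1, l2) op phi(r1, r2)
  return phisAfterFolding(A->L, B->L) + phisAfterFolding(A->R, B->R);
}
```

When one operand is the same on both paths, the count stays at one PHI. When every operand differs, each level of the tree doubles the count, so a full tree of depth d ends up with 2^d PHIs.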

In D36172#829195, @dberlin wrote:

In D36172#829179, @uweigand wrote:

The testcase is extracted from a real-world program. On that program, the transformation (moving some of those operations out of a hot loop) is a significant overall win (about 10% improved performance of the whole program). I agree that this application is a quite special case -- this patch doesn't make much difference to the overall performance of the platform in general.

I'd be against doing it for this reason, but i'll leave it to others.

Agreed that the overall InstCombine algorithm can exhibit performance issues, and should probably be replaced in the long term. But I don't think this particular patch makes those issues significantly worse :-)

Let me be very clear: I'm actually talking about *this specific transformation*, not instcombine.

The general transformation between
a = x op y
b = x op y
result = phi(a, b)
and
1 = phi(x, y)
2 = phi(x, y)
result = 1 op 2

is exponential when applied repeatedly.

and NewGVN is only able to avoid such behavior by

1. memoizing the results properly, since it's a value numberer
2. only attempting the transform when it sees something that could possibly be redundant as a result.

(but even then, the transform is still polynomial rather than linear)

Neither of these limitations apply here.
Given the right program, i believe you will happily go off into exponential land :)
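
The memoization point can be illustrated generically. This is not NewGVN's code, and `Dag`, `diamondChain`, `visitsNaive` and `visitsMemoized` are made-up names. The sketch compares how many node visits a recursive walk needs with and without a cache when the same sub-result is reachable along many paths.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dependency graph (hypothetical): node i depends on the nodes in Deps[i].
struct Dag {
  std::vector<std::vector<int>> Deps;
};

// Chain of diamonds: node i uses node i-1 twice, so node 0 is reachable from
// node Depth along 2^Depth distinct paths.
Dag diamondChain(int Depth) {
  Dag G;
  G.Deps.resize(static_cast<std::size_t>(Depth) + 1);
  for (int I = 1; I <= Depth; ++I)
    G.Deps[I] = {I - 1, I - 1};
  return G;
}

// Re-evaluates shared nodes once per path.
std::size_t visitsNaive(const Dag &G, int Node) {
  std::size_t N = 1;
  for (int D : G.Deps[Node])
    N += visitsNaive(G, D);
  return N;
}

// Evaluates each node once; later requests hit the cache and add no visits.
std::size_t visitsMemoized(const Dag &G, int Node, std::vector<bool> &Seen) {
  if (Seen[Node])
    return 0;
  Seen[Node] = true;
  std::size_t N = 1;
  for (int D : G.Deps[Node])
    N += visitsMemoized(G, D, Seen);
  return N;
}
```

For a chain of depth 10 the uncached walk makes 2047 visits and the cached one makes 11. That is the kind of gap a value numberer's cache closes and a per-PHI recomputation does not.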

In D36172#829195, @dberlin wrote:

Let me be very clear: I'm actually talking about *this specific transformation*, not instcombine.

The general transformation between
a = x op y
b = x op y
result = phi(a, b)
and
1 = phi(x, y)
2 = phi(x, y)
result = 1 op 2

is exponential when applied repeatedly.

Well, yes, because that would introduce two phis where there originally was just one. But InstCombine doesn't do that, either with or without my patch. Before the patch, InstCombine used to transform:

a = x op c
b = y op c
result = phi(a, b)
to
tmp = phi(x, y)
result = tmp op c

Now, it will in addition also transform:

a = x op c
b = y op c
result1 = phi(a, b)
result2 = phi(a, b)
to
tmp1 = phi(x, y)
result1 = tmp1 op c
tmp2 = phi(x, y)
result2 = tmp2 op c

So with my patch, it may introduce more copies of the "op" (but still not more than we had before the transformation). But either with or without my patch, there will never be more *phis*. This is not because of the single-use check, but because we only move unary ops, or binary ops where the other operand is invariant, over the phi.
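
The counting argument can be checked on a toy version of the two-PHI example. This is a sketch with hypothetical names (`Op`, `Folded`, `foldAll`), and it assumes every operation has the same opcode and the same invariant second operand.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Models "Operand op Constant", e.g. a = x op c (hypothetical toy type).
struct Op {
  std::string Operand, Constant;
};

struct Folded {
  std::vector<std::vector<std::string>> Phis; // new PHIs over the operands
  std::size_t NumOps;                         // one "tmp op c" per PHI
};

// Rewrites every PHI over operations into a PHI over their variable operands
// followed by a single operation on the PHI result.
Folded foldAll(const std::map<std::string, Op> &Ops,
               const std::vector<std::vector<std::string>> &Phis) {
  Folded Out{{}, 0};
  for (const auto &Phi : Phis) {
    std::vector<std::string> NewArgs;
    for (const auto &Arg : Phi)
      NewArgs.push_back(Ops.at(Arg).Operand); // phi(a, b) -> phi(x, y)
    Out.Phis.push_back(NewArgs);
    ++Out.NumOps;                             // result = tmp op c
  }
  return Out;
}
```

For two operations `a` and `b` feeding two PHIs, the result has two PHIs and two operations. The number of PHIs is unchanged and the number of operations has not grown.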

In D36172#829223, @uweigand wrote:

In D36172#829195, @dberlin wrote:

Let me be very clear: I'm actually talking about *this specific transformation*, not instcombine.

The general transformation between
a = x op y
b = x op y
result = phi(a, b)
and
1 = phi(x, y)
2 = phi(x, y)
result = 1 op 2

is exponential when applied repeatedly.

Well, yes, because that would introduce two phis where there originally was just one. But InstCombine doesn't do that, either with or without my patch. Before the patch, InstCombine used to transform:

That's the space issue, but the time issue is the actual evaluation.
You have definitely solved the space issue, i will not disagree there.

I was also just giving an example of the generalization of this transform.

Anyway, i've said my say, so i'm going to leave this for someone else to review and decide on :)

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombinePHI.cpp	87 lines
test/Transforms/InstCombine/phi.ll	53 lines
test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll	12 lines

Diff 109190

lib/Transforms/InstCombine/InstCombinePHI.cpp

[... 18 lines not shown ...]
#include "llvm/IR/DebugInfo.h"		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

		/// Verify that if we fold the PHI arguments into a single operation with
		/// the PHI node as input, the original operations become unused. This
		/// is the case if the only user(s) of the original operations are either
		/// this PHI node, or another PHI node with the same set of arguments
		/// (because the operation will be performed on all such nodes if it is
		/// performed on any such node). In the case of multiple PHI nodes, we
		/// only want to perform the transformation if the number of PHI nodes
		/// (i.e. the number of new operations that will be introduced) is not
		/// larger than the number of original operations.
		static bool IsFoldPHIArgProfitable(PHINode &PN) {
		  SmallPtrSet<PHINode*, 8> PHIs;
		  SmallPtrSet<Value*, 8> Values;

		  PHIs.insert(&PN);
		  for (Value *IncValue : PN.incoming_values())
		    Values.insert(IncValue);

		  for (Value *IncValue : Values)
		    if (!isa<Constant>(IncValue))
		      for (auto &U : IncValue->uses()) {
		        // All uses must be PHI nodes.
		        auto *P = dyn_cast<PHINode>(U.getUser());
		        if (!P)
		          return false;
		        // If we already know this PHI node, we're good. Otherwise,
		        // add it to our set. If the set is now so large that the
		        // operation would become unprofitable, bail.
		        if (!PHIs.insert(P).second)
		          continue;
		        if (PHIs.size() > Values.size())
		          return false;
		        // Verify that the new PHI node has exactly the same set of
		        // argument values as the original one.
		        SmallPtrSet<Value*, 8> OtherValues;
		        for (Value *OtherValue : P->incoming_values()) {
		          OtherValues.insert(OtherValue);
		          if (!Values.count(OtherValue))
		            return false;
		        }
		        if (OtherValues.size() != Values.size())
		          return false;
		      }

		  return true;
		}

/// The PHI arguments will be folded into a single operation with a PHI node		/// The PHI arguments will be folded into a single operation with a PHI node
/// as input. The debug location of the single operation will be the merged		/// as input. The debug location of the single operation will be the merged
/// locations of the original PHI node arguments.		/// locations of the original PHI node arguments.
DebugLoc InstCombiner::PHIArgMergedDebugLoc(PHINode &PN) {		DebugLoc InstCombiner::PHIArgMergedDebugLoc(PHINode &PN) {
auto *FirstInst = cast<Instruction>(PN.getIncomingValue(0));		auto *FirstInst = cast<Instruction>(PN.getIncomingValue(0));
const DILocation *Loc = FirstInst->getDebugLoc();		const DILocation *Loc = FirstInst->getDebugLoc();

for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {		for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {
[... 11 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgBinOpIntoPHI(PHINode &PN) {
assert(isa<BinaryOperator>(FirstInst) || isa<CmpInst>(FirstInst));		assert(isa<BinaryOperator>(FirstInst) || isa<CmpInst>(FirstInst));
unsigned Opc = FirstInst->getOpcode();		unsigned Opc = FirstInst->getOpcode();
Value *LHSVal = FirstInst->getOperand(0);		Value *LHSVal = FirstInst->getOperand(0);
Value *RHSVal = FirstInst->getOperand(1);		Value *RHSVal = FirstInst->getOperand(1);

Type *LHSType = LHSVal->getType();		Type *LHSType = LHSVal->getType();
Type *RHSType = RHSVal->getType();		Type *RHSType = RHSVal->getType();

// Scan to see if all operands are the same opcode, and all have one use.		// Scan to see if all operands are the same opcode.
for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {		for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {
Instruction *I = dyn_cast<Instruction>(PN.getIncomingValue(i));		Instruction *I = dyn_cast<Instruction>(PN.getIncomingValue(i));
if (!I || I->getOpcode() != Opc || !I->hasOneUse() ||		if (!I || I->getOpcode() != Opc ||
// Verify type of the LHS matches so we don't fold cmp's of different		// Verify type of the LHS matches so we don't fold cmp's of different
// types.		// types.
I->getOperand(0)->getType() != LHSType ||		I->getOperand(0)->getType() != LHSType ||
I->getOperand(1)->getType() != RHSType)		I->getOperand(1)->getType() != RHSType)
return nullptr;		return nullptr;

// If they are CmpInst instructions, check their predicates		// If they are CmpInst instructions, check their predicates
if (CmpInst *CI = dyn_cast<CmpInst>(I))		if (CmpInst *CI = dyn_cast<CmpInst>(I))
if (CI->getPredicate() != cast<CmpInst>(FirstInst)->getPredicate())		if (CI->getPredicate() != cast<CmpInst>(FirstInst)->getPredicate())
return nullptr;		return nullptr;

// Keep track of which operand needs a phi node.		// Keep track of which operand needs a phi node.
if (I->getOperand(0) != LHSVal) LHSVal = nullptr;		if (I->getOperand(0) != LHSVal) LHSVal = nullptr;
if (I->getOperand(1) != RHSVal) RHSVal = nullptr;		if (I->getOperand(1) != RHSVal) RHSVal = nullptr;
}		}

// If both LHS and RHS would need a PHI, don't do this transformation,		// If both LHS and RHS would need a PHI, don't do this transformation,
// because it would increase the number of PHIs entering the block,		// because it would increase the number of PHIs entering the block,
// which leads to higher register pressure. This is especially		// which leads to higher register pressure. This is especially
// bad when the PHIs are in the header of a loop.		// bad when the PHIs are in the header of a loop.
if (!LHSVal && !RHSVal)		if (!LHSVal && !RHSVal)
return nullptr;		return nullptr;

		// Verify that folding is profitable (the original operands are dead).
		if (!IsFoldPHIArgProfitable(PN))
		return nullptr;

// Otherwise, this is safe to transform!		// Otherwise, this is safe to transform!

Value *InLHS = FirstInst->getOperand(0);		Value *InLHS = FirstInst->getOperand(0);
Value *InRHS = FirstInst->getOperand(1);		Value *InRHS = FirstInst->getOperand(1);
PHINode *NewLHS = nullptr, *NewRHS = nullptr;		PHINode *NewLHS = nullptr, *NewRHS = nullptr;
if (!LHSVal) {		if (!LHSVal) {
NewLHS = PHINode::Create(LHSType, PN.getNumIncomingValues(),		NewLHS = PHINode::Create(LHSType, PN.getNumIncomingValues(),
FirstInst->getOperand(0)->getName() + ".pn");		FirstInst->getOperand(0)->getName() + ".pn");
[... 56 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgGEPIntoPHI(PHINode &PN) {

// We don't want to replace this phi if the replacement would require		// We don't want to replace this phi if the replacement would require
// more than one phi, which leads to higher register pressure. This is		// more than one phi, which leads to higher register pressure. This is
// especially bad when the PHIs are in the header of a loop.		// especially bad when the PHIs are in the header of a loop.
bool NeededPhi = false;		bool NeededPhi = false;

bool AllInBounds = true;		bool AllInBounds = true;

// Scan to see if all operands are the same opcode, and all have one use.		// Scan to see if all operands are the same opcode.
for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {		for (unsigned i = 1; i != PN.getNumIncomingValues(); ++i) {
GetElementPtrInst *GEP= dyn_cast<GetElementPtrInst>(PN.getIncomingValue(i));		GetElementPtrInst *GEP= dyn_cast<GetElementPtrInst>(PN.getIncomingValue(i));
if (!GEP || !GEP->hasOneUse() || GEP->getType() != FirstInst->getType() ||		if (!GEP || GEP->getType() != FirstInst->getType() ||
GEP->getNumOperands() != FirstInst->getNumOperands())		GEP->getNumOperands() != FirstInst->getNumOperands())
return nullptr;		return nullptr;

AllInBounds &= GEP->isInBounds();		AllInBounds &= GEP->isInBounds();

// Keep track of whether or not all GEPs are of alloca pointers.		// Keep track of whether or not all GEPs are of alloca pointers.
if (AllBasePointersAreAllocas &&		if (AllBasePointersAreAllocas &&
(!isa<AllocaInst>(GEP->getOperand(0)) ||		(!isa<AllocaInst>(GEP->getOperand(0)) ||
[... 33 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgGEPIntoPHI(PHINode &PN) {
// bother doing this transformation. At best, this will just save a bit of		// bother doing this transformation. At best, this will just save a bit of
// offset calculation, but all the predecessors will have to materialize the		// offset calculation, but all the predecessors will have to materialize the
// stack address into a register anyway. We'd actually rather clone the		// stack address into a register anyway. We'd actually rather clone the
// load up into the predecessors so that we have a load of a gep of an alloca,		// load up into the predecessors so that we have a load of a gep of an alloca,
// which can usually all be folded into the load.		// which can usually all be folded into the load.
if (AllBasePointersAreAllocas)		if (AllBasePointersAreAllocas)
return nullptr;		return nullptr;

		// Verify that folding is profitable (the original operands are dead).
		if (!IsFoldPHIArgProfitable(PN))
		return nullptr;

// Otherwise, this is safe to transform. Insert PHI nodes for each operand		// Otherwise, this is safe to transform. Insert PHI nodes for each operand
// that is variable.		// that is variable.
SmallVector<PHINode*, 16> OperandPhis(FixedOperands.size());		SmallVector<PHINode*, 16> OperandPhis(FixedOperands.size());

bool HasAnyPHIs = false;		bool HasAnyPHIs = false;
for (unsigned i = 0, e = FixedOperands.size(); i != e; ++i) {		for (unsigned i = 0, e = FixedOperands.size(); i != e; ++i) {
if (FixedOperands[i]) continue; // operand doesn't need a phi.		if (FixedOperands[i]) continue; // operand doesn't need a phi.
Value *FirstOp = FirstInst->getOperand(i);		Value *FirstOp = FirstInst->getOperand(i);
[... 103 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgLoadIntoPHI(PHINode &PN) {
// the path through the other successor.		// the path through the other successor.
if (isVolatile &&		if (isVolatile &&
FirstLI->getParent()->getTerminator()->getNumSuccessors() != 1)		FirstLI->getParent()->getTerminator()->getNumSuccessors() != 1)
return nullptr;		return nullptr;

// Check to see if all arguments are the same operation.		// Check to see if all arguments are the same operation.
for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {
LoadInst *LI = dyn_cast<LoadInst>(PN.getIncomingValue(i));		LoadInst *LI = dyn_cast<LoadInst>(PN.getIncomingValue(i));
if (!LI || !LI->hasOneUse())		if (!LI)
return nullptr;		return nullptr;

// We can't sink the load if the loaded value could be modified between		// We can't sink the load if the loaded value could be modified between
// the load and the PHI.		// the load and the PHI.
if (LI->isVolatile() != isVolatile ||		if (LI->isVolatile() != isVolatile ||
LI->getParent() != PN.getIncomingBlock(i) ||		LI->getParent() != PN.getIncomingBlock(i) ||
LI->getPointerAddressSpace() != LoadAddrSpace ||		LI->getPointerAddressSpace() != LoadAddrSpace ||
!isSafeAndProfitableToSinkLoad(LI))		!isSafeAndProfitableToSinkLoad(LI))
[... 9 lines not shown ...]	for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {
// If the PHI is of volatile loads and the load block has multiple		// If the PHI is of volatile loads and the load block has multiple
// successors, sinking it would remove a load of the volatile value from		// successors, sinking it would remove a load of the volatile value from
// the path through the other successor.		// the path through the other successor.
if (isVolatile &&		if (isVolatile &&
LI->getParent()->getTerminator()->getNumSuccessors() != 1)		LI->getParent()->getTerminator()->getNumSuccessors() != 1)
return nullptr;		return nullptr;
}		}

		// Verify that folding is profitable (the original operands are dead).
		if (!IsFoldPHIArgProfitable(PN))
		return nullptr;

// Okay, they are all the same operation. Create a new PHI node of the		// Okay, they are all the same operation. Create a new PHI node of the
// correct type, and PHI together all of the LHS's of the instructions.		// correct type, and PHI together all of the LHS's of the instructions.
PHINode *NewPN = PHINode::Create(FirstLI->getOperand(0)->getType(),		PHINode *NewPN = PHINode::Create(FirstLI->getOperand(0)->getType(),
PN.getNumIncomingValues(),		PN.getNumIncomingValues(),
PN.getName()+".in");		PN.getName()+".in");

Value *InVal = FirstLI->getOperand(0);		Value *InVal = FirstLI->getOperand(0);
NewPN->addIncoming(InVal, PN.getIncomingBlock(0));		NewPN->addIncoming(InVal, PN.getIncomingBlock(0));
[... 74 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgZextsIntoPHI(PHINode &Phi) {

// Walk the phi operands checking that we only have zexts or constants that		// Walk the phi operands checking that we only have zexts or constants that
// we can shrink for free. Store the new operands for the new phi.		// we can shrink for free. Store the new operands for the new phi.
SmallVector<Value *, 4> NewIncoming;		SmallVector<Value *, 4> NewIncoming;
unsigned NumZexts = 0;		unsigned NumZexts = 0;
unsigned NumConsts = 0;		unsigned NumConsts = 0;
for (Value *V : Phi.incoming_values()) {		for (Value *V : Phi.incoming_values()) {
if (auto *Zext = dyn_cast<ZExtInst>(V)) {		if (auto *Zext = dyn_cast<ZExtInst>(V)) {
// All zexts must be identical and have one use.		// All zexts must be identical.
if (Zext->getSrcTy() != NarrowType || !Zext->hasOneUse())		if (Zext->getSrcTy() != NarrowType)
return nullptr;		return nullptr;
NewIncoming.push_back(Zext->getOperand(0));		NewIncoming.push_back(Zext->getOperand(0));
NumZexts++;		NumZexts++;
} else if (auto *C = dyn_cast<Constant>(V)) {		} else if (auto *C = dyn_cast<Constant>(V)) {
// Make sure that constants can fit in the new type.		// Make sure that constants can fit in the new type.
Constant *Trunc = ConstantExpr::getTrunc(C, NarrowType);		Constant *Trunc = ConstantExpr::getTrunc(C, NarrowType);
if (ConstantExpr::getZExt(Trunc, C->getType()) != C)		if (ConstantExpr::getZExt(Trunc, C->getType()) != C)
return nullptr;		return nullptr;
[... 9 lines not shown ...]	Instruction *InstCombiner::FoldPHIArgZextsIntoPHI(PHINode &Phi) {
// variable operand are handled by FoldPHIArgOpIntoPHI() and foldOpIntoPhi()		// variable operand are handled by FoldPHIArgOpIntoPHI() and foldOpIntoPhi()
// respectively. foldOpIntoPhi() wants to do the opposite transform that is		// respectively. foldOpIntoPhi() wants to do the opposite transform that is
// performed here. It tries to replicate a cast in the phi operand's basic		// performed here. It tries to replicate a cast in the phi operand's basic
// block to expose other folding opportunities. Thus, InstCombine will		// block to expose other folding opportunities. Thus, InstCombine will
// infinite loop without this check.		// infinite loop without this check.
if (NumConsts == 0 || NumZexts < 2)		if (NumConsts == 0 || NumZexts < 2)
return nullptr;		return nullptr;

		// Verify that folding is profitable (the original operands are dead).
		if (!IsFoldPHIArgProfitable(Phi))
		return nullptr;

// All incoming values are zexts or constants that are safe to truncate.		// All incoming values are zexts or constants that are safe to truncate.
// Create a new phi node of the narrow type, phi together all of the new		// Create a new phi node of the narrow type, phi together all of the new
// operands, and zext the result back to the original type.		// operands, and zext the result back to the original type.
PHINode *NewPhi = PHINode::Create(NarrowType, NumIncomingValues,		PHINode *NewPhi = PHINode::Create(NarrowType, NumIncomingValues,
Phi.getName() + ".shrunk");		Phi.getName() + ".shrunk");
for (unsigned i = 0; i != NumIncomingValues; ++i)		for (unsigned i = 0; i != NumIncomingValues; ++i)
NewPhi->addIncoming(NewIncoming[i], Phi.getIncomingBlock(i));		NewPhi->addIncoming(NewIncoming[i], Phi.getIncomingBlock(i));

[... 42 lines not shown ...]	if (!ConstantOp)
return FoldPHIArgBinOpIntoPHI(PN);		return FoldPHIArgBinOpIntoPHI(PN);
} else {		} else {
return nullptr; // Cannot fold this operation.		return nullptr; // Cannot fold this operation.
}		}

// Check to see if all arguments are the same operation.		// Check to see if all arguments are the same operation.
for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 1, e = PN.getNumIncomingValues(); i != e; ++i) {
Instruction *I = dyn_cast<Instruction>(PN.getIncomingValue(i));		Instruction *I = dyn_cast<Instruction>(PN.getIncomingValue(i));
if (!I || !I->hasOneUse() || !I->isSameOperationAs(FirstInst))		if (!I || !I->isSameOperationAs(FirstInst))
return nullptr;		return nullptr;
if (CastSrcTy) {		if (CastSrcTy) {
if (I->getOperand(0)->getType() != CastSrcTy)		if (I->getOperand(0)->getType() != CastSrcTy)
return nullptr; // Cast operation must match.		return nullptr; // Cast operation must match.
} else if (I->getOperand(1) != ConstantOp) {		} else if (I->getOperand(1) != ConstantOp) {
return nullptr;		return nullptr;
}		}
}		}

		// Verify that folding is profitable (the original operands are dead).
		if (!IsFoldPHIArgProfitable(PN))
		return nullptr;

// Okay, they are all the same operation. Create a new PHI node of the		// Okay, they are all the same operation. Create a new PHI node of the
// correct type, and PHI together all of the LHS's of the instructions.		// correct type, and PHI together all of the LHS's of the instructions.
PHINode *NewPN = PHINode::Create(FirstInst->getOperand(0)->getType(),		PHINode *NewPN = PHINode::Create(FirstInst->getOperand(0)->getType(),
PN.getNumIncomingValues(),		PN.getNumIncomingValues(),
PN.getName()+".in");		PN.getName()+".in");

Value *InVal = FirstInst->getOperand(0);		Value *InVal = FirstInst->getOperand(0);
NewPN->addIncoming(InVal, PN.getIncomingBlock(0));		NewPN->addIncoming(InVal, PN.getIncomingBlock(0));
[... 342 lines not shown ...]	Instruction *InstCombiner::visitPHINode(PHINode &PN) {
if (Instruction *Result = FoldPHIArgZextsIntoPHI(PN))		if (Instruction *Result = FoldPHIArgZextsIntoPHI(PN))
return Result;		return Result;

// If all PHI operands are the same operation, pull them through the PHI,		// If all PHI operands are the same operation, pull them through the PHI,
// reducing code size.		// reducing code size.
if (isa<Instruction>(PN.getIncomingValue(0)) &&		if (isa<Instruction>(PN.getIncomingValue(0)) &&
isa<Instruction>(PN.getIncomingValue(1)) &&		isa<Instruction>(PN.getIncomingValue(1)) &&
cast<Instruction>(PN.getIncomingValue(0))->getOpcode() ==		cast<Instruction>(PN.getIncomingValue(0))->getOpcode() ==
cast<Instruction>(PN.getIncomingValue(1))->getOpcode() &&		cast<Instruction>(PN.getIncomingValue(1))->getOpcode())
// FIXME: The hasOneUse check will fail for PHIs that use the value more
// than themselves more than once.
PN.getIncomingValue(0)->hasOneUse())
if (Instruction *Result = FoldPHIArgOpIntoPHI(PN))		if (Instruction *Result = FoldPHIArgOpIntoPHI(PN))
return Result;		return Result;

// If this is a trivial cycle in the PHI node graph, remove it. Basically, if		// If this is a trivial cycle in the PHI node graph, remove it. Basically, if
// this PHI only has a single use (a PHI), and if that PHI only has one use (a		// this PHI only has a single use (a PHI), and if that PHI only has one use (a
// PHI)... break the cycle.		// PHI)... break the cycle.
if (PN.hasOneUse()) {		if (PN.hasOneUse()) {
Instruction *PHIUser = cast<Instruction>(PN.user_back());		Instruction *PHIUser = cast<Instruction>(PN.user_back());
[... 113 lines not shown ...]

test/Transforms/InstCombine/phi.ll

[... 873 lines not shown ...]	if.else: ; preds = %entry
%1 = select i1 %cmp, i32 1, i32 2		%1 = select i1 %cmp, i32 1, i32 2
br label %if.end		br label %if.end

if.end: ; preds = %entry, %if.then		if.end: ; preds = %entry, %if.then
%a.0 = phi i32 [ %1, %if.else], [ %n, %entry ], [2, %if.then]		%a.0 = phi i32 [ %1, %if.else], [ %n, %entry ], [2, %if.then]
%cmp1 = icmp ne i32 %a.0, 0		%cmp1 = icmp ne i32 %a.0, 0
ret i1 %cmp1		ret i1 %cmp1
}		}


		; Verify that we fold arguments through multiple identical PHI nodes

		; CHECK-LABEL: @phi_multi
		; CHECK: loop_start:
		; CHECK: %[[RES2:.*]] = phi i8 [ %valB, %entry ], [ %valB.next, %loop_next ]
		; CHECK: exit1:
		; CHECK-NEXT: %[[RES1:.*]] = phi i8 [ %valB, %entry ], [ %valB.next, %loop_next ]
		; CHECK-NEXT: %[[RES1A:.*]] = zext i8 %[[RES1]] to i64
		; CHECK-NEXT: %[[RES1B:.*]] = or i64 %[[RES1A]], %c
		; CHECK-NEXT: ret i64 %[[RES1B]]
		; CHECK: exit2:
		; CHECK-NEXT: %[[RES2A:.*]] = zext i8 %[[RES2]] to i64
		; CHECK-NEXT: %[[RES2B:.*]] = or i64 %[[RES2A]], %c
		; CHECK-NEXT: ret i64 %[[RES2B]]

		define i64 @phi_multi(i8* %ptrA, i8* %ptrB, i64 %c) {
		entry:
		%valA = load i8, i8* %ptrA
		%valB = load i8, i8* %ptrB
		%valB.ext = zext i8 %valB to i64
		%valB.res = or i64 %c, %valB.ext
		%cmp1 = icmp eq i8 %valA, %valB
		br i1 %cmp1, label %exit1, label %loop_start

		loop_start:
		%ptrA.0 = phi i8* [ %ptrA, %entry ], [ %ptrA.next, %loop_next ]
		%ptrB.0 = phi i8* [ %ptrB, %entry ], [ %ptrB.next, %loop_next ]
		%valA.0 = phi i8 [ %valA, %entry ], [ %valA.next, %loop_next ]
		%valB.res.0 = phi i64 [ %valB.res, %entry ], [ %valB.next.res, %loop_next ]
		%cmp2 = icmp eq i8 %valA.0, 0
		br i1 %cmp2, label %exit2, label %loop_next

		loop_next:
		%ptrA.next = getelementptr inbounds i8, i8* %ptrA.0, i64 1
		%ptrB.next = getelementptr inbounds i8, i8* %ptrB.0, i64 1
		%valA.next = load i8, i8* %ptrA.next
		%valB.next = load i8, i8* %ptrB.next
		%valB.next.ext = zext i8 %valB.next to i64
		%valB.next.res = or i64 %c, %valB.next.ext
		%cmp3 = icmp eq i8 %valA.next, %valB.next
		br i1 %cmp3, label %loop_start, label %exit1

		exit1:
		%valB.res.1 = phi i64 [ %valB.res, %entry ], [ %valB.next.res, %loop_next ]
		ret i64 %valB.res.1

		exit2:
		ret i64 %valB.res.0
		}

test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll

[... 274 lines not shown ...]	latch: ; preds = %header
br i1 %cmp, label %header, label %LoopExit		br i1 %cmp, label %header, label %LoopExit
}		}

; two exiting and two exit blocks.		; two exiting and two exit blocks.
; the non-latch exiting block has duplicate edges to the non-latch exit block.		; the non-latch exiting block has duplicate edges to the non-latch exit block.
define i64 @test5(i64 %trip, i64 %add, i1 %cond) {		define i64 @test5(i64 %trip, i64 %add, i1 %cond) {
; EPILOG: test5(		; EPILOG: test5(
; EPILOG: exit1.loopexit:		; EPILOG: exit1.loopexit:
; EPILOG-NEXT: %result.ph = phi i64 [ %ivy, %loop_exiting ], [ %ivy, %loop_exiting ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.2, %loop_exiting.2 ],		; EPILOG-NEXT: %iv.pn = phi i64 [ %iv, %loop_exiting ], [ %iv, %loop_exiting ], [ %iv_next, %loop_exiting.1 ], [ %iv_next, %loop_exiting.1 ], [ %iv_next.1, %loop_exiting.2 ],
; EPILOG-NEXT: br label %exit1		; EPILOG-NEXT: br label %exit1
; EPILOG: exit1.loopexit2:		; EPILOG: exit1.loopexit2:
; EPILOG-NEXT: %ivy.epil = add i64 %iv.epil, %add
; EPILOG-NEXT: br label %exit1		; EPILOG-NEXT: br label %exit1
; EPILOG: exit1:		; EPILOG: exit1:
; EPILOG-NEXT: %result = phi i64 [ %result.ph, %exit1.loopexit ], [ %ivy.epil, %exit1.loopexit2 ]		; EPILOG-NEXT: %iv.pn.pn = phi i64 [ %iv.pn, %exit1.loopexit ], [ %iv.epil, %exit1.loopexit2 ]
		; EPILOG-NEXT: %result = add i64 %iv.pn.pn, %add
; EPILOG-NEXT: ret i64 %result		; EPILOG-NEXT: ret i64 %result
; EPILOG: loop_latch.7:		; EPILOG: loop_latch.7:
; EPILOG: %niter.nsub.7 = add i64 %niter, -8		; EPILOG: %niter.nsub.7 = add i64 %niter, -8

; PROLOG: test5(		; PROLOG: test5(
; PROLOG: exit1.loopexit:		; PROLOG: exit1.loopexit:
; PROLOG-NEXT: %result.ph = phi i64 [ %ivy, %loop_exiting ], [ %ivy, %loop_exiting ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.2, %loop_exiting.2 ],		; PROLOG-NEXT: %iv.pn = phi i64 [ %iv, %loop_exiting ], [ %iv, %loop_exiting ], [ %iv_next, %loop_exiting.1 ], [ %iv_next, %loop_exiting.1 ], [ %iv_next.1, %loop_exiting.2 ],
; PROLOG-NEXT: br label %exit1		; PROLOG-NEXT: br label %exit1
; PROLOG: exit1.loopexit1:		; PROLOG: exit1.loopexit1:
; PROLOG-NEXT: %ivy.prol = add i64 %iv.prol, %add
; PROLOG-NEXT: br label %exit1		; PROLOG-NEXT: br label %exit1
; PROLOG: exit1:		; PROLOG: exit1:
; PROLOG-NEXT: %result = phi i64 [ %result.ph, %exit1.loopexit ], [ %ivy.prol, %exit1.loopexit1 ]		; PROLOG-NEXT: %iv.pn.pn = phi i64 [ %iv.pn, %exit1.loopexit ], [ %iv.prol, %exit1.loopexit1 ]
		; PROLOG-NEXT: %result = add i64 %iv.pn.pn, %add
; PROLOG-NEXT: ret i64 %result		; PROLOG-NEXT: ret i64 %result
; PROLOG: loop_latch.7:		; PROLOG: loop_latch.7:
; PROLOG: %iv_next.7 = add nsw i64 %iv, 8		; PROLOG: %iv_next.7 = add nsw i64 %iv, 8
entry:		entry:
br label %loop_header		br label %loop_header

loop_header:		loop_header:
%iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]		%iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
[... 170 lines not shown ...]