This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Swap to using TargetTransformInfo for cost analysis.
Closed, Public

Authored by jmolloy on Feb 9 2015, 9:08 AM.


Details

Reviewers

t.p.northover
andreadb
hfinkel

Summary

We're already using TTI in SimplifyCFG, so remove the hard-coded "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now, so there is some test churn.

Test changes:

combine-comparisons-by-cse.ll: Removed unneeded branch check.
2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
coalesce-subregs.ll: Superfluous block check.
2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
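
To make the change in units concrete, here is a minimal, self-contained sketch of how the speculation budget is scaled after this patch. It is not LLVM code: the enum only mirrors the TCC_* constants (values given as I understand them from TargetTransformInfo.h, so treat them as an assumption), and kFoldingThreshold, fitsSpeculationBudget and chargeBudget are hypothetical names standing in for PHINodeFoldingThreshold and the checks in SpeculativelyExecuteBB and DominatesMergePoint.

```cpp
// Assumed to mirror TargetTransformInfo::TargetCostConstants.
enum TargetCostConstants : unsigned {
  TCC_Free = 0,      // expected to fold away entirely
  TCC_Basic = 1,     // roughly one simple instruction, such as an add
  TCC_Expensive = 4  // for example a division or an opaque call
};

// Hypothetical stand-in for the PHINodeFoldingThreshold option.
constexpr unsigned kFoldingThreshold = 1;

// Mirrors the check "cost > PHINodeFoldingThreshold * TCC_Basic => give up".
bool fitsSpeculationBudget(unsigned cost,
                           unsigned threshold = kFoldingThreshold) {
  return cost <= threshold * TCC_Basic;
}

// Mirrors the CostRemaining bookkeeping: refuse if the cost exceeds what is
// left, otherwise subtract it from the remaining budget.
bool chargeBudget(unsigned cost, unsigned &remaining) {
  if (cost > remaining)
    return false;
  remaining -= cost;
  return true;
}
```

Under these assumptions a variable GEP reported as TCC_Basic fits the budget, which is consistent with the select-gep.ll change above, while anything reported as TCC_Expensive does not.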

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 19588. Feb 9 2015, 9:08 AM

jmolloy retitled this revision to [SimplifyCFG] Swap to using TargetTransformInfo for cost analysis.

jmolloy updated this object.

jmolloy edited the test plan for this revision.

jmolloy added reviewers: hfinkel, t.p.northover.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: Unknown Object (MLST).

kristof.beyls added a subscriber: kristof.beyls. Feb 9 2015, 9:14 AM

I'm assuming that this patch should help to make sure vmin/vmax/fmin/fmax instructions get produced for C code similar to:
if (a<b) b=a;

If that is indeed one of the intents, it may be good to write a test to check that. Having quickly looked through the existing regression tests, I found plenty that check correct MC-level handling of vmin/vmax/fmin/fmax instructions, and quite a lot that check these instructions get generated when the operation is encoded as a select in LLVM IR, but I couldn't find a test that checks that a multi-basic-block representation of the operation is turned into a vmin/vmax/fmin/fmax.
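
For illustration only, the two source-level shapes being contrasted might look like the sketch below. The function names are hypothetical, and how a compiler actually lowers either one depends on the target and optimization level; the point is just that both compute the same value, one via a conditional block that merges afterwards and one via a single select-like expression.

```cpp
// Multi-basic-block shape: a compare, a conditional assignment in its own
// block, and a merge point afterwards.
float minBranch(float a, float b) {
  if (a < b)
    b = a;
  return b;
}

// Flattened shape: the same result expressed as a single select-like
// conditional expression.
float minSelect(float a, float b) {
  return (a < b) ? a : b;
}
```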

Hi Kristof,

This revision doesn't have any deliberate heuristic changes, so clamp() will not be optimized in this revision to min/max. I've added such a test to D7507, which is where I expect this to happen.

The AArch64 backend does not, at the moment, produce fmin/fmax for this; it produces fcsel. That is an extra improvement we need to make.

Cheers,

James

Hi James,

With this patch, intrinsic calls with no side effects are now considered to be viable candidates for speculation. Before your patch, SimplifyCFG would have conservatively returned a high cost for intrinsic calls. My question is: was this functional change intended? In general, I like the idea of flattening the CFG as long as CodeGenPrepare knows how to undo a speculation if it is not profitable for the target.

For example, your patch would allow SimplifyCFG to speculate calls to cttz/ctlz without knowing whether they are cheap for the target. This may cause performance degradation on x86 targets with no LZCNT/BMI. My opinion is that, if we want to speculate intrinsic calls, we also have to teach CodeGenPrepare how to revert the change if the speculation turns out not to be profitable (cheap) for the target. At the moment, CodeGenPrepare doesn't know how to undo a wrong speculation on cttz/ctlz calls (I guess we can teach CodeGenPrepare how to do that in a separate patch).
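
As a rough source-level picture of the concern, the sketch below shows a guarded count-trailing-zeros next to its speculated form. It is an illustration, not what SimplifyCFG literally emits: countTrailingZeros is a hypothetical portable stand-in for a cttz that is defined for zero, and guarded/speculated are made-up names. On a target without a cheap count instruction, the speculated form pays for the count on every call, including the calls where the result is thrown away.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for a count-trailing-zeros operation that
// is defined for zero (returns the bit width, 32).
unsigned countTrailingZeros(std::uint32_t x) {
  if (x == 0)
    return 32;
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Before speculation: the count is only computed on the guarded path.
unsigned guarded(bool cond, std::uint32_t x, unsigned fallback) {
  if (cond)
    return countTrailingZeros(x);
  return fallback;
}

// After speculation: the count is computed unconditionally and a select
// picks between it and the fallback.
unsigned speculated(bool cond, std::uint32_t x, unsigned fallback) {
  unsigned tz = countTrailingZeros(x);
  return cond ? tz : fallback;
}
```

Both functions return the same values for all inputs; they differ only in how much work is done when cond is false.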

Have you collected some performance numbers after this change?

Thanks,
Andrea

andreadb added a reviewer: andreadb. Feb 10 2015, 6:48 AM

LGTM. As Andrea points out, this is going to cause some potentially-significant behavioral swings, and we'll need to watch for performance regressions carefully.

This revision is now accepted and ready to land. Feb 10 2015, 9:43 AM

LGTM too.
I plan to send a follow-up patch to improve the cost heuristic of intrinsic cttz/ctlz for targets that don't have "cheap" count leading/trailing zeroes.

Thanks all - landed in r228826. I haven't committed the followup D7507.

andreadb mentioned this in D7554: [TTI] improved cost heuristic for cttz/ctlz calls. Feb 11 2015, 5:50 AM

Diffusion mentioned this in rL228829: [TTI] Improved cost heuristic for cttz/ctlz calls. Feb 11 2015, 6:24 AM

jmolloy closed this revision. Mar 18 2015, 3:27 AM

Revision Contents

Path	Size
lib/Transforms/Utils/SimplifyCFG.cpp	78 lines
test/CodeGen/AArch64/arm64-promote-const.ll	75 lines
test/CodeGen/AArch64/combine-comparisons-by-cse.ll	1 line
test/CodeGen/ARM/2014-08-04-muls-it.ll	4 lines
test/CodeGen/ARM/coalesce-subregs.ll	1 line
test/Transforms/SimplifyCFG/2008-01-02-hoist-fp-add.ll	16 lines
test/Transforms/SimplifyCFG/PhiBlockMerge.ll	4 lines
test/Transforms/SimplifyCFG/select-gep.ll	23 lines

Diff 19588

lib/Transforms/Utils/SimplifyCFG.cpp

[210 lines elided]	static void AddPredecessorToBlock(BasicBlock *Succ, BasicBlock *NewPred,

PHINode *PN;		PHINode *PN;
for (BasicBlock::iterator I = Succ->begin();		for (BasicBlock::iterator I = Succ->begin();
(PN = dyn_cast<PHINode>(I)); ++I)		(PN = dyn_cast<PHINode>(I)); ++I)
PN->addIncoming(PN->getIncomingValueForBlock(ExistPred), NewPred);		PN->addIncoming(PN->getIncomingValueForBlock(ExistPred), NewPred);
}		}

/// ComputeSpeculationCost - Compute an abstract "cost" of speculating the		/// ComputeSpeculationCost - Compute an abstract "cost" of speculating the
/// given instruction, which is assumed to be safe to speculate. 1 means		/// given instruction, which is assumed to be safe to speculate. TCC_Free means
/// cheap, 2 means less cheap, and UINT_MAX means prohibitively expensive.		/// cheap, TCC_Basic means less cheap, and TCC_Expensive means prohibitively
static unsigned ComputeSpeculationCost(const User *I, const DataLayout *DL) {		/// expensive.
		static unsigned ComputeSpeculationCost(const User *I, const DataLayout *DL,
		const TargetTransformInfo &TTI) {
assert(isSafeToSpeculativelyExecute(I, DL) &&		assert(isSafeToSpeculativelyExecute(I, DL) &&
"Instruction is not safe to speculatively execute!");		"Instruction is not safe to speculatively execute!");
switch (Operator::getOpcode(I)) {		return TTI.getUserCost(I);
default:
// In doubt, be conservative.
return UINT_MAX;
case Instruction::GetElementPtr:
// GEPs are cheap if all indices are constant.
if (!cast<GEPOperator>(I)->hasAllConstantIndices())
return UINT_MAX;
return 1;
case Instruction::ExtractValue:
case Instruction::Load:
case Instruction::Add:
case Instruction::Sub:
case Instruction::And:
case Instruction::Or:
case Instruction::Xor:
case Instruction::Shl:
case Instruction::LShr:
case Instruction::AShr:
case Instruction::ICmp:
case Instruction::Trunc:
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::BitCast:
case Instruction::ExtractElement:
case Instruction::InsertElement:
return 1; // These are all cheap.

case Instruction::Call:
case Instruction::Select:
return 2;
}
}		}

/// DominatesMergePoint - If we have a merge point of an "if condition" as		/// DominatesMergePoint - If we have a merge point of an "if condition" as
/// accepted above, return true if the specified value dominates the block. We		/// accepted above, return true if the specified value dominates the block. We
/// don't handle the true generality of domination here, just a special case		/// don't handle the true generality of domination here, just a special case
/// which works well enough for us.		/// which works well enough for us.
///		///
/// If AggressiveInsts is non-null, and if V does not dominate BB, we check to		/// If AggressiveInsts is non-null, and if V does not dominate BB, we check to
/// see if V (which must be an instruction) and its recursive operands		/// see if V (which must be an instruction) and its recursive operands
/// that do not dominate BB have a combined cost lower than CostRemaining and		/// that do not dominate BB have a combined cost lower than CostRemaining and
/// are non-trapping. If both are true, the instruction is inserted into the		/// are non-trapping. If both are true, the instruction is inserted into the
/// set and true is returned.		/// set and true is returned.
///		///
/// The cost for most non-trapping instructions is defined as 1 except for		/// The cost for most non-trapping instructions is defined as 1 except for
/// Select whose cost is 2.		/// Select whose cost is 2.
///		///
/// After this function returns, CostRemaining is decreased by the cost of		/// After this function returns, CostRemaining is decreased by the cost of
/// V plus its non-dominating operands. If that cost is greater than		/// V plus its non-dominating operands. If that cost is greater than
/// CostRemaining, false is returned and CostRemaining is undefined.		/// CostRemaining, false is returned and CostRemaining is undefined.
static bool DominatesMergePoint(Value *V, BasicBlock *BB,		static bool DominatesMergePoint(Value *V, BasicBlock *BB,
SmallPtrSetImpl<Instruction*> *AggressiveInsts,		SmallPtrSetImpl<Instruction*> *AggressiveInsts,
unsigned &CostRemaining,		unsigned &CostRemaining,
const DataLayout *DL) {		const DataLayout *DL,
		const TargetTransformInfo &TTI) {
Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
if (!I) {		if (!I) {
// Non-instructions all dominate instructions, but not all constantexprs		// Non-instructions all dominate instructions, but not all constantexprs
// can be executed unconditionally.		// can be executed unconditionally.
if (ConstantExpr *C = dyn_cast<ConstantExpr>(V))		if (ConstantExpr *C = dyn_cast<ConstantExpr>(V))
if (C->canTrap())		if (C->canTrap())
return false;		return false;
return true;		return true;
[19 lines elided]	static bool DominatesMergePoint(Value *V, BasicBlock *BB,
if (AggressiveInsts->count(I)) return true;		if (AggressiveInsts->count(I)) return true;

// Okay, it looks like the instruction IS in the "condition". Check to		// Okay, it looks like the instruction IS in the "condition". Check to
// see if it's a cheap instruction to unconditionally compute, and if it		// see if it's a cheap instruction to unconditionally compute, and if it
// only uses stuff defined outside of the condition. If so, hoist it out.		// only uses stuff defined outside of the condition. If so, hoist it out.
if (!isSafeToSpeculativelyExecute(I, DL))		if (!isSafeToSpeculativelyExecute(I, DL))
return false;		return false;

unsigned Cost = ComputeSpeculationCost(I, DL);		unsigned Cost = ComputeSpeculationCost(I, DL, TTI);

if (Cost > CostRemaining)		if (Cost > CostRemaining)
return false;		return false;

CostRemaining -= Cost;		CostRemaining -= Cost;

// Okay, we can only really hoist these out if their operands do		// Okay, we can only really hoist these out if their operands do
// not take us over the cost threshold.		// not take us over the cost threshold.
for (User::op_iterator i = I->op_begin(), e = I->op_end(); i != e; ++i)		for (User::op_iterator i = I->op_begin(), e = I->op_end(); i != e; ++i)
if (!DominatesMergePoint(*i, BB, AggressiveInsts, CostRemaining, DL))		if (!DominatesMergePoint(*i, BB, AggressiveInsts, CostRemaining, DL, TTI))
return false;		return false;
// Okay, it's safe to do this! Remember this instruction.		// Okay, it's safe to do this! Remember this instruction.
AggressiveInsts->insert(I);		AggressiveInsts->insert(I);
return true;		return true;
}		}

/// GetConstantInt - Extract ConstantInt from value, looking through IntToPtr		/// GetConstantInt - Extract ConstantInt from value, looking through IntToPtr
/// and PointerNullValue. Return NULL if value is not a constant int.		/// and PointerNullValue. Return NULL if value is not a constant int.
[1,151 lines elided]
/// %cmp = icmp ult %x, %y		/// %cmp = icmp ult %x, %y
/// %sub = sub %x, %y		/// %sub = sub %x, %y
/// %cond = select i1 %cmp, 0, %sub		/// %cond = select i1 %cmp, 0, %sub
/// ...		/// ...
/// \endcode		/// \endcode
///		///
/// \returns true if the conditional block is removed.		/// \returns true if the conditional block is removed.
static bool SpeculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB,		static bool SpeculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB,
const DataLayout *DL) {		const DataLayout *DL,
		const TargetTransformInfo &TTI) {
// Be conservative for now. FP select instruction can often be expensive.		// Be conservative for now. FP select instruction can often be expensive.
Value *BrCond = BI->getCondition();		Value *BrCond = BI->getCondition();
if (isa<FCmpInst>(BrCond))		if (isa<FCmpInst>(BrCond))
return false;		return false;

BasicBlock *BB = BI->getParent();		BasicBlock *BB = BI->getParent();
BasicBlock *EndBB = ThenBB->getTerminator()->getSuccessor(0);		BasicBlock *EndBB = ThenBB->getTerminator()->getSuccessor(0);

[32 lines elided]	for (BasicBlock::iterator BBI = ThenBB->begin(),

// Don't hoist the instruction if it's unsafe or expensive.		// Don't hoist the instruction if it's unsafe or expensive.
if (!isSafeToSpeculativelyExecute(I, DL) &&		if (!isSafeToSpeculativelyExecute(I, DL) &&
!(HoistCondStores &&		!(HoistCondStores &&
(SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,		(SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
EndBB))))		EndBB))))
return false;		return false;
if (!SpeculatedStoreValue &&		if (!SpeculatedStoreValue &&
ComputeSpeculationCost(I, DL) > PHINodeFoldingThreshold)		ComputeSpeculationCost(I, DL, TTI) > PHINodeFoldingThreshold *
		TargetTransformInfo::TCC_Basic)
return false;		return false;

// Store the store speculation candidate.		// Store the store speculation candidate.
if (SpeculatedStoreValue)		if (SpeculatedStoreValue)
SpeculatedStore = cast<StoreInst>(I);		SpeculatedStore = cast<StoreInst>(I);

// Do not hoist the instruction if any of its operands are defined but not		// Do not hoist the instruction if any of its operands are defined but not
// used in BB. The transformation will prevent the operand from		// used in BB. The transformation will prevent the operand from
[42 lines elided]	for (BasicBlock::iterator I = EndBB->begin();
ConstantExpr *OrigCE = dyn_cast<ConstantExpr>(OrigV);		ConstantExpr *OrigCE = dyn_cast<ConstantExpr>(OrigV);
ConstantExpr *ThenCE = dyn_cast<ConstantExpr>(ThenV);		ConstantExpr *ThenCE = dyn_cast<ConstantExpr>(ThenV);
if (!OrigCE && !ThenCE)		if (!OrigCE && !ThenCE)
continue; // Known safe and cheap.		continue; // Known safe and cheap.

if ((ThenCE && !isSafeToSpeculativelyExecute(ThenCE, DL)) ||		if ((ThenCE && !isSafeToSpeculativelyExecute(ThenCE, DL)) ||
(OrigCE && !isSafeToSpeculativelyExecute(OrigCE, DL)))		(OrigCE && !isSafeToSpeculativelyExecute(OrigCE, DL)))
return false;		return false;
unsigned OrigCost = OrigCE ? ComputeSpeculationCost(OrigCE, DL) : 0;		unsigned OrigCost = OrigCE ? ComputeSpeculationCost(OrigCE, DL, TTI) : 0;
unsigned ThenCost = ThenCE ? ComputeSpeculationCost(ThenCE, DL) : 0;		unsigned ThenCost = ThenCE ? ComputeSpeculationCost(ThenCE, DL, TTI) : 0;
if (OrigCost + ThenCost > 2 * PHINodeFoldingThreshold)		unsigned MaxCost = 2 * PHINodeFoldingThreshold *
		TargetTransformInfo::TCC_Basic;
		if (OrigCost + ThenCost > MaxCost)
return false;		return false;

// Account for the cost of an unfolded ConstantExpr which could end up		// Account for the cost of an unfolded ConstantExpr which could end up
// getting expanded into Instructions.		// getting expanded into Instructions.
// FIXME: This doesn't account for how many operations are combined in the		// FIXME: This doesn't account for how many operations are combined in the
// constant expression.		// constant expression.
++SpeculationCost;		++SpeculationCost;
if (SpeculationCost > 1)		if (SpeculationCost > 1)
[188 lines elided]	for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
return FoldCondBranchOnPHI(BI, DL) | true;		return FoldCondBranchOnPHI(BI, DL) | true;
}		}

return false;		return false;
}		}

/// FoldTwoEntryPHINode - Given a BB that starts with the specified two-entry		/// FoldTwoEntryPHINode - Given a BB that starts with the specified two-entry
/// PHI node, see if we can eliminate it.		/// PHI node, see if we can eliminate it.
static bool FoldTwoEntryPHINode(PHINode *PN, const DataLayout *DL) {		static bool FoldTwoEntryPHINode(PHINode *PN, const DataLayout *DL,
		const TargetTransformInfo &TTI) {
// Ok, this is a two entry PHI node. Check to see if this is a simple "if		// Ok, this is a two entry PHI node. Check to see if this is a simple "if
// statement", which has a very simple dominance structure. Basically, we		// statement", which has a very simple dominance structure. Basically, we
// are trying to find the condition that is being branched on, which		// are trying to find the condition that is being branched on, which
// subsequently causes this merge to happen. We really want control		// subsequently causes this merge to happen. We really want control
// dependence information for this check, but simplifycfg can't keep it up		// dependence information for this check, but simplifycfg can't keep it up
// to date, and this catches most of the cases we care about anyway.		// to date, and this catches most of the cases we care about anyway.
BasicBlock *BB = PN->getParent();		BasicBlock *BB = PN->getParent();
BasicBlock *IfTrue, *IfFalse;		BasicBlock *IfTrue, *IfFalse;
[14 lines elided]	if (NumPhis > 2)
return false;		return false;

// Loop over the PHI's seeing if we can promote them all to select		// Loop over the PHI's seeing if we can promote them all to select
// instructions. While we are at it, keep track of the instructions		// instructions. While we are at it, keep track of the instructions
// that need to be moved to the dominating block.		// that need to be moved to the dominating block.
SmallPtrSet<Instruction*, 4> AggressiveInsts;		SmallPtrSet<Instruction*, 4> AggressiveInsts;
unsigned MaxCostVal0 = PHINodeFoldingThreshold,		unsigned MaxCostVal0 = PHINodeFoldingThreshold,
MaxCostVal1 = PHINodeFoldingThreshold;		MaxCostVal1 = PHINodeFoldingThreshold;
		MaxCostVal0 *= TargetTransformInfo::TCC_Basic;
		MaxCostVal1 *= TargetTransformInfo::TCC_Basic;

for (BasicBlock::iterator II = BB->begin(); isa<PHINode>(II);) {		for (BasicBlock::iterator II = BB->begin(); isa<PHINode>(II);) {
PHINode *PN = cast<PHINode>(II++);		PHINode *PN = cast<PHINode>(II++);
if (Value *V = SimplifyInstruction(PN, DL)) {		if (Value *V = SimplifyInstruction(PN, DL)) {
PN->replaceAllUsesWith(V);		PN->replaceAllUsesWith(V);
PN->eraseFromParent();		PN->eraseFromParent();
continue;		continue;
}		}

if (!DominatesMergePoint(PN->getIncomingValue(0), BB, &AggressiveInsts,		if (!DominatesMergePoint(PN->getIncomingValue(0), BB, &AggressiveInsts,
MaxCostVal0, DL) ||		MaxCostVal0, DL, TTI) ||
!DominatesMergePoint(PN->getIncomingValue(1), BB, &AggressiveInsts,		!DominatesMergePoint(PN->getIncomingValue(1), BB, &AggressiveInsts,
MaxCostVal1, DL))		MaxCostVal1, DL, TTI))
return false;		return false;
}		}

// If we folded the first phi, PN dangles at this point. Refresh it. If		// If we folded the first phi, PN dangles at this point. Refresh it. If
// we ran out of PHIs then we simplified them all.		// we ran out of PHIs then we simplified them all.
PN = dyn_cast<PHINode>(BB->begin());		PN = dyn_cast<PHINode>(BB->begin());
if (!PN) return true;		if (!PN) return true;

[2,607 lines elided]	if (BI->getSuccessor(1)->getSinglePredecessor()) {
if (HoistThenElseCodeToIf(BI, DL))		if (HoistThenElseCodeToIf(BI, DL))
return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;		return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;
} else {		} else {
// If Successor #1 has multiple preds, we may be able to conditionally		// If Successor #1 has multiple preds, we may be able to conditionally
// execute Successor #0 if it branches to Successor #1.		// execute Successor #0 if it branches to Successor #1.
TerminatorInst *Succ0TI = BI->getSuccessor(0)->getTerminator();		TerminatorInst *Succ0TI = BI->getSuccessor(0)->getTerminator();
if (Succ0TI->getNumSuccessors() == 1 &&		if (Succ0TI->getNumSuccessors() == 1 &&
Succ0TI->getSuccessor(0) == BI->getSuccessor(1))		Succ0TI->getSuccessor(0) == BI->getSuccessor(1))
if (SpeculativelyExecuteBB(BI, BI->getSuccessor(0), DL))		if (SpeculativelyExecuteBB(BI, BI->getSuccessor(0), DL, TTI))
return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;		return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;
}		}
} else if (BI->getSuccessor(1)->getSinglePredecessor()) {		} else if (BI->getSuccessor(1)->getSinglePredecessor()) {
// If Successor #0 has multiple preds, we may be able to conditionally		// If Successor #0 has multiple preds, we may be able to conditionally
// execute Successor #1 if it branches to Successor #0.		// execute Successor #1 if it branches to Successor #0.
TerminatorInst *Succ1TI = BI->getSuccessor(1)->getTerminator();		TerminatorInst *Succ1TI = BI->getSuccessor(1)->getTerminator();
if (Succ1TI->getNumSuccessors() == 1 &&		if (Succ1TI->getNumSuccessors() == 1 &&
Succ1TI->getSuccessor(0) == BI->getSuccessor(0))		Succ1TI->getSuccessor(0) == BI->getSuccessor(0))
if (SpeculativelyExecuteBB(BI, BI->getSuccessor(1), DL))		if (SpeculativelyExecuteBB(BI, BI->getSuccessor(1), DL, TTI))
return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;		return SimplifyCFG(BB, TTI, BonusInstThreshold, DL, AC) | true;
}		}

// If this is a branch on a phi node in the current block, thread control		// If this is a branch on a phi node in the current block, thread control
// through this block if any PHI node entries are constants.		// through this block if any PHI node entries are constants.
if (PHINode *PN = dyn_cast<PHINode>(BI->getCondition()))		if (PHINode *PN = dyn_cast<PHINode>(BI->getCondition()))
if (PN->getParent() == BI->getParent())		if (PN->getParent() == BI->getParent())
if (FoldCondBranchOnPHI(BI, DL))		if (FoldCondBranchOnPHI(BI, DL))
[111 lines elided]	if (MergeBlockIntoPredecessor(BB))
return true;		return true;

IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);

// If there is a trivial two-entry PHI node in this basic block, and we can		// If there is a trivial two-entry PHI node in this basic block, and we can
// eliminate it, do so now.		// eliminate it, do so now.
if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))		if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
if (PN->getNumIncomingValues() == 2)		if (PN->getNumIncomingValues() == 2)
Changed |= FoldTwoEntryPHINode(PN, DL);		Changed |= FoldTwoEntryPHINode(PN, DL, TTI);

Builder.SetInsertPoint(BB->getTerminator());		Builder.SetInsertPoint(BB->getTerminator());
if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {		if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {
if (BI->isUnconditional()) {		if (BI->isUnconditional()) {
if (SimplifyUncondBranch(BI, Builder)) return true;		if (SimplifyUncondBranch(BI, Builder)) return true;
} else {		} else {
if (SimplifyCondBranch(BI, Builder)) return true;		if (SimplifyCondBranch(BI, Builder)) return true;
}		}
[27 lines elided]

test/CodeGen/AArch64/arm64-promote-const.ll

	[57 lines elided]
	; REGULAR-NEXT: mla.16b v0, v0, v[[REGNUM]]			; REGULAR-NEXT: mla.16b v0, v0, v[[REGNUM]]
	; REGULAR-NEXT: ret			; REGULAR-NEXT: ret
	%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	%mul.i = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%mul.i = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	%add.i9 = add <16 x i8> %add.i, %mul.i			%add.i9 = add <16 x i8> %add.i, %mul.i
	ret <16 x i8> %add.i9			ret <16 x i8> %add.i9
	}			}

	; Two different uses of the sane constant in two different basic blocks,			; Two different uses of the same constant in two different basic blocks,
	; one dominates the other			; one dominates the other
	define <16 x i8> @test3(<16 x i8> %arg, i32 %path) {			define <16 x i8> @test3(<16 x i8> %arg, i32 %path) {
	; PROMOTED-LABEL: test3:			; PROMOTED-LABEL: test3:
	; In stress mode, constant vector are promoted			; In stress mode, constant vector are promoted
	; Since, the constant is the same as the previous function,			; Since, the constant is the same as the previous function,
	; the same address must be used			; the same address must be used
	; PROMOTED: adrp [[PAGEADDR:x[0-9]+]], [[CSTV1]]@PAGE			; PROMOTED: ldr
	; PROMOTED-NEXT: ldr q[[REGNUM:[0-9]+]], {{\[}}[[PAGEADDR]], [[CSTV1]]@PAGEOFF]			; PROMOTED: ldr
	; Destination register is defined by ABI			; PROMOTED-NOT: ldr
	; PROMOTED-NEXT: add.16b v0, v0, v[[REGNUM]]			; PROMOTED: ret
	; PROMOTED-NEXT: cbnz w0, [[LABEL:LBB.*]]
	; Next BB
	; PROMOTED: adrp [[PAGEADDR:x[0-9]+]], [[CSTV2:__PromotedConst[0-9]+]]@PAGE
	; PROMOTED-NEXT: ldr q[[REGNUM]], {{\[}}[[PAGEADDR]], [[CSTV2]]@PAGEOFF]
	; Next BB
	; PROMOTED-NEXT: [[LABEL]]:
	; PROMOTED-NEXT: mul.16b [[DESTV:v[0-9]+]], v0, v[[REGNUM]]
	; PROMOTED-NEXT: add.16b v0, v0, [[DESTV]]
	; PROMOTED-NEXT: ret

	; REGULAR-LABEL: test3:			; REGULAR-LABEL: test3:
	; Regular mode does not elimitate common sub expression by its own.			; REGULAR: ldr
	; In other words, the same loads appears several times.			; REGULAR: ldr
	; REGULAR: adrp [[PAGEADDR:x[0-9]+]], [[CSTLABEL1:lCP.*]]@PAGE			; REGULAR-NOT: ldr
	; REGULAR-NEXT: ldr q[[REGNUM:[0-9]+]], {{\[}}[[PAGEADDR]], [[CSTLABEL1]]@PAGEOFF]			; REGULAR: ret
	; Destination register is defined by ABI
	; REGULAR-NEXT: add.16b v0, v0, v[[REGNUM]]
	; REGULAR-NEXT: cbz w0, [[LABELelse:LBB.*]]
	; Next BB
	; Redundant load
	; REGULAR: adrp [[PAGEADDR:x[0-9]+]], [[CSTLABEL1]]@PAGE
	; REGULAR-NEXT: ldr q[[REGNUM]], {{\[}}[[PAGEADDR]], [[CSTLABEL1]]@PAGEOFF]
	; REGULAR-NEXT: b [[LABELend:LBB.*]]
	; Next BB
	; REGULAR-NEXT: [[LABELelse]]
	; REGULAR-NEXT: adrp [[PAGEADDR:x[0-9]+]], [[CSTLABEL2:lCP.*]]@PAGE
	; REGULAR-NEXT: ldr q[[REGNUM]], {{\[}}[[PAGEADDR]], [[CSTLABEL2]]@PAGEOFF]
	; Next BB
	; REGULAR-NEXT: [[LABELend]]:
	; REGULAR-NEXT: mul.16b [[DESTV:v[0-9]+]], v0, v[[REGNUM]]
	; REGULAR-NEXT: add.16b v0, v0, [[DESTV]]
	; REGULAR-NEXT: ret
	entry:			entry:
	%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	%tobool = icmp eq i32 %path, 0			%tobool = icmp eq i32 %path, 0
	br i1 %tobool, label %if.else, label %if.then			br i1 %tobool, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%mul.i13 = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%mul.i13 = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	br label %if.end			br label %if.end
	[10 lines elided]

	; Two different uses of the sane constant in two different basic blocks,			; Two different uses of the sane constant in two different basic blocks,
	; none dominates the other			; none dominates the other
	define <16 x i8> @test4(<16 x i8> %arg, i32 %path) {			define <16 x i8> @test4(<16 x i8> %arg, i32 %path) {
	; PROMOTED-LABEL: test4:			; PROMOTED-LABEL: test4:
	; In stress mode, constant vector are promoted			; In stress mode, constant vector are promoted
	; Since, the constant is the same as the previous function,			; Since, the constant is the same as the previous function,
	; the same address must be used			; the same address must be used
	; PROMOTED: adrp [[PAGEADDR:x[0-9]+]], [[CSTV1]]@PAGE			; PROMOTED: ldr
	; PROMOTED-NEXT: ldr q[[REGNUM:[0-9]+]], {{\[}}[[PAGEADDR]], [[CSTV1]]@PAGEOFF]			; PROMOTED-NOT: ldr
	; Destination register is defined by ABI			; PROMOTED: ret
	; PROMOTED-NEXT: add.16b v0, v0, v[[REGNUM]]
	; PROMOTED-NEXT: cbz w0, [[LABEL:LBB.*]]
	; Next BB
	; PROMOTED: mul.16b v0, v0, v[[REGNUM]]
	; Next BB
	; PROMOTED-NEXT: [[LABEL]]:
	; PROMOTED-NEXT: ret


	; REGULAR-LABEL: test4:			; REGULAR-LABEL: test4:
	; REGULAR: adrp [[PAGEADDR:x[0-9]+]], [[CSTLABEL3:lCP.*]]@PAGE			; REGULAR: ldr
	; REGULAR-NEXT: ldr q[[REGNUM:[0-9]+]], {{\[}}[[PAGEADDR]], [[CSTLABEL3]]@PAGEOFF]			; REGULAR-NOT: ldr
	; Destination register is defined by ABI			; REGULAR: ret
	; REGULAR-NEXT: add.16b v0, v0, v[[REGNUM]]
	; REGULAR-NEXT: cbz w0, [[LABEL:LBB.*]]
	; Next BB
	; Redundant expression
	; REGULAR: adrp [[PAGEADDR:x[0-9]+]], [[CSTLABEL3]]@PAGE
	; REGULAR-NEXT: ldr q[[REGNUM:[0-9]+]], {{\[}}[[PAGEADDR]], [[CSTLABEL3]]@PAGEOFF]
	; Destination register is defined by ABI
	; REGULAR-NEXT: mul.16b v0, v0, v[[REGNUM]]
	; Next BB
	; REGULAR-NEXT: [[LABEL]]:
	; REGULAR-NEXT: ret
	entry:			entry:
	%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%add.i = add <16 x i8> %arg, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	%tobool = icmp eq i32 %path, 0			%tobool = icmp eq i32 %path, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%mul.i = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>			%mul.i = mul <16 x i8> %add.i, <i8 -40, i8 -93, i8 -118, i8 -99, i8 -75, i8 -105, i8 74, i8 -110, i8 62, i8 -115, i8 -119, i8 -120, i8 34, i8 -124, i8 0, i8 -128>
	br label %if.end			br label %if.end
	[81 lines elided]

test/CodeGen/AArch64/combine-comparisons-by-cse.ll

	[360 lines elided]
	; CHECK: cmp w0, #2			; CHECK: cmp w0, #2
	; CHECK: b.lt .LBB9_3			; CHECK: b.lt .LBB9_3
	; CHECK-NOT: cmp w0, #1			; CHECK-NOT: cmp w0, #1
	; CHECK-NOT: b.le .LBB9_3			; CHECK-NOT: b.le .LBB9_3

	; CHECK-LABEL-DAG: .LBB9_3			; CHECK-LABEL-DAG: .LBB9_3
	; CHECK: cmp w19, #0			; CHECK: cmp w19, #0
	; CHECK: fcmp d8, #0.0			; CHECK: fcmp d8, #0.0
	; CHECK: b.gt .LBB9_5
	; CHECK-NOT: cmp w19, #1			; CHECK-NOT: cmp w19, #1
	; CHECK-NOT: b.ge .LBB9_5			; CHECK-NOT: b.ge .LBB9_5

	entry:			entry:
	%cmp = icmp sgt i32 %argc, 1			%cmp = icmp sgt i32 %argc, 1
	br i1 %cmp, label %land.lhs.true, label %if.end			br i1 %cmp, label %land.lhs.true, label %if.end

	land.lhs.true: ; preds = %entry			land.lhs.true: ; preds = %entry
	[36 lines elided]

test/CodeGen/ARM/2014-08-04-muls-it.ll

	[11 lines elided]

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	%i.addr.0 = phi i32 [ %mul, %if.then ], [ %i, %entry ]			%i.addr.0 = phi i32 [ %mul, %if.then ], [ %i, %entry ]
	ret i32 %i.addr.0			ret i32 %i.addr.0
	}			}

	; CHECK-LABEL: function			; CHECK-LABEL: function
	; CHECK: cmp r0, r1			; CHECK: cmp r0, r1
	; CHECK: bne [[LABEL:[.*]]]
	; CHECK-NOT: mulseq r0, r0, r0			; CHECK-NOT: mulseq r0, r0, r0
	; CHECK: [[LABEL]]			; CHECK: muleq r0, r0, r0
	; CHECK: muls r0, r0, r0
	; CHECK: bx lr			; CHECK: bx lr

test/CodeGen/ARM/coalesce-subregs.ll

[287 lines elided]	bb:
%tmp17 = insertvalue %struct.wombat.5 %tmp16, <4 x float> %tmp15, 2, 0		%tmp17 = insertvalue %struct.wombat.5 %tmp16, <4 x float> %tmp15, 2, 0
%tmp18 = insertvalue %struct.wombat.5 %tmp17, <4 x float> undef, 3, 0		%tmp18 = insertvalue %struct.wombat.5 %tmp17, <4 x float> undef, 3, 0
ret %struct.wombat.5 %tmp18		ret %struct.wombat.5 %tmp18
}		}

; CHECK: adjustCopiesBackFrom		; CHECK: adjustCopiesBackFrom
; The shuffle in if.else3 must be preserved even though adjustCopiesBackFrom		; The shuffle in if.else3 must be preserved even though adjustCopiesBackFrom
; is tempted to remove it.		; is tempted to remove it.
; CHECK: %if.else3
; CHECK: vorr d		; CHECK: vorr d
define internal void @adjustCopiesBackFrom(<2 x i64>* noalias nocapture sret %agg.result, <2 x i64> %in) {		define internal void @adjustCopiesBackFrom(<2 x i64>* noalias nocapture sret %agg.result, <2 x i64> %in) {
entry:		entry:
%0 = extractelement <2 x i64> %in, i32 0		%0 = extractelement <2 x i64> %in, i32 0
%cmp = icmp slt i64 %0, 1		%cmp = icmp slt i64 %0, 1
%.in = select i1 %cmp, <2 x i64> <i64 0, i64 undef>, <2 x i64> %in		%.in = select i1 %cmp, <2 x i64> <i64 0, i64 undef>, <2 x i64> %in
%1 = extractelement <2 x i64> %in, i32 1		%1 = extractelement <2 x i64> %in, i32 1
%cmp1 = icmp slt i64 %1, 1		%cmp1 = icmp slt i64 %1, 1

test/Transforms/SimplifyCFG/2008-01-02-hoist-fp-add.ll

	; The phi should not be eliminated in this case, because the fp op could trap.			; The phi should not be eliminated in this case, because the divide op could trap.
	; RUN: opt < %s -simplifycfg -S | FileCheck %s			; RUN: opt < %s -simplifycfg -S | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
	target triple = "i686-apple-darwin8"			target triple = "i686-apple-darwin8"
	@G = weak global double 0.000000e+00, align 8 ; <double*> [#uses=2]			@G = weak global i32 0, align 8 ; <i32*> [#uses=2]

	define void @test(i32 %X, i32 %Y, double %Z) {			define void @test(i32 %X, i32 %Y, i32 %Z) {
	entry:			entry:
	%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]			%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
	%tmp = load double* @G, align 8 ; <double> [#uses=2]			%tmp = load i32* @G, align 8 ; <i32> [#uses=2]
	%tmp3 = icmp eq i32 %X, %Y ; <i1> [#uses=1]			%tmp3 = icmp eq i32 %X, %Y ; <i1> [#uses=1]
	%tmp34 = zext i1 %tmp3 to i8 ; <i8> [#uses=1]			%tmp34 = zext i1 %tmp3 to i8 ; <i8> [#uses=1]
	%toBool = icmp ne i8 %tmp34, 0 ; <i1> [#uses=1]			%toBool = icmp ne i8 %tmp34, 0 ; <i1> [#uses=1]
	br i1 %toBool, label %cond_true, label %cond_next			br i1 %toBool, label %cond_true, label %cond_next

	cond_true: ; preds = %entry			cond_true: ; preds = %entry
	%tmp7 = fadd double %tmp, %Z ; <double> [#uses=1]			%tmp7 = udiv i32 %tmp, %Z ; <i32> [#uses=1]
	br label %cond_next			br label %cond_next

	cond_next: ; preds = %cond_true, %entry			cond_next: ; preds = %cond_true, %entry
	; CHECK: = phi double			; CHECK: = phi i32
	%F.0 = phi double [ %tmp, %entry ], [ %tmp7, %cond_true ] ; <double> [#uses=1]			%F.0 = phi i32 [ %tmp, %entry ], [ %tmp7, %cond_true ] ; <i32> [#uses=1]
	store double %F.0, double* @G, align 8			store i32 %F.0, i32* @G, align 8
	ret void			ret void
	}			}

test/Transforms/SimplifyCFG/PhiBlockMerge.ll

	; Test merging of blocks that only have PHI nodes in them			; Test merging of blocks that only have PHI nodes in them
	;			;
	; RUN: opt < %s -simplifycfg -S | FileCheck %s			; RUN: opt < %s -simplifycfg -S | FileCheck %s
	;			;

	define i32 @test(i1 %a, i1 %b) {			define i32 @test(i1 %a, i1 %b) {
	; CHECK: br i1 %a
	br i1 %a, label %M, label %O			br i1 %a, label %M, label %O
	; CHECK: O:
	O: ; preds = %0			O: ; preds = %0
	; CHECK: select i1 %b, i32 0, i32 1			; CHECK: select i1 %b, i32 0, i32 1
	; CHECK-NOT: phi			; CHECK-NOT: phi
	br i1 %b, label %N, label %Q			br i1 %b, label %N, label %Q
	Q: ; preds = %O			Q: ; preds = %O
	br label %N			br label %N
	N: ; preds = %Q, %O			N: ; preds = %Q, %O
	; This block should be foldable into M			; This block should be foldable into M
	%Wp = phi i32 [ 0, %O ], [ 1, %Q ] ; <i32> [#uses=1]			%Wp = phi i32 [ 0, %O ], [ 1, %Q ] ; <i32> [#uses=1]
	br label %M			br label %M
	M: ; preds = %N, %0			M: ; preds = %N, %0
	; CHECK: %W = phi i32
	%W = phi i32 [ %Wp, %N ], [ 2, %0 ] ; <i32> [#uses=1]			%W = phi i32 [ %Wp, %N ], [ 2, %0 ] ; <i32> [#uses=1]
	%R = add i32 %W, 1 ; <i32> [#uses=1]			%R = add i32 %W, 1 ; <i32> [#uses=1]
	ret i32 %R			ret i32 %R
				; CHECK: ret
	}			}

test/Transforms/SimplifyCFG/select-gep.ll

	; RUN: opt -S -simplifycfg < %s | FileCheck %s			; RUN: opt -S -simplifycfg < %s | FileCheck %s

	define i8* @test1(i8* %x, i64 %y) nounwind {
	entry:
	%tmp1 = load i8* %x, align 1
	%cmp = icmp eq i8 %tmp1, 47
	br i1 %cmp, label %if.then, label %if.end

	if.then:
	%incdec.ptr = getelementptr inbounds i8* %x, i64 %y
	br label %if.end

	if.end:
	%x.addr = phi i8* [ %incdec.ptr, %if.then ], [ %x, %entry ]
	ret i8* %x.addr

	; CHECK-LABEL: @test1(
	; CHECK-NOT: select
	; CHECK: ret i8* %x.addr
	}

	%ST = type { i8, i8 }			%ST = type { i8, i8 }

	define i8* @test2(%ST* %x, i8* %y) nounwind {			define i8* @test1(%ST* %x, i8* %y) nounwind {
	entry:			entry:
	%cmp = icmp eq %ST* %x, null			%cmp = icmp eq %ST* %x, null
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	if.then:			if.then:
	%incdec.ptr = getelementptr %ST* %x, i32 0, i32 1			%incdec.ptr = getelementptr %ST* %x, i32 0, i32 1
	br label %if.end			br label %if.end

	if.end:			if.end:
	%x.addr = phi i8* [ %incdec.ptr, %if.then ], [ %y, %entry ]			%x.addr = phi i8* [ %incdec.ptr, %if.then ], [ %y, %entry ]
	ret i8* %x.addr			ret i8* %x.addr

	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test1(
	; CHECK: %incdec.ptr.y = select i1 %cmp, i8* %incdec.ptr, i8* %y			; CHECK: %incdec.ptr.y = select i1 %cmp, i8* %incdec.ptr, i8* %y
	; CHECK: ret i8* %incdec.ptr.y			; CHECK: ret i8* %incdec.ptr.y
	}			}