This is an archive of the discontinued LLVM Phabricator instance.

Differential D10750

Make InstCombine aware of TargetTransformInfo when optimize extension
Needs Review · Public

Authored by wengxt on Jun 25 2015, 3:10 PM.

Details

Reviewers

majnemer
• dberlin

Summary

On NVPTX64, a 64-bit register is simulated by two 32-bit registers, which
makes 64-bit operations more expensive.

In this patch we use TargetTransformInfo to compute the extra cost of
extending the expression tree for zext and sext, and make sure the
transform does not add cost.
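
To make the cost rule in the summary concrete, here is a minimal sketch of the same idea outside LLVM. Everything in it is hypothetical: `Node`, `ToyCosts` and `extraWideningCost` are stand-ins invented for illustration (`ToyCosts::cost` plays the role of `TTI.getArithmeticInstrCost`), and the cost numbers are made up. It only mirrors the shape of the patch: each arithmetic node contributes its wide cost minus its narrow cost, `INT_MAX` marks a tree that cannot be widened, and the transform fires only when the total is `<= 0`.

```cpp
#include <climits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression-tree node; not an LLVM type.
struct Node {
  enum Kind { Leaf, Add, Mul, Opaque };
  Kind kind = Leaf;
  std::vector<std::unique_ptr<Node>> operands;
};

std::unique_ptr<Node> makeNode(Node::Kind k) {
  auto n = std::make_unique<Node>();
  n->kind = k;
  return n;
}

std::unique_ptr<Node> binop(Node::Kind k, std::unique_ptr<Node> lhs,
                            std::unique_ptr<Node> rhs) {
  auto n = makeNode(k);
  n->operands.push_back(std::move(lhs));
  n->operands.push_back(std::move(rhs));
  return n;
}

// Hypothetical per-target cost table (made-up numbers, not real TTI data).
struct ToyCosts {
  int add32, add64, mul32, mul64;
  int cost(Node::Kind k, unsigned bits) const {
    if (k == Node::Add) return bits == 64 ? add64 : add32;
    if (k == Node::Mul) return bits == 64 ? mul64 : mul32;
    return 0;
  }
};

// Extra cost of evaluating the tree in 64 bits instead of 32 bits.
// INT_MAX is the "cannot be widened" sentinel, as in the patch.
int extraWideningCost(const Node &n, const ToyCosts &tc) {
  if (n.kind == Node::Opaque) return INT_MAX;
  if (n.kind == Node::Leaf) return 0;
  int total = tc.cost(n.kind, 64) - tc.cost(n.kind, 32);
  for (const auto &op : n.operands) {
    int c = extraWideningCost(*op, tc);
    if (c == INT_MAX) return INT_MAX;
    total += c;
  }
  return total;
}

// Widen only when doing so adds no cost.
bool shouldWiden(const Node &n, const ToyCosts &tc) {
  return extraWideningCost(n, tc) <= 0;
}
```

With a uniform table the tree `(x * y) + z` has extra cost 0 and is widened; with a table where 64-bit operations cost more, the extra cost is positive and the tree is left alone.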

Diff Detail

Event Timeline

wengxt updated this revision to Diff 28507. Jun 25 2015, 3:10 PM

wengxt retitled this revision from to Make InstCombine aware of TargetTransformInfo when optimize extension.

wengxt updated this object.

wengxt edited the test plan for this revision.

wengxt added reviewers: • dberlin, chandlerc.

wengxt added subscribers: • tstellarAMD, jingyue, broune and 3 others.

Herald added a subscriber: jholewinski. Jun 25 2015, 3:10 PM

wengxt added a reviewer: majnemer. Jun 25 2015, 3:13 PM

update reviewer

This doesn't seem like the right approach at all.

I don't think we want to use cost heuristics to dictate the canonical form in the IR. It makes the canonicalization *much* more complex and hard to manage.

I think that NVPTX should either change the datalayout to make i64 illegal (and that seems reasonable for instcombine to respect, but that would for example mean it would be an *absolute* decision, not a cost decision) or you'll need to do a late transform to narrow the operations as much as possible.
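
For readers unfamiliar with the datalayout route: the `n` component of an LLVM datalayout string lists the target's native integer widths, so dropping 64 from that list turns "is i64 acceptable here" into a yes/no question rather than a cost comparison. The sketch below is a toy parser, not LLVM's `DataLayout` class; `nativeIntWidths` and `isLegalInt` are hypothetical names, and the layout strings in the usage note are illustrative NVPTX-style strings rather than quotes from the backend.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Collect the widths listed in the "n<w1>:<w2>:..." component of a
// datalayout string. Toy parser for illustration only.
std::set<unsigned> nativeIntWidths(const std::string &layout) {
  std::set<unsigned> widths;
  std::istringstream comps(layout);
  std::string comp;
  while (std::getline(comps, comp, '-')) {
    // Only accept 'n' followed by a digit, so components such as "ni:1"
    // (non-integral address spaces) are skipped.
    if (comp.size() < 2 || comp[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(comp[1])))
      continue;
    std::istringstream parts(comp.substr(1));
    std::string w;
    while (std::getline(parts, w, ':'))
      widths.insert(static_cast<unsigned>(std::stoul(w)));
  }
  return widths;
}

// Absolute legality: a width is either native or it is not.
bool isLegalInt(const std::set<unsigned> &widths, unsigned bits) {
  return widths.count(bits) != 0;
}
```

With `"e-i64:64-v16:16-v32:32-n16:32:64"`, 64 is reported as native; with the same string ending in `-n16:32`, it is not, and no cost query is involved either way.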

This revision now requires changes to proceed. Jun 26 2015, 1:19 AM

add datalayout test case to make sure it will not pass in old code

What's special about NVPTX is that NVPTX emits PTX, a virtual ISA, instead of real machine code (aka SASS). The CUDA driver JIT compiles PTX to machine code at runtime. Ideally, NVPTX should codegen machine code directly, but that requires SASS ISA to be public.

Therefore, i64 is a legal DL type for PTX, but it is more expensive at runtime because the machine code needs two 32-bit registers to simulate an i64. Does LLVM's target-independent IR optimizer implicitly assume legal integer types are equally cheap? I don't think that's the right assumption. IIRC, AMDGPU also has similar issues where widening integers can hurt performance [https://llvm.org/bugs/show_bug.cgi?id=21148]. We solved that by disabling indvar widening if 64-bit arithmetic is more expensive than 32-bit.

I am not sure how viable the second approach (i.e. narrowing the operations as much as possible) is. While widening is sound, narrowing can be unsound in many cases, not to mention the complexity of bit tracking etc. Anyhow, it doesn't feel right to undo something for certain targets rather than not doing it in the target-independent phase (with TTI checks of course) in the first place.
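
A small numeric illustration of the "widening is sound, narrowing can be unsound" point, using plain C++ integers rather than IR. The function names are made up for this sketch. An add evaluated in 64 bits and truncated always matches the 32-bit add in its low 32 bits, whereas a 64-bit logical shift right cannot simply be redone in 32 bits, because bits from the upper half move into the lower half.

```cpp
#include <cstdint>

// The add as originally written, in 32 bits (wraps modulo 2^32).
uint32_t add32(uint32_t a, uint32_t b) { return a + b; }

// The same add widened to 64 bits, then truncated: the low 32 bits agree
// with add32 for every input.
uint32_t add64ThenTrunc(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(uint64_t{a} + uint64_t{b});
}

// Low 32 bits of a 64-bit logical shift right (shift must be < 32 here).
uint32_t lshr64Low(uint64_t x, unsigned shift) {
  return static_cast<uint32_t>(x >> shift);
}

// Naive narrowing: truncate first, then shift in 32 bits. This drops the
// bits that the 64-bit shift would have moved down from the upper half.
uint32_t lshrNarrowed(uint64_t x, unsigned shift) {
  return static_cast<uint32_t>(x) >> shift;
}
```

For `x = 0x100000000` and a shift of 4, `lshr64Low` gives `0x10000000` while `lshrNarrowed` gives 0, which is why a late narrowing pass would need bit tracking to prove the upper bits do not matter.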

Alternatively, what do you think about modifying ShouldChangeType to disallow such conversion for NVPTX instead of introducing a cost model? ShouldChangeType sounds to me like a place we put legality checks. Although i64 is a DL legal type, such sext/zext can be considered "illegal" due to performance reasons.
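
A rough sketch of what that alternative could look like, written against toy types rather than the real InstCombine code. The first two rules are an assumed, simplified version of the legality logic in ShouldChangeType (legal-to-illegal is rejected, illegal-to-wider-illegal is rejected); the third rule is the hypothetical target veto being proposed. `ToyTarget`, `slowWidths` and `shouldChangeTypeSketch` are invented names.

```cpp
#include <set>

// Hypothetical description of a target's integer widths.
struct ToyTarget {
  std::set<unsigned> legalWidths; // widths the datalayout calls native
  std::set<unsigned> slowWidths;  // legal, but emulated and therefore slow
};

bool shouldChangeTypeSketch(unsigned fromBits, unsigned toBits,
                            const ToyTarget &t) {
  bool fromLegal = t.legalWidths.count(fromBits) != 0;
  bool toLegal = t.legalWidths.count(toBits) != 0;

  // Assumed existing rule: never go from a legal type to an illegal one.
  if (fromLegal && !toLegal)
    return false;
  // Assumed existing rule: never grow one illegal type into a larger one.
  if (!fromLegal && !toLegal && toBits > fromBits)
    return false;
  // Hypothetical extra rule: refuse to widen from a fast width into a
  // width the target marks as slow, even though it is DL-legal.
  if (toBits > fromBits && t.slowWidths.count(toBits) != 0 &&
      t.slowWidths.count(fromBits) == 0)
    return false;
  return true;
}
```

Under this sketch an NVPTX-like target with `slowWidths = {64}` rejects i32 to i64 but still allows i64 to i32, while a target with an empty `slowWidths` behaves as before.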

LMK. Thanks!

chandlerc removed a reviewer: chandlerc. Apr 6 2016, 10:54 PM

Revision Contents

Path

Size

lib/

Transforms/

InstCombine/

InstCombineCasts.cpp

186 lines

InstCombineInternal.h

8 lines

InstructionCombining.cpp

17 lines

test/

Transforms/

InstCombine/

NVPTX/

lit.local.cfg

2 lines

no-widen-expensive.ll

22 lines

Diff 28589

lib/Transforms/InstCombine/InstCombineCasts.cpp

Show All 10 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;		using namespace llvm;
using namespace PatternMatch;		using namespace PatternMatch;

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

/// DecomposeSimpleLinearExpr - Analyze 'Val', seeing if it is a simple linear		/// DecomposeSimpleLinearExpr - Analyze 'Val', seeing if it is a simple linear
/// expression. If so, decompose it, returning some value X, such that Val is		/// expression. If so, decompose it, returning some value X, such that Val is
/// X*Scale+Offset.		/// X*Scale+Offset.
▲ Show 20 Lines • Show All 605 Lines • ▼ Show 20 Lines	if (IntegerType *ITy = dyn_cast<IntegerType>(CI.getType())) {
}		}
}		}
}		}
}		}

return nullptr;		return nullptr;
}		}

/// CanEvaluateZExtd - Determine if the specified value can be computed in the		/// ComputeEvaluateZExtdCost - Compute the extra cost if the specified value is
/// specified wider type and produce the same low bits. If not, return false.		/// computed in the specified wider type and produce the same low bits.
		/// If such evaluation is not possible, return INT_MAX.
///		///
/// If this function returns true, it can also return a non-zero number of bits		/// If this function returns true, it can also return a non-zero number of bits
/// (in BitsToClear) which indicates that the value it computes is correct for		/// (in BitsToClear) which indicates that the value it computes is correct for
/// the zero extend, but that the additional BitsToClear bits need to be zero'd		/// the zero extend, but that the additional BitsToClear bits need to be zero'd
/// out. For example, to promote something like:		/// out. For example, to promote something like:
///		///
/// %B = trunc i64 %A to i32		/// %B = trunc i64 %A to i32
/// %C = lshr i32 %B, 8		/// %C = lshr i32 %B, 8
/// %E = zext i32 %C to i64		/// %E = zext i32 %C to i64
///		///
/// CanEvaluateZExtd for the 'lshr' will return true, and BitsToClear will be		/// ComputeEvaluateZExtdCost for the 'lshr' will set BitsToClear 8 to indicate
/// set to 8 to indicate that the promoted value needs to have bits 24-31		/// that the promoted value needs to have bits 24-31 cleared in addition to bits
/// cleared in addition to bits 32-63. Since an 'and' will be generated to		/// 32-63. Since an 'and' will be generated to clear the top bits anyway, doing
/// clear the top bits anyway, doing this has no extra cost.		/// this has no extra cost.
///		///
/// This function works on both vectors and scalars.		/// This function works on both vectors and scalars.
static bool CanEvaluateZExtd(Value *V, Type *Ty, unsigned &BitsToClear,		static int ComputeEvaluateZExtdCost(Value *V, Type *Ty, unsigned &BitsToClear,
InstCombiner &IC, Instruction *CxtI) {		InstCombiner &IC, Instruction *CxtI,
		const TargetTransformInfo &TTI) {
BitsToClear = 0;		BitsToClear = 0;
if (isa<Constant>(V))		if (isa<Constant>(V))
return true;		return 0;

		Type *OrigTy = V->getType();

Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
if (!I) return false;		if (!I) return INT_MAX;

// If the input is a truncate from the destination type, we can trivially		// If the input is a truncate from the destination type, we can trivially
// eliminate it.		// eliminate it.
if (isa<TruncInst>(I) && I->getOperand(0)->getType() == Ty)		if (isa<TruncInst>(I) && I->getOperand(0)->getType() == Ty)
return true;		return 0;

// We can't extend or shrink something that has multiple uses: doing so would		// We can't extend or shrink something that has multiple uses: doing so would
// require duplicating the instruction in general, which isn't profitable.		// require duplicating the instruction in general, which isn't profitable.
if (!I->hasOneUse()) return false;		if (!I->hasOneUse()) return INT_MAX;

unsigned Opc = I->getOpcode(), Tmp;		unsigned Opc = I->getOpcode(), Tmp;
switch (Opc) {		switch (Opc) {
case Instruction::ZExt: // zext(zext(x)) -> zext(x).		case Instruction::ZExt: // zext(zext(x)) -> zext(x).
case Instruction::SExt: // zext(sext(x)) -> sext(x).		case Instruction::SExt: // zext(sext(x)) -> sext(x).
case Instruction::Trunc: // zext(trunc(x)) -> trunc(x) or zext(x)		case Instruction::Trunc: // zext(trunc(x)) -> trunc(x) or zext(x)
return true;		return 0;
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::Add:		case Instruction::Add:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Mul:		case Instruction::Mul: {
if (!CanEvaluateZExtd(I->getOperand(0), Ty, BitsToClear, IC, CxtI) ||		int LHSCost = ComputeEvaluateZExtdCost(I->getOperand(0), Ty, BitsToClear,
!CanEvaluateZExtd(I->getOperand(1), Ty, Tmp, IC, CxtI))		IC, CxtI, TTI);
return false;		if (LHSCost == INT_MAX) return INT_MAX;
		int RHSCost =
		ComputeEvaluateZExtdCost(I->getOperand(1), Ty, Tmp, IC, CxtI, TTI);
		if (RHSCost == INT_MAX) return INT_MAX;
// These can all be promoted if neither operand has 'bits to clear'.		// These can all be promoted if neither operand has 'bits to clear'.
if (BitsToClear == 0 && Tmp == 0)		if (BitsToClear == 0 && Tmp == 0)
return true;		return LHSCost + RHSCost + TTI.getArithmeticInstrCost(Opc, Ty) -
		TTI.getArithmeticInstrCost(Opc, OrigTy);

// If the operation is an AND/OR/XOR and the bits to clear are zero in the		// If the operation is an AND/OR/XOR and the bits to clear are zero in the
// other side, BitsToClear is ok.		// other side, BitsToClear is ok.
if (Tmp == 0 &&		if (Tmp == 0 &&
(Opc == Instruction::And || Opc == Instruction::Or ||		(Opc == Instruction::And || Opc == Instruction::Or ||
Opc == Instruction::Xor)) {		Opc == Instruction::Xor)) {
// We use MaskedValueIsZero here for generality, but the case we care		// We use MaskedValueIsZero here for generality, but the case we care
// about the most is constant RHS.		// about the most is constant RHS.
unsigned VSize = V->getType()->getScalarSizeInBits();		unsigned VSize = V->getType()->getScalarSizeInBits();
if (IC.MaskedValueIsZero(I->getOperand(1),		if (IC.MaskedValueIsZero(I->getOperand(1),
APInt::getHighBitsSet(VSize, BitsToClear),		APInt::getHighBitsSet(VSize, BitsToClear),
0, CxtI))		0, CxtI))
return true;		return LHSCost + RHSCost + TTI.getArithmeticInstrCost(Opc, Ty) -
		TTI.getArithmeticInstrCost(Opc, OrigTy);
}		}

// Otherwise, we don't know how to analyze this BitsToClear case yet.		// Otherwise, we don't know how to analyze this BitsToClear case yet.
return false;		return INT_MAX;
		}

case Instruction::Shl:		case Instruction::Shl:
// We can promote shl(x, cst) if we can promote x. Since shl overwrites the		// We can promote shl(x, cst) if we can promote x. Since shl overwrites the
// upper bits we can reduce BitsToClear by the shift amount.		// upper bits we can reduce BitsToClear by the shift amount.
if (ConstantInt *Amt = dyn_cast<ConstantInt>(I->getOperand(1))) {		if (ConstantInt *Amt = dyn_cast<ConstantInt>(I->getOperand(1))) {
if (!CanEvaluateZExtd(I->getOperand(0), Ty, BitsToClear, IC, CxtI))		int Cost = ComputeEvaluateZExtdCost(I->getOperand(0), Ty, BitsToClear, IC,
return false;		CxtI, TTI);
		if (Cost == INT_MAX)
		return INT_MAX;
uint64_t ShiftAmt = Amt->getZExtValue();		uint64_t ShiftAmt = Amt->getZExtValue();
BitsToClear = ShiftAmt < BitsToClear ? BitsToClear - ShiftAmt : 0;		BitsToClear = ShiftAmt < BitsToClear ? BitsToClear - ShiftAmt : 0;
return true;		return Cost + TTI.getArithmeticInstrCost(Opc, Ty) -
		TTI.getArithmeticInstrCost(Opc, OrigTy);
}		}
return false;		return INT_MAX;
case Instruction::LShr:		case Instruction::LShr:
// We can promote lshr(x, cst) if we can promote x. This requires the		// We can promote lshr(x, cst) if we can promote x. This requires the
// ultimate 'and' to clear out the high zero bits we're clearing out though.		// ultimate 'and' to clear out the high zero bits we're clearing out though.
if (ConstantInt *Amt = dyn_cast<ConstantInt>(I->getOperand(1))) {		if (ConstantInt *Amt = dyn_cast<ConstantInt>(I->getOperand(1))) {
if (!CanEvaluateZExtd(I->getOperand(0), Ty, BitsToClear, IC, CxtI))		int Cost = ComputeEvaluateZExtdCost(I->getOperand(0), Ty, BitsToClear, IC,
return false;		CxtI, TTI);
		if (Cost == INT_MAX)
		return INT_MAX;
BitsToClear += Amt->getZExtValue();		BitsToClear += Amt->getZExtValue();
if (BitsToClear > V->getType()->getScalarSizeInBits())		if (BitsToClear > V->getType()->getScalarSizeInBits())
BitsToClear = V->getType()->getScalarSizeInBits();		BitsToClear = V->getType()->getScalarSizeInBits();
return true;		return Cost + TTI.getArithmeticInstrCost(Opc, Ty) -
		TTI.getArithmeticInstrCost(Opc, OrigTy);
}		}
// Cannot promote variable LSHR.		// Cannot promote variable LSHR.
return false;		return INT_MAX;
case Instruction::Select:		case Instruction::Select: {
if (!CanEvaluateZExtd(I->getOperand(1), Ty, Tmp, IC, CxtI) ||		int Cost1 =
!CanEvaluateZExtd(I->getOperand(2), Ty, BitsToClear, IC, CxtI) ||		ComputeEvaluateZExtdCost(I->getOperand(1), Ty, Tmp, IC, CxtI, TTI);
		if (Cost1 == INT_MAX)
		return INT_MAX;
		int Cost2 = ComputeEvaluateZExtdCost(I->getOperand(2), Ty, BitsToClear, IC,
		CxtI, TTI);
// TODO: If important, we could handle the case when the BitsToClear are		// TODO: If important, we could handle the case when the BitsToClear are
// known zero in the disagreeing side.		// known zero in the disagreeing side.
Tmp != BitsToClear)		if (Cost2 == INT_MAX || Tmp != BitsToClear) {
return false;		return INT_MAX;
return true;		}
		return Cost1 + Cost2;
		}

case Instruction::PHI: {		case Instruction::PHI: {
// We can change a phi if we can change all operands. Note that we never		// We can change a phi if we can change all operands. Note that we never
// get into trouble with cyclic PHIs here because we only consider		// get into trouble with cyclic PHIs here because we only consider
// instructions with a single use.		// instructions with a single use.
PHINode *PN = cast<PHINode>(I);		PHINode *PN = cast<PHINode>(I);
if (!CanEvaluateZExtd(PN->getIncomingValue(0), Ty, BitsToClear, IC, CxtI))		int Cost = ComputeEvaluateZExtdCost(PN->getIncomingValue(0), Ty,
return false;		BitsToClear, IC, CxtI, TTI);
for (unsigned i = 1, e = PN->getNumIncomingValues(); i != e; ++i)		if (Cost == INT_MAX)
if (!CanEvaluateZExtd(PN->getIncomingValue(i), Ty, Tmp, IC, CxtI) ||		return INT_MAX;
		int AccumCost = Cost;
		for (unsigned i = 1, e = PN->getNumIncomingValues(); i != e; ++i) {
		Cost = ComputeEvaluateZExtdCost(PN->getIncomingValue(i), Ty, Tmp, IC,
		CxtI, TTI);
		if (Cost == INT_MAX ||
// TODO: If important, we could handle the case when the BitsToClear		// TODO: If important, we could handle the case when the BitsToClear
// are known zero in the disagreeing input.		// are known zero in the disagreeing input.
Tmp != BitsToClear)		Tmp != BitsToClear)
return false;		return INT_MAX;
return true;		AccumCost += Cost;
		}
		return AccumCost;
}		}
default:		default:
// TODO: Can handle more cases here.		// TODO: Can handle more cases here.
return false;		return INT_MAX;
}		}
}		}

Instruction *InstCombiner::visitZExt(ZExtInst &CI) {		Instruction *InstCombiner::visitZExt(ZExtInst &CI) {
// If this zero extend is only used by a truncate, let the truncate be		// If this zero extend is only used by a truncate, let the truncate be
// eliminated before we try to optimize this zext.		// eliminated before we try to optimize this zext.
if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))		if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))
return nullptr;		return nullptr;
Show All 11 Lines	Instruction *InstCombiner::visitZExt(ZExtInst &CI) {
Type *SrcTy = Src->getType(), *DestTy = CI.getType();		Type *SrcTy = Src->getType(), *DestTy = CI.getType();

// Attempt to extend the entire input expression tree to the destination		// Attempt to extend the entire input expression tree to the destination
// type. Only do this if the dest type is a simple type, don't convert the		// type. Only do this if the dest type is a simple type, don't convert the
// expression tree to something weird like i93 unless the source is also		// expression tree to something weird like i93 unless the source is also
// strange.		// strange.
unsigned BitsToClear;		unsigned BitsToClear;
if ((DestTy->isVectorTy() || ShouldChangeType(SrcTy, DestTy)) &&		if ((DestTy->isVectorTy() || ShouldChangeType(SrcTy, DestTy)) &&
CanEvaluateZExtd(Src, DestTy, BitsToClear, *this, &CI)) {		ComputeEvaluateZExtdCost(Src, DestTy, BitsToClear, *this, &CI, TTI) <= 0) {
assert(BitsToClear < SrcTy->getScalarSizeInBits() &&		assert(BitsToClear < SrcTy->getScalarSizeInBits() &&
"Unreasonable BitsToClear");		"Unreasonable BitsToClear");

// Okay, we can transform this! Insert the new expression now.		// Okay, we can transform this! Insert the new expression now.
DEBUG(dbgs() << "ICE: EvaluateInDifferentType converting expression type"		DEBUG(dbgs() << "ICE: EvaluateInDifferentType converting expression type"
" to avoid zero extend: " << CI);		" to avoid zero extend: " << CI << '\n');
Value *Res = EvaluateInDifferentType(Src, DestTy, false);		Value *Res = EvaluateInDifferentType(Src, DestTy, false);
assert(Res->getType() == DestTy);		assert(Res->getType() == DestTy);

uint32_t SrcBitsKept = SrcTy->getScalarSizeInBits()-BitsToClear;		uint32_t SrcBitsKept = SrcTy->getScalarSizeInBits()-BitsToClear;
uint32_t DestBitSize = DestTy->getScalarSizeInBits();		uint32_t DestBitSize = DestTy->getScalarSizeInBits();

// If the high bits are already filled with zeros, just replace this		// If the high bits are already filled with zeros, just replace this
// cast with the result.		// cast with the result.
▲ Show 20 Lines • Show All 175 Lines • ▼ Show 20 Lines	if (ICI->hasOneUse() &&
return CastInst::CreateIntegerCast(In, CI.getType(), true/*SExt*/);		return CastInst::CreateIntegerCast(In, CI.getType(), true/*SExt*/);
}		}
}		}
}		}

return nullptr;		return nullptr;
}		}

/// CanEvaluateSExtd - Return true if we can take the specified value		/// ComputeEvaluateSExtdCost - Return the extra cost if we take the specified
/// and return it as type Ty without inserting any new casts and without		/// value and return it as type Ty without inserting any new casts and without
/// changing the value of the common low bits. This is used by code that tries		/// changing the value of the common low bits. If such evaluation is not
/// to promote integer operations to a wider types will allow us to eliminate		/// possible, return INT_MAX.
/// the extension.		///
		/// This is used by code that tries to promote integer operations to a wider
		/// types will allow us to eliminate the extension.
///		///
/// This function works on both vectors and scalars.		/// This function works on both vectors and scalars.
///		///
static bool CanEvaluateSExtd(Value *V, Type *Ty) {		static int ComputeEvaluateSExtdCost(Value *V, Type *Ty,
		const TargetTransformInfo &TTI) {
assert(V->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits() &&		assert(V->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits() &&
"Can't sign extend type to a smaller type");		"Can't sign extend type to a smaller type");
// If this is a constant, it can be trivially promoted.		// If this is a constant, it can be trivially promoted.
if (isa<Constant>(V))		if (isa<Constant>(V))
return true;		return 0;

Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
if (!I) return false;		if (!I) return INT_MAX;

// If this is a truncate from the dest type, we can trivially eliminate it.		// If this is a truncate from the dest type, we can trivially eliminate it.
if (isa<TruncInst>(I) && I->getOperand(0)->getType() == Ty)		if (isa<TruncInst>(I) && I->getOperand(0)->getType() == Ty)
return true;		return 0;

// We can't extend or shrink something that has multiple uses: doing so would		// We can't extend or shrink something that has multiple uses: doing so would
// require duplicating the instruction in general, which isn't profitable.		// require duplicating the instruction in general, which isn't profitable.
if (!I->hasOneUse()) return false;		if (!I->hasOneUse()) return INT_MAX;

switch (I->getOpcode()) {		unsigned Opc = I->getOpcode();
		switch (Opc) {
case Instruction::SExt: // sext(sext(x)) -> sext(x)		case Instruction::SExt: // sext(sext(x)) -> sext(x)
case Instruction::ZExt: // sext(zext(x)) -> zext(x)		case Instruction::ZExt: // sext(zext(x)) -> zext(x)
case Instruction::Trunc: // sext(trunc(x)) -> trunc(x) or sext(x)		case Instruction::Trunc: // sext(trunc(x)) -> trunc(x) or sext(x)
return true;		return 0;
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::Add:		case Instruction::Add:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Mul:		case Instruction::Mul: {
// These operators can all arbitrarily be extended if their inputs can.		// These operators can all arbitrarily be extended if their inputs can.
return CanEvaluateSExtd(I->getOperand(0), Ty) &&		int LHSCost = ComputeEvaluateSExtdCost(I->getOperand(0), Ty, TTI);
CanEvaluateSExtd(I->getOperand(1), Ty);		if (LHSCost == INT_MAX) return INT_MAX;
		int RHSCost = ComputeEvaluateSExtdCost(I->getOperand(1), Ty, TTI);
		if (RHSCost == INT_MAX) return INT_MAX;
		return LHSCost + RHSCost + TTI.getArithmeticInstrCost(Opc, Ty) -
		TTI.getArithmeticInstrCost(Opc, I->getType());
		}

//case Instruction::Shl: TODO		//case Instruction::Shl: TODO
//case Instruction::LShr: TODO		//case Instruction::LShr: TODO

case Instruction::Select:		case Instruction::Select: {
return CanEvaluateSExtd(I->getOperand(1), Ty) &&		auto TrueCost = ComputeEvaluateSExtdCost(I->getOperand(1), Ty, TTI);
CanEvaluateSExtd(I->getOperand(2), Ty);		if (TrueCost == INT_MAX) return INT_MAX;
		auto FalseCost = ComputeEvaluateSExtdCost(I->getOperand(2), Ty, TTI);
		if (FalseCost == INT_MAX) return INT_MAX;
		return TrueCost + FalseCost;
		}

case Instruction::PHI: {		case Instruction::PHI: {
// We can change a phi if we can change all operands. Note that we never		// We can change a phi if we can change all operands. Note that we never
// get into trouble with cyclic PHIs here because we only consider		// get into trouble with cyclic PHIs here because we only consider
// instructions with a single use.		// instructions with a single use.
PHINode *PN = cast<PHINode>(I);		PHINode *PN = cast<PHINode>(I);
for (Value *IncValue : PN->incoming_values())		int AccumCost = 0;
if (!CanEvaluateSExtd(IncValue, Ty)) return false;		for (Value *IncValue : PN->incoming_values()) {
return true;		int Cost = ComputeEvaluateSExtdCost(IncValue, Ty, TTI);
		if (Cost == INT_MAX)
		return INT_MAX;
		AccumCost += Cost;
		}
		return AccumCost;
}		}
default:		default:
// TODO: Can handle more cases here.		// TODO: Can handle more cases here.
break;		break;
}		}

return false;		return INT_MAX;
}		}

Instruction *InstCombiner::visitSExt(SExtInst &CI) {		Instruction *InstCombiner::visitSExt(SExtInst &CI) {
// If this sign extend is only used by a truncate, let the truncate be		// If this sign extend is only used by a truncate, let the truncate be
// eliminated before we try to optimize this sext.		// eliminated before we try to optimize this sext.
if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))		if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))
return nullptr;		return nullptr;

Show All 17 Lines	if (KnownZero) {
return ReplaceInstUsesWith(CI, ZExt);		return ReplaceInstUsesWith(CI, ZExt);
}		}

// Attempt to extend the entire input expression tree to the destination		// Attempt to extend the entire input expression tree to the destination
// type. Only do this if the dest type is a simple type, don't convert the		// type. Only do this if the dest type is a simple type, don't convert the
// expression tree to something weird like i93 unless the source is also		// expression tree to something weird like i93 unless the source is also
// strange.		// strange.
if ((DestTy->isVectorTy() || ShouldChangeType(SrcTy, DestTy)) &&		if ((DestTy->isVectorTy() || ShouldChangeType(SrcTy, DestTy)) &&
CanEvaluateSExtd(Src, DestTy)) {		ComputeEvaluateSExtdCost(Src, DestTy, TTI) <= 0) {
// Okay, we can transform this! Insert the new expression now.		// Okay, we can transform this! Insert the new expression now.
DEBUG(dbgs() << "ICE: EvaluateInDifferentType converting expression type"		DEBUG(dbgs() << "ICE: EvaluateInDifferentType converting expression type"
" to avoid sign extend: " << CI);		" to avoid sign extend: " << CI << '\n');
Value *Res = EvaluateInDifferentType(Src, DestTy, true);		Value *Res = EvaluateInDifferentType(Src, DestTy, true);
assert(Res->getType() == DestTy);		assert(Res->getType() == DestTy);

uint32_t SrcBitSize = SrcTy->getScalarSizeInBits();		uint32_t SrcBitSize = SrcTy->getScalarSizeInBits();
uint32_t DestBitSize = DestTy->getScalarSizeInBits();		uint32_t DestBitSize = DestTy->getScalarSizeInBits();

// If the high bits are already filled with sign bit, just replace this		// If the high bits are already filled with sign bit, just replace this
// cast with the result.		// cast with the result.
▲ Show 20 Lines • Show All 806 Lines • Show Last 20 Lines
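
The BitsToClear example in the zext doc comment above (trunc to i32, lshr by 8, zext back to i64) can be checked numerically. This is a sketch in plain C++ integers, not IR; `original` and `promoted` are names chosen here. It assumes, as the comment and the `SrcBitsKept` computation suggest, that the promoted form is a 64-bit shift followed by an `and` that keeps the low `32 - BitsToClear` bits.

```cpp
#include <cstdint>

// Original sequence: %B = trunc i64 %A to i32; %C = lshr i32 %B, 8;
// %E = zext i32 %C to i64.
uint64_t original(uint64_t a) {
  uint32_t b = static_cast<uint32_t>(a);
  uint32_t c = b >> 8;
  return static_cast<uint64_t>(c);
}

// Promoted sequence: do the shift in 64 bits, then mask. With a 32-bit
// source and BitsToClear = 8, the mask keeps the low 24 bits, which clears
// bits 24-31 (BitsToClear) and bits 32-63 (the zero extension).
uint64_t promoted(uint64_t a) {
  constexpr unsigned srcBits = 32;
  constexpr unsigned bitsToClear = 8;
  constexpr uint64_t mask = (uint64_t{1} << (srcBits - bitsToClear)) - 1;
  return (a >> 8) & mask;
}
```

For `a = 0xDEADBEEFCAFEBABE` both functions return `0xCAFEBA`; without the mask the 64-bit shift alone would leave upper-half bits in the result.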

lib/Transforms/InstCombine/InstCombineInternal.h

Show All 29 Lines

#define DEBUG_TYPE "instcombine"		#define DEBUG_TYPE "instcombine"

namespace llvm {		namespace llvm {
class CallSite;		class CallSite;
class DataLayout;		class DataLayout;
class DominatorTree;		class DominatorTree;
class TargetLibraryInfo;		class TargetLibraryInfo;
		class TargetTransformInfo;
class DbgDeclareInst;		class DbgDeclareInst;
class MemIntrinsic;		class MemIntrinsic;
class MemSetInst;		class MemSetInst;

/// \brief Assign a complexity or rank value to LLVM Values.		/// \brief Assign a complexity or rank value to LLVM Values.
///		///
/// This routine maps IR values to various complexity ranks:		/// This routine maps IR values to various complexity ranks:
/// 0 -> undef		/// 0 -> undef
▲ Show 20 Lines • Show All 132 Lines • ▼ Show 20 Lines	private:
const bool MinimizeSize;		const bool MinimizeSize;

// Required analyses.		// Required analyses.
// FIXME: These can never be null and should be references.		// FIXME: These can never be null and should be references.
AssumptionCache *AC;		AssumptionCache *AC;
TargetLibraryInfo *TLI;		TargetLibraryInfo *TLI;
DominatorTree *DT;		DominatorTree *DT;
const DataLayout &DL;		const DataLayout &DL;
		const TargetTransformInfo &TTI;

// Optional analyses. When non-null, these can both be used to do better		// Optional analyses. When non-null, these can both be used to do better
// combining and will be updated to reflect any changes.		// combining and will be updated to reflect any changes.
LoopInfo *LI;		LoopInfo *LI;

bool MadeIRChange;		bool MadeIRChange;

public:		public:
InstCombiner(InstCombineWorklist &Worklist, BuilderTy *Builder,		InstCombiner(InstCombineWorklist &Worklist, BuilderTy *Builder,
bool MinimizeSize, AssumptionCache *AC, TargetLibraryInfo *TLI,		bool MinimizeSize, AssumptionCache *AC, TargetLibraryInfo *TLI,
DominatorTree *DT, const DataLayout &DL, LoopInfo *LI)		DominatorTree *DT, const DataLayout &DL,
		const TargetTransformInfo &TTI, LoopInfo *LI)
: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),		: Worklist(Worklist), Builder(Builder), MinimizeSize(MinimizeSize),
AC(AC), TLI(TLI), DT(DT), DL(DL), LI(LI), MadeIRChange(false) {}		AC(AC), TLI(TLI), DT(DT), DL(DL), TTI(TTI), LI(LI),
		MadeIRChange(false) {}

/// \brief Run the combiner over the entire worklist until it is empty.		/// \brief Run the combiner over the entire worklist until it is empty.
///		///
/// \returns true if the IR is changed.		/// \returns true if the IR is changed.
bool run();		bool run();

AssumptionCache *getAssumptionCache() const { return AC; }		AssumptionCache *getAssumptionCache() const { return AC; }

▲ Show 20 Lines • Show All 358 Lines • Show Last 20 Lines

lib/Transforms/InstCombine/InstructionCombining.cpp

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LibCallSemantics.h"		#include "llvm/Analysis/LibCallSemantics.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
▲ Show 20 Lines • Show All 2,909 Lines • ▼ Show 20 Lines	static bool prepareICWorklistFromFunction(Function &F, const DataLayout &DL,
}		}

return MadeIRChange;		return MadeIRChange;
}		}

static bool		static bool
combineInstructionsOverFunction(Function &F, InstCombineWorklist &Worklist,		combineInstructionsOverFunction(Function &F, InstCombineWorklist &Worklist,
AssumptionCache &AC, TargetLibraryInfo &TLI,		AssumptionCache &AC, TargetLibraryInfo &TLI,
DominatorTree &DT, LoopInfo *LI = nullptr) {		DominatorTree &DT,
		const TargetTransformInfo &TTI,
		LoopInfo *LI = nullptr) {
// Minimizing size?		// Minimizing size?
bool MinimizeSize = F.hasFnAttribute(Attribute::MinSize);		bool MinimizeSize = F.hasFnAttribute(Attribute::MinSize);
auto &DL = F.getParent()->getDataLayout();		auto &DL = F.getParent()->getDataLayout();

/// Builder - This is an IRBuilder that automatically inserts new		/// Builder - This is an IRBuilder that automatically inserts new
/// instructions into the worklist when they are created.		/// instructions into the worklist when they are created.
IRBuilder<true, TargetFolder, InstCombineIRInserter> Builder(		IRBuilder<true, TargetFolder, InstCombineIRInserter> Builder(
F.getContext(), TargetFolder(DL), InstCombineIRInserter(Worklist, &AC));		F.getContext(), TargetFolder(DL), InstCombineIRInserter(Worklist, &AC));

// Lower dbg.declare intrinsics otherwise their value may be clobbered		// Lower dbg.declare intrinsics otherwise their value may be clobbered
// by instcombiner.		// by instcombiner.
bool DbgDeclaresChanged = LowerDbgDeclare(F);		bool DbgDeclaresChanged = LowerDbgDeclare(F);

// Iterate while there is work to do.		// Iterate while there is work to do.
int Iteration = 0;		int Iteration = 0;
for (;;) {		for (;;) {
++Iteration;		++Iteration;
DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "		DEBUG(dbgs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on "
<< F.getName() << "\n");		<< F.getName() << "\n");

bool Changed = false;		bool Changed = false;
if (prepareICWorklistFromFunction(F, DL, &TLI, Worklist))		if (prepareICWorklistFromFunction(F, DL, &TLI, Worklist))
Changed = true;		Changed = true;

InstCombiner IC(Worklist, &Builder, MinimizeSize, &AC, &TLI, &DT, DL, LI);		InstCombiner IC(Worklist, &Builder, MinimizeSize, &AC, &TLI, &DT, DL, TTI,
		LI);
if (IC.run())		if (IC.run())
Changed = true;		Changed = true;

if (!Changed)		if (!Changed)
break;		break;
}		}

return DbgDeclaresChanged || Iteration > 1;		return DbgDeclaresChanged || Iteration > 1;
}		}

PreservedAnalyses InstCombinePass::run(Function &F,		PreservedAnalyses InstCombinePass::run(Function &F,
AnalysisManager<Function> *AM) {		AnalysisManager<Function> *AM) {
auto &AC = AM->getResult<AssumptionAnalysis>(F);		auto &AC = AM->getResult<AssumptionAnalysis>(F);
auto &DT = AM->getResult<DominatorTreeAnalysis>(F);		auto &DT = AM->getResult<DominatorTreeAnalysis>(F);
auto &TLI = AM->getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM->getResult<TargetLibraryAnalysis>(F);
		const auto &TTI = AM->getResult<TargetIRAnalysis>(F);
auto *LI = AM->getCachedResult<LoopAnalysis>(F);		auto *LI = AM->getCachedResult<LoopAnalysis>(F);

if (!combineInstructionsOverFunction(F, Worklist, AC, TLI, DT, LI))		if (!combineInstructionsOverFunction(F, Worklist, AC, TLI, DT, TTI, LI))
// No changes, all analyses are preserved.		// No changes, all analyses are preserved.
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// Mark all the analyses that instcombine updates as preserved.		// Mark all the analyses that instcombine updates as preserved.
// FIXME: Need a way to preserve CFG analyses here!		// FIXME: Need a way to preserve CFG analyses here!
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserve<DominatorTreeAnalysis>();		PA.preserve<DominatorTreeAnalysis>();
return PA;		return PA;
(18 unchanged lines not shown)
public:		public:
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;
};		};
}		}

void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {		void InstructionCombiningPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
}		}

bool InstructionCombiningPass::runOnFunction(Function &F) {		bool InstructionCombiningPass::runOnFunction(Function &F) {
if (skipOptnoneFunction(F))		if (skipOptnoneFunction(F))
return false;		return false;

// Required analyses.		// Required analyses.
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
		const auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();

// Optional analyses.		// Optional analyses.
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();		auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;		auto *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;

return combineInstructionsOverFunction(F, Worklist, AC, TLI, DT, LI);		return combineInstructionsOverFunction(F, Worklist, AC, TLI, DT, TTI, LI);
}		}

char InstructionCombiningPass::ID = 0;		char InstructionCombiningPass::ID = 0;
INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_BEGIN(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",		INITIALIZE_PASS_END(InstructionCombiningPass, "instcombine",
"Combine redundant instructions", false, false)		"Combine redundant instructions", false, false)

// Initialization Routines		// Initialization Routines
void llvm::initializeInstCombine(PassRegistry &Registry) {		void llvm::initializeInstCombine(PassRegistry &Registry) {
initializeInstructionCombiningPassPass(Registry);		initializeInstructionCombiningPassPass(Registry);
}		}

void LLVMInitializeInstCombine(LLVMPassRegistryRef R) {		void LLVMInitializeInstCombine(LLVMPassRegistryRef R) {
initializeInstructionCombiningPassPass(*unwrap(R));		initializeInstructionCombiningPassPass(*unwrap(R));
}		}

FunctionPass *llvm::createInstructionCombiningPass() {		FunctionPass *llvm::createInstructionCombiningPass() {
return new InstructionCombiningPass();		return new InstructionCombiningPass();
}		}

test/Transforms/InstCombine/NVPTX/lit.local.cfg

This file was added.

				if not 'NVPTX' in config.root.targets:
				    config.unsupported = True

test/Transforms/InstCombine/NVPTX/no-widen-expensive.ll

This file was added.

				; RUN: opt < %s -instcombine -S | FileCheck %s
				target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
				target triple = "nvptx64-unknown-unknown"

				; On the nvptx64 architecture, an arithmetic instruction on a 64-bit integer
				; costs twice as much as the same instruction on a 32-bit integer, because the
				; hardware has to simulate a 64-bit integer with two 32-bit integers.
				; Therefore, on this architecture, we should not widen 32-bit operations to
				; 64-bit integers even though i64 is a legal type in the 64-bit PTX ISA.

				define i64 @test(i64 %A) {
				; CHECK-LABEL: @test
				%trunc = trunc i64 %A to i32
				; CHECK-NOT: xor i64
				%xor = xor i32 %trunc, 23
				; CHECK-NOT: and i64
				%and = and i32 %xor, 112
				; CHECK: zext i32
				%zext = zext i32 %and to i64
				ret i64 %zext
				}
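
The trade-off that the test comment describes can be illustrated with a toy cost comparison. This is only a sketch under stated assumptions: `ToyTargetCosts`, `narrowCost`, `wideCost`, and `shouldWiden` are hypothetical names, the cost numbers are made up, and none of this is the patch's code or the TargetTransformInfo API. It models the `trunc` / `xor` / `and` / `zext` chain from the test as a chain of `NumOps` arithmetic operations.

```cpp
// Toy cost model. Every name and number here is hypothetical and is not
// taken from the patch or from LLVM.
struct ToyTargetCosts {
  unsigned Arith32; // assumed cost of one 32-bit xor/and/add
  unsigned Arith64; // assumed cost of one 64-bit xor/and/add
  unsigned Trunc;   // assumed cost of trunc i64 -> i32
  unsigned ZExt;    // assumed cost of zext i32 -> i64
};

// Cost of keeping the chain narrow: trunc, NumOps 32-bit ops, then zext.
constexpr unsigned narrowCost(const ToyTargetCosts &T, unsigned NumOps) {
  return T.Trunc + NumOps * T.Arith32 + T.ZExt;
}

// Cost after widening: the casts go away, but every op runs at 64 bits.
constexpr unsigned wideCost(const ToyTargetCosts &T, unsigned NumOps) {
  return NumOps * T.Arith64;
}

// Widen only if the widened chain is not more expensive than the narrow one.
constexpr bool shouldWiden(const ToyTargetCosts &T, unsigned NumOps) {
  return wideCost(T, NumOps) <= narrowCost(T, NumOps);
}
```

With assumed costs of `{1, 2, 0, 1}` (64-bit arithmetic twice as expensive, free `trunc`), the two-operation chain from the test costs 3 when kept narrow and 4 when widened, so this model rejects widening. With `{1, 1, 0, 1}` it accepts widening, because dropping the `zext` is then a net saving.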