This is an archive of the discontinued LLVM Phabricator instance.


Differential D49225

[SLPVectorizer] Move scalar/vector costs to helper functions (NFCI).
Abandoned, Public

Authored by RKSimon on Jul 12 2018, 2:40 AM.

Details

Reviewers

ABataev
dtemirbulatov
spatel

Summary

As detailed on D49135, this patch moves most of the opcode scalar/vector cost calculations into helper functions. This is primarily to avoid repetition in the shufflevector case for alternate opcodes, but it also demonstrates how many of the opcode groups share the same cost calculation code.

I haven't touched the ExtractValue/ExtractElement costs - these might require more work before we can use them correctly for PR30787/D28907 'copyable' cases.
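
To illustrate the refactoring described in the summary, here is a minimal, self-contained sketch of the helper split. It does not use the LLVM API: the `Opcode` enum, the unit costs, and `getAltEntryCost` are hypothetical stand-ins chosen for illustration, and only the names `getScalarCost`, `getVectorCost` and `getEntryCost` come from the patch.

```cpp
#include <vector>

// Hypothetical stand-ins for LLVM opcodes; not the real Instruction enum.
enum class Opcode { Add, Sub, Load };

// Toy cost of one scalar instruction (assumed numbers, not TTI results).
int getScalarCost(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
    return 1;
  case Opcode::Load:
    return 2;
  }
  return 0;
}

// Toy cost of one vector instruction replacing NumElts scalar lanes.
// In this sketch the cost does not depend on the vector width.
int getVectorCost(Opcode Op, unsigned NumElts) {
  (void)NumElts;
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
    return 1;
  case Opcode::Load:
    return 3;
  }
  return 0;
}

// Single-opcode bundle: vector cost minus the summed scalar costs.
int getEntryCost(Opcode Op, unsigned NumElts) {
  int ScalarCost = static_cast<int>(NumElts) * getScalarCost(Op);
  return getVectorCost(Op, NumElts) - ScalarCost;
}

// Alternate-opcode bundle: two vector ops plus a select shuffle, minus the
// scalar costs. Both paths reuse the same two helpers.
int getAltEntryCost(const std::vector<Opcode> &Lanes, Opcode Main, Opcode Alt,
                    int ShuffleCost) {
  int ScalarCost = 0;
  for (Opcode Op : Lanes)
    ScalarCost += getScalarCost(Op);
  unsigned N = static_cast<unsigned>(Lanes.size());
  int VecCost = getVectorCost(Main, N) + getVectorCost(Alt, N) + ShuffleCost;
  return VecCost - ScalarCost;
}
```

With these toy numbers, a four-lane add bundle costs 1 - 4 = -3, and a four-lane add/sub alternate bundle with a shuffle cost of 1 costs (1 + 1 + 1) - 4 = -1. A negative result means the vectorized form is modelled as cheaper.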

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. Jul 12 2018, 2:40 AM

ping?

ABataev added inline comments. Jul 16 2018, 10:55 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610	Can we just use `TTI->getInstructionCost(I, TargetTransformInfo::TCK_RecipThroughput);` instead of this function?
612–615	The same question here

RKSimon added inline comments. Jul 16 2018, 1:28 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610	I've kept more closely to the original code than might be necessary. More of the instructions could use TTI->getInstructionCost directly in the switch statements, but opcodes like GEP and Cast have minor differences that we seem to be relying on.
612–615	Again, there are some diffs in the calls but we might be able to reuse more than we do - the TTI->getInstructionCost isn't really designed to take a scalar Instruction and a vector Type.

ABataev added inline comments. Jul 16 2018, 1:35 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610	Can we use this for load, store and calls?

RKSimon added inline comments. Jul 17 2018, 3:51 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610	Load/Store could use TTI->getInstructionCost, but that would mean we no longer hard-code the address space to 0 (we'd use the StoreInst/LoadInst getPointerAddressSpace() instead). Call should be fine, although we might want to keep an assert to check that it's an IntrinsicInst.

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

In D49225#1164807, @RKSimon wrote:

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

I tried to do something similar some time ago, but I did not like it. Instead of one switch for opcode we have 2. But we already know the opcode in many cases and, actually, the second switch is required only for the shuffles. Maybe it is worth it to outline several standalone functions for PHIs, CmpInsts, BinOps etc. and use them directly where we know the opcode and use these getScalarCost/getVectorCost only for shuffles? Of course, these 2 functions also should call these outlined cost functions for each particular opcode.

In D49225#1164986, @ABataev wrote:

I tried to do something similar some time ago, but I did not like it. Instead of one switch for opcode we have 2. But we already know the opcode in many cases and, actually, the second switch is required only for the shuffles. Maybe it is worth it to outline several standalone functions for PHIs, CmpInsts, BinOps etc. and use them directly where we know the opcode and use these getScalarCost/getVectorCost only for shuffles? Of course, these 2 functions also should call these outlined cost functions for each particular opcode.

I can investigate ways to return std::pair<int, int> if you think that might be better? That would avoid the scalar/vector switch statement duplication, TTI->getInstructionCost will be doing something very similar.

An alternative would be to handle shufflevector/alternate separately (like we do for gather stages) and split getEntryCost into 2 functions - the special cases (gathers/alternates etc.) and the costs for particular opcodes.
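
A rough sketch of the `std::pair<int, int>` idea, assuming a toy cost table rather than real TTI queries. `getScalarAndVectorCost`, `getCostDelta`, the `Opcode` enum and the cost values are all hypothetical names and numbers introduced here; the thread does not specify the interface.

```cpp
#include <utility>

// Hypothetical opcode groups; not the LLVM Instruction opcodes.
enum class Opcode { Cast, Cmp, BinOp };

// One switch produces both costs: {scalar cost per element, vector cost}.
// The values are made up for illustration.
std::pair<int, int> getScalarAndVectorCost(Opcode Op, unsigned NumElts) {
  (void)NumElts; // a real cost model would depend on the vector width
  switch (Op) {
  case Opcode::Cast:
    return {1, 2};
  case Opcode::Cmp:
  case Opcode::BinOp:
    return {1, 1};
  }
  return {0, 0};
}

// Caller side: vector cost minus NumElts scalar instructions.
int getCostDelta(Opcode Op, unsigned NumElts) {
  std::pair<int, int> Costs = getScalarAndVectorCost(Op, NumElts);
  int ScalarCost = static_cast<int>(NumElts) * Costs.first;
  return Costs.second - ScalarCost;
}
```

The point of the sketch is only that the scalar and vector costs for an opcode group come out of a single switch, so the case lists are not written twice.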

In D49225#1165016, @RKSimon wrote:

I can investigate ways to return std::pair<int, int> if you think that might be better? That would avoid the scalar/vector switch statement duplication, TTI->getInstructionCost will be doing something very similar.

Why do we need to return a pair? Do you want to reuse the same function for scalar/vector? I meant that we have the first switch when we're looking through the opcodes in getEntryCost, and then we have 2 additional switches when we call getScalarCost and getVectorCost.
BTW, do we really need separate costs for scalar/vector ops, or can we calculate the final cost immediately? I mean, instead of getScalarCost/getVectorCost we could have just getPHICost, getBinOpCost, etc. plus a getCost that calls all these functions for the shuffle. These functions would calculate the difference between the vector and scalar costs.

An alternative would be to handle shufflevector/alternate separately (like we do for gather stages) and split getEntryCost into 2 functions - the special cases (gathers/alternates etc.) and the costs for particular opcodes.

Yes, probably it would be better and that's what I actually suggested.
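
The per-opcode alternative discussed above might look roughly like the following. This is a sketch under stated assumptions: the function names `getPHICost`, `getBinOpCost` and `getCost` are taken from the comment, but their signatures, `getCmpCost`, `getAltShuffleCost`, the `Opcode` enum and the unit costs are hypothetical. In particular, passing the number of scalar lanes separately is one assumed way to make the deltas compose for the alternate-opcode case; the thread does not say how that would be handled.

```cpp
// Hypothetical opcode groups and unit costs; not the LLVM API.
enum class Opcode { PHI, BinOp, Cmp };

// Each helper returns (vector cost) - (scalar cost) for one opcode group.
// ScalarLanes is the number of scalar instructions the vector op replaces.
int getPHICost(unsigned /*ScalarLanes*/) { return 0; }

int getBinOpCost(unsigned ScalarLanes) {
  const int ScalarEltCost = 1, VecCost = 1;
  return VecCost - static_cast<int>(ScalarLanes) * ScalarEltCost;
}

int getCmpCost(unsigned ScalarLanes) {
  const int ScalarEltCost = 1, VecCost = 2;
  return VecCost - static_cast<int>(ScalarLanes) * ScalarEltCost;
}

// Dispatcher, needed only where the opcode is not known statically.
int getCost(Opcode Op, unsigned ScalarLanes) {
  switch (Op) {
  case Opcode::PHI:
    return getPHICost(ScalarLanes);
  case Opcode::BinOp:
    return getBinOpCost(ScalarLanes);
  case Opcode::Cmp:
    return getCmpCost(ScalarLanes);
  }
  return 0;
}

// Alternate-opcode bundle: one vector op per opcode plus a select shuffle.
int getAltShuffleCost(Opcode Main, unsigned MainLanes, Opcode Alt,
                      unsigned AltLanes, int ShuffleCost) {
  return getCost(Main, MainLanes) + getCost(Alt, AltLanes) + ShuffleCost;
}
```

Where the opcode is already known, a caller would invoke `getBinOpCost` or `getCmpCost` directly and skip the switch in `getCost`; only the shuffle path goes through the dispatcher.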

RKSimon abandoned this revision. Jul 18 2018, 6:57 AM

RKSimon mentioned this in rL337390: [SLPVectorizer] Avoid duplicate scalar cost calculations in BoUpSLP…. Jul 18 2018, 7:00 AM

Revision Contents

Path: lib/Transforms/Vectorize/SLPVectorizer.cpp (revision 336895)
Size: 383 lines

Diff 155133

lib/Transforms/Vectorize/SLPVectorizer.cpp

[...]
private:
struct TreeEntry;		struct TreeEntry;

/// Checks if all users of \p I are the part of the vectorization tree.		/// Checks if all users of \p I are the part of the vectorization tree.
bool areAllUsersVectorized(Instruction *I) const;		bool areAllUsersVectorized(Instruction *I) const;

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
int getEntryCost(TreeEntry *E);		int getEntryCost(TreeEntry *E);

		/// \returns the cost of the scalar instruction \p I.
		int getScalarCost(Instruction *I, Type *DstSclTy);

		/// \returns the cost of a vectorized version of the scalar instruction \p I.
		int getVectorCost(Instruction *I, ArrayRef<Value *> VL, Type *DstSclTy,
		VectorType *DstVecTy);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);		void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);

/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can		/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can
/// be vectorized to use the original vector (or aggregate "bitcast" to a		/// be vectorized to use the original vector (or aggregate "bitcast" to a
/// vector) and sets \p CurrentOrder to the identity permutation; otherwise		/// vector) and sets \p CurrentOrder to the identity permutation; otherwise
/// returns false, setting \p CurrentOrder to either an empty vector or a		/// returns false, setting \p CurrentOrder to either an empty vector or a
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
[...]
switch (ShuffleOrOp) {
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -=		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
(ReuseShuffleNumbers - VL.size()) *
TTI->getCastInstrCost(S.getOpcode(), ScalarTy, SrcTy, VL0);
}		}

// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
int ScalarCost = VL.size() * TTI->getCastInstrCost(VL0->getOpcode(),		int ScalarCost = VL.size() * ScalarEltCost;
VL0->getType(), SrcTy, VL0);

		Type *SrcTy = VL0->getOperand(0)->getType();
VectorType *SrcVecTy = VectorType::get(SrcTy, VL.size());		VectorType *SrcVecTy = VectorType::get(SrcTy, VL.size());
int VecCost = 0;		int VecCost = 0;
// Check if the values are candidates to demote.		// Check if the values are candidates to demote.
if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {		if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {
VecCost = ReuseShuffleCost +		VecCost = ReuseShuffleCost + getVectorCost(VL0, VL, ScalarTy, VecTy);
TTI->getCastInstrCost(VL0->getOpcode(), VecTy, SrcVecTy, VL0);
}		}
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Select: {		case Instruction::Select: {
// Calculate the cost of this instruction.		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
TTI->getCmpSelInstrCost(S.getOpcode(), ScalarTy,		}
Builder.getInt1Ty(), VL0);		int ScalarCost = VecTy->getNumElements() * ScalarEltCost;
}		int VecCost = getVectorCost(VL0, VL, ScalarTy, VecTy);
VectorType *MaskTy = VectorType::get(Builder.getInt1Ty(), VL.size());
int ScalarCost = VecTy->getNumElements() *
TTI->getCmpSelInstrCost(S.getOpcode(), ScalarTy,
Builder.getInt1Ty(), VL0);
int VecCost = TTI->getCmpSelInstrCost(S.getOpcode(), VecTy, MaskTy, VL0);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::FDiv:		case Instruction::FDiv:
case Instruction::URem:		case Instruction::URem:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::FRem:		case Instruction::FRem:
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
// Certain instructions can be cheaper to vectorize if they have a		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
// constant second vector operand.
TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;
TargetTransformInfo::OperandValueProperties Op1VP =
TargetTransformInfo::OP_None;
TargetTransformInfo::OperandValueProperties Op2VP =
TargetTransformInfo::OP_PowerOf2;

// If all operands are exactly the same ConstantInt then set the
// operand kind to OK_UniformConstantValue.
// If instead not all operands are constants, then set the operand kind
// to OK_AnyValue. If all operands are constants but not the same,
// then set the operand kind to OK_NonUniformConstantValue.
ConstantInt *CInt0 = nullptr;
for (unsigned i = 0, e = VL.size(); i < e; ++i) {
const Instruction *I = cast<Instruction>(VL[i]);
ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(1));
if (!CInt) {
Op2VK = TargetTransformInfo::OK_AnyValue;
Op2VP = TargetTransformInfo::OP_None;
break;
}
if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&
!CInt->getValue().isPowerOf2())
Op2VP = TargetTransformInfo::OP_None;
if (i == 0) {
CInt0 = CInt;
continue;
}
if (CInt0 != CInt)
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
}

SmallVector<const Value *, 4> Operands(VL0->operand_values());
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -=		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
(ReuseShuffleNumbers - VL.size()) *		}
TTI->getArithmeticInstrCost(S.getOpcode(), ScalarTy, Op1VK, Op2VK,		int ScalarCost = VecTy->getNumElements() * ScalarEltCost;
Op1VP, Op2VP, Operands);		int VecCost = getVectorCost(VL0, VL, ScalarTy, VecTy);
}
int ScalarCost =
VecTy->getNumElements() *
TTI->getArithmeticInstrCost(S.getOpcode(), ScalarTy, Op1VK, Op2VK,
Op1VP, Op2VP, Operands);
int VecCost = TTI->getArithmeticInstrCost(S.getOpcode(), VecTy, Op1VK,
Op2VK, Op1VP, Op2VP, Operands);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;

if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
TTI->getArithmeticInstrCost(Instruction::Add,		}
ScalarTy, Op1VK, Op2VK);		int ScalarCost = VecTy->getNumElements() * ScalarEltCost;
}		int VecCost = getVectorCost(VL0, VL, ScalarTy, VecTy);
int ScalarCost =
VecTy->getNumElements() *
TTI->getArithmeticInstrCost(Instruction::Add, ScalarTy, Op1VK, Op2VK);
int VecCost =
TTI->getArithmeticInstrCost(Instruction::Add, VecTy, Op1VK, Op2VK);

return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Cost of wide load - cost of scalar loads.		// Cost of wide load - cost of scalar loads.
unsigned alignment = cast<LoadInst>(VL0)->getAlignment();		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
TTI->getMemoryOpCost(Instruction::Load, ScalarTy,		}
alignment, 0, VL0);		int ScalarLdCost = VecTy->getNumElements() * ScalarEltCost;
}		int VecLdCost = getVectorCost(VL0, VL, ScalarTy, VecTy);
int ScalarLdCost = VecTy->getNumElements() *
TTI->getMemoryOpCost(Instruction::Load, ScalarTy, alignment, 0, VL0);
int VecLdCost = TTI->getMemoryOpCost(Instruction::Load,
VecTy, alignment, 0, VL0);
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
// TODO: Merge this shuffle with the ReuseShuffleCost.		// TODO: Merge this shuffle with the ReuseShuffleCost.
VecLdCost += TTI->getShuffleCost(		VecLdCost += TTI->getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc, VecTy);		TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
}		}
return ReuseShuffleCost + VecLdCost - ScalarLdCost;		return ReuseShuffleCost + VecLdCost - ScalarLdCost;
}		}
case Instruction::Store: {		case Instruction::Store: {
// We know that we can merge the stores. Calculate the cost.		// We know that we can merge the stores. Calculate the cost.
unsigned alignment = cast<StoreInst>(VL0)->getAlignment();		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
TTI->getMemoryOpCost(Instruction::Store, ScalarTy,		}
alignment, 0, VL0);		int ScalarStCost = VecTy->getNumElements() * ScalarEltCost;
}		int VecStCost = getVectorCost(VL0, VL, ScalarTy, VecTy);
int ScalarStCost = VecTy->getNumElements() *
TTI->getMemoryOpCost(Instruction::Store, ScalarTy, alignment, 0, VL0);
int VecStCost = TTI->getMemoryOpCost(Instruction::Store,
VecTy, alignment, 0, VL0);
return ReuseShuffleCost + VecStCost - ScalarStCost;		return ReuseShuffleCost + VecStCost - ScalarStCost;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		int ScalarEltCost = getScalarCost(VL0, ScalarTy);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// Calculate the cost of the scalar and vector calls.
SmallVector<Type*, 4> ScalarTys;
for (unsigned op = 0, opc = CI->getNumArgOperands(); op!= opc; ++op)
ScalarTys.push_back(CI->getArgOperand(op)->getType());

FastMathFlags FMF;
if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
FMF = FPMO->getFastMathFlags();

if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -=		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
(ReuseShuffleNumbers - VL.size()) *
TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys, FMF);
}		}
int ScalarCallCost = VecTy->getNumElements() *		int ScalarCallCost = VecTy->getNumElements() * ScalarEltCost;
TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys, FMF);		int VecCallCost = getVectorCost(VL0, VL, ScalarTy, VecTy);

SmallVector<Value *, 4> Args(CI->arg_operands());
int VecCallCost = TTI->getIntrinsicInstrCost(ID, CI->getType(), Args, FMF,
VecTy->getNumElements());

LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost		LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost
<< " (" << VecCallCost << "-" << ScalarCallCost << ")"		<< " (" << VecCallCost << "-" << ScalarCallCost << ")"
<< " for " << *CI << "\n");		<< " for " << *cast<CallInst>(VL0) << "\n");

return ReuseShuffleCost + VecCallCost - ScalarCallCost;		return ReuseShuffleCost + VecCallCost - ScalarCallCost;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
assert(S.isAltShuffle() &&		assert(S.isAltShuffle() &&
((Instruction::isBinaryOp(S.getOpcode()) &&		((Instruction::isBinaryOp(S.getOpcode()) &&
Instruction::isBinaryOp(S.getAltOpcode())) ||		Instruction::isBinaryOp(S.getAltOpcode())) ||
(Instruction::isCast(S.getOpcode()) &&		(Instruction::isCast(S.getOpcode()) &&
Instruction::isCast(S.getAltOpcode()))) &&		Instruction::isCast(S.getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");
int ScalarCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
for (unsigned Idx : E->ReuseShuffleIndices) {		for (unsigned Idx : E->ReuseShuffleIndices) {
Instruction *I = cast<Instruction>(VL[Idx]);		Instruction *I = cast<Instruction>(VL[Idx]);
ReuseShuffleCost -= TTI->getInstructionCost(		ReuseShuffleCost -= getScalarCost(I, I->getType());
I, TargetTransformInfo::TCK_RecipThroughput);
}		}
for (Value *V : VL) {		for (Value *V : VL) {
Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
ReuseShuffleCost += TTI->getInstructionCost(		ReuseShuffleCost += getScalarCost(I, I->getType());
I, TargetTransformInfo::TCK_RecipThroughput);
}		}
}		}
int VecCost = 0;		int ScalarCost = 0;
for (Value *i : VL) {		for (Value *i : VL) {
Instruction *I = cast<Instruction>(i);		Instruction *I = cast<Instruction>(i);
assert(S.isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");		assert(S.isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");
ScalarCost += TTI->getInstructionCost(		ScalarCost += TTI->getInstructionCost(
I, TargetTransformInfo::TCK_RecipThroughput);		I, TargetTransformInfo::TCK_RecipThroughput);
}		}
// VecCost is equal to sum of the cost of creating 2 vectors		// VecCost is equal to sum of the cost of creating 2 vectors
// and the cost of creating shuffle.		// and the cost of creating shuffle.
if (Instruction::isBinaryOp(S.getOpcode())) {		int VecCost = 0;
VecCost = TTI->getArithmeticInstrCost(S.getOpcode(), VecTy);		VecCost = getVectorCost(S.MainOp, VL, S.MainOp->getType(), VecTy);
VecCost += TTI->getArithmeticInstrCost(S.getAltOpcode(), VecTy);		VecCost += getVectorCost(S.AltOp, VL, S.AltOp->getType(), VecTy);
} else {
Type *Src0SclTy = S.MainOp->getOperand(0)->getType();
Type *Src1SclTy = S.AltOp->getOperand(0)->getType();
VectorType *Src0Ty = VectorType::get(Src0SclTy, VL.size());
VectorType *Src1Ty = VectorType::get(Src1SclTy, VL.size());
VecCost = TTI->getCastInstrCost(S.getOpcode(), VecTy, Src0Ty);
VecCost += TTI->getCastInstrCost(S.getAltOpcode(), VecTy, Src1Ty);
}
VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, 0);		VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, 0);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
default:		default:
llvm_unreachable("Unknown instruction");		llvm_unreachable("Unknown instruction");
}		}
}		}

		int BoUpSLP::getScalarCost(Instruction *I, Type *DstTy) {
		unsigned Opcode = I->getOpcode();
		switch (Opcode) {
		case Instruction::PHI:
		return 0;
		case Instruction::ZExt:
		case Instruction::SExt:
		case Instruction::FPToUI:
		case Instruction::FPToSI:
		case Instruction::FPExt:
		case Instruction::PtrToInt:
		case Instruction::IntToPtr:
		case Instruction::SIToFP:
		case Instruction::UIToFP:
		case Instruction::Trunc:
		case Instruction::FPTrunc:
		case Instruction::BitCast: {
		Type *SrcTy = I->getOperand(0)->getType();
		return TTI->getCastInstrCost(Opcode, DstTy, SrcTy, I);
		}
		case Instruction::FCmp:
		case Instruction::ICmp:
		case Instruction::Select:
		return TTI->getCmpSelInstrCost(Opcode, DstTy, Builder.getInt1Ty(), I);
		case Instruction::Add:
		case Instruction::FAdd:
		case Instruction::Sub:
		case Instruction::FSub:
		case Instruction::Mul:
		case Instruction::FMul:
		case Instruction::UDiv:
		case Instruction::SDiv:
		case Instruction::FDiv:
		case Instruction::URem:
		case Instruction::SRem:
		case Instruction::FRem:
		case Instruction::Shl:
		case Instruction::LShr:
		case Instruction::AShr:
		case Instruction::And:
		case Instruction::Or:
		case Instruction::Xor:
		return TTI->getInstructionCost(I, TargetTransformInfo::TCK_RecipThroughput);
		case Instruction::GetElementPtr: {
		TargetTransformInfo::OperandValueKind Op1VK =
		TargetTransformInfo::OK_AnyValue;
		TargetTransformInfo::OperandValueKind Op2VK =
		TargetTransformInfo::OK_UniformConstantValue;
		return TTI->getArithmeticInstrCost(Instruction::Add, DstTy, Op1VK, Op2VK);
		}
		case Instruction::Load: {
		unsigned alignment = cast<LoadInst>(I)->getAlignment();
		return TTI->getMemoryOpCost(Instruction::Load, DstTy, alignment, 0, I);
		}
		case Instruction::Store: {
		unsigned alignment = cast<StoreInst>(I)->getAlignment();
		return TTI->getMemoryOpCost(Instruction::Store, DstTy, alignment, 0, I);
		}
		case Instruction::Call: {
		CallInst *CI = cast<CallInst>(I);
		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

		SmallVector<Type *, 4> ScalarTys;
		for (unsigned op = 0, opc = CI->getNumArgOperands(); op != opc; ++op)
		ScalarTys.push_back(CI->getArgOperand(op)->getType());

		FastMathFlags FMF;
		if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
		FMF = FPMO->getFastMathFlags();

		return TTI->getIntrinsicInstrCost(ID, DstTy, ScalarTys, FMF);
		}
		}

		llvm_unreachable("Unknown instruction");
		}

		int BoUpSLP::getVectorCost(Instruction *I, ArrayRef<Value *> VL, Type *DstSclTy,
		VectorType *DstVecTy) {
		unsigned Opcode = I->getOpcode();
		switch (Opcode) {
		case Instruction::PHI:
		return 0;
		case Instruction::ZExt:
		case Instruction::SExt:
		case Instruction::FPToUI:
		case Instruction::FPToSI:
		case Instruction::FPExt:
		case Instruction::PtrToInt:
		case Instruction::IntToPtr:
		case Instruction::SIToFP:
		case Instruction::UIToFP:
		case Instruction::Trunc:
		case Instruction::FPTrunc:
		case Instruction::BitCast: {
		Type *SrcSclTy = I->getOperand(0)->getType();
		Type *SrcVecTy = VectorType::get(SrcSclTy, VL.size());
		return TTI->getCastInstrCost(Opcode, DstVecTy, SrcVecTy, I);
		}
		case Instruction::FCmp:
		case Instruction::ICmp:
		case Instruction::Select: {
		Type *MaskSclTy = Builder.getInt1Ty();
		Type *MaskVecTy = VectorType::get(MaskSclTy, VL.size());
		return TTI->getCmpSelInstrCost(Opcode, DstVecTy, MaskVecTy, I);
		}
		case Instruction::Add:
		case Instruction::FAdd:
		case Instruction::Sub:
		case Instruction::FSub:
		case Instruction::Mul:
		case Instruction::FMul:
		case Instruction::UDiv:
		case Instruction::SDiv:
		case Instruction::FDiv:
		case Instruction::URem:
		case Instruction::SRem:
		case Instruction::FRem:
		case Instruction::Shl:
		case Instruction::LShr:
		case Instruction::AShr:
		case Instruction::And:
		case Instruction::Or:
		case Instruction::Xor: {
		// Certain instructions can be cheaper to vectorize if they have a
		// constant second vector operand.
		TargetTransformInfo::OperandValueKind Op1VK =
		TargetTransformInfo::OK_AnyValue;
		TargetTransformInfo::OperandValueKind Op2VK =
		TargetTransformInfo::OK_UniformConstantValue;
		TargetTransformInfo::OperandValueProperties Op1VP =
		TargetTransformInfo::OP_None;
		TargetTransformInfo::OperandValueProperties Op2VP =
		TargetTransformInfo::OP_PowerOf2;

		// If all operands are exactly the same ConstantInt then set the
		// operand kind to OK_UniformConstantValue.
		// If instead not all operands are constants, then set the operand kind
		// to OK_AnyValue. If all operands are constants but not the same,
		// then set the operand kind to OK_NonUniformConstantValue.
		ConstantInt *CInt0 = nullptr;
		for (unsigned i = 0, e = VL.size(); i < e; ++i) {
		const Instruction *I = cast<Instruction>(VL[i]);
		ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(1));
		if (!CInt) {
		Op2VK = TargetTransformInfo::OK_AnyValue;
		Op2VP = TargetTransformInfo::OP_None;
		break;
		}
		if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&
		!CInt->getValue().isPowerOf2())
		Op2VP = TargetTransformInfo::OP_None;
		if (i == 0) {
		CInt0 = CInt;
		continue;
		}
		if (CInt0 != CInt)
		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
		}
		SmallVector<const Value *, 4> Operands(I->operand_values());
		return TTI->getArithmeticInstrCost(Opcode, DstVecTy, Op1VK, Op2VK, Op1VP,
		Op2VP, Operands);
		}
		case Instruction::GetElementPtr: {
		TargetTransformInfo::OperandValueKind Op1VK =
		TargetTransformInfo::OK_AnyValue;
		TargetTransformInfo::OperandValueKind Op2VK =
		TargetTransformInfo::OK_UniformConstantValue;
		return TTI->getArithmeticInstrCost(Instruction::Add, DstVecTy, Op1VK,
		Op2VK);
		}
		case Instruction::Load: {
		unsigned alignment = cast<LoadInst>(I)->getAlignment();
		return TTI->getMemoryOpCost(Instruction::Load, DstVecTy, alignment, 0, I);
		}
		case Instruction::Store: {
		unsigned alignment = cast<StoreInst>(I)->getAlignment();
		return TTI->getMemoryOpCost(Instruction::Store, DstVecTy, alignment, 0, I);
		}
		case Instruction::Call: {
		CallInst *CI = cast<CallInst>(I);
		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

		FastMathFlags FMF;
		if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
		FMF = FPMO->getFastMathFlags();

		SmallVector<Value *, 4> Args(CI->arg_operands());
		return TTI->getIntrinsicInstrCost(ID, DstSclTy, Args, FMF, VL.size());
		}
		}

		llvm_unreachable("Unknown instruction");
		}

bool BoUpSLP::isFullyVectorizableTinyTree() {		bool BoUpSLP::isFullyVectorizableTinyTree() {
LLVM_DEBUG(dbgs() << "SLP: Check whether the tree with height "		LLVM_DEBUG(dbgs() << "SLP: Check whether the tree with height "
<< VectorizableTree.size() << " is fully vectorizable .\n");		<< VectorizableTree.size() << " is fully vectorizable .\n");

// We only handle trees of heights 1 and 2.		// We only handle trees of heights 1 and 2.
if (VectorizableTree.size() == 1 && !VectorizableTree[0].NeedToGather)		if (VectorizableTree.size() == 1 && !VectorizableTree[0].NeedToGather)
return true;		return true;

[...]