This is an archive of the discontinued LLVM Phabricator instance.

[SLP] PR32078: convert scalar operations to vector.
Needs Review · Public

Authored by ABataev on Mar 7 2017, 1:38 AM.


Details

Reviewers

mzolotukhin
mkuper
RKSimon
spatel
vporpo
anton-afanasyev

Summary

If we have code like:

%xn = extractelement <2 x i32> %x, i32 %n
%yn = extractelement <2 x i32> %y, i32 %n
%cmpn = icmp eq i32 %xn, %yn

we can convert it to something like this:

%cmp = icmp eq <2 x i32> %x, %y
%cmpn = extractelement <2 x i1> %cmp, i32 %n

if the cost of the second form is lower than the cost of the original code.

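A minimal C++ sketch of the profitability test described above. It assumes the cost formula that tryToWidenExtractElementInsts() uses in the diff: the vector-minus-scalar cost of the operation, plus one extract from the widened result, minus the operand extracts that become dead. The struct, the function names, and the fixed integer costs are hypothetical stand-ins for TargetTransformInfo queries.

```cpp
#include <cassert>

// Hypothetical cost inputs. In the patch, getCost() returns the vector-minus-
// scalar delta of the operation and getVectorInstrCost() prices the extracts;
// here they are plain integers.
struct WidenCostInputs {
  int ScalarOpCost;        // e.g. icmp eq i32
  int VectorOpCost;        // e.g. icmp eq <2 x i32>
  int OperandExtractCost;  // one extractelement from an operand vector
  int ResultExtractCost;   // extractelement <2 x i1> %cmp, i32 %n
  int DeadOperandExtracts; // operand extracts whose only user is the scalar op
};

// Net cost change of replacing the scalar op with the vector op plus one
// extract of the result lane. Negative means the vector form is cheaper.
int widenCostDelta(const WidenCostInputs &C) {
  assert(C.DeadOperandExtracts >= 0 && "count must not be negative");
  int OpDelta = C.VectorOpCost - C.ScalarOpCost;
  int SavedExtracts = C.DeadOperandExtracts * C.OperandExtractCost;
  return OpDelta + C.ResultExtractCost - SavedExtracts;
}

// Same decision rule as the patch: widen only if the delta is below
// -Threshold (SLPCostThreshold in the pass).
bool shouldWiden(const WidenCostInputs &C, int Threshold) {
  return widenCostDelta(C) < -Threshold;
}
```

With every cost set to 1 and both operand extracts dying, the delta is 0 + 1 - 2 = -1, so the vector form is chosen at a threshold of 0. If the extracts have other users, nothing is saved, the delta is +1, and the scalar code is kept.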
Event Timeline

ABataev created this revision. Mar 7 2017, 1:38 AM

ABataev added a parent revision: D30649: [SLP] Function for instruction cost calculation, NFC. Mar 7 2017, 1:39 AM

ABataev added a subscriber: spatel.

I haven't looked at the patch in detail yet, but I don't understand the rationale for putting it in the SLP vectorizer.
If transforming this pattern is a win, it's a win regardless of whether it originated in the SLP vectorizer, or just happened to appear in the IR.

Would this be something InstCombine could do? Or, if it's necessarily cost-dependent, then some later cleanup pass? Or DAGCombine?
If you're not doing it in a later pass because you don't have the cost model, then it may be a good idea to split out the change you have in D30649 into a utility, instead of having it live in the SLP vectorizer. We already have lib/Analysis/CostModel, but it doesn't really provide any useful interface.

ABataev abandoned this revision. Apr 14 2017, 11:48 AM

Michael, I tried to implement it in InstCombiner, but we need cost analysis. Later passes are not suitable for this, because we need the transformed code to be available to InstCombiner. So I reopened it.

For reference, the instcombine patch proposal was D32093.

RKSimon added a subscriber: RKSimon. May 18 2017, 8:49 AM

RKSimon added reviewers: RKSimon, spatel. Sep 8 2017, 2:21 PM

@ABataev Would it be worth resurrecting this or starting again? We have a similar test case in https://bugs.llvm.org/show_bug.cgi?id=44008

current:

%L0 = extractelement <2 x float> %1, i32 0
%L1 = extractelement <2 x float> %1, i32 1
%Mul0 = fmul float %L0, 2.000000e+00
%Mul1 = fmul float %L1, 2.000000e+00

better:

%Mul = fmul <2 x float> %1, <float 2.000000e+00, float 2.000000e+00>
%L0 = extractelement <2 x float> %Mul, i32 0
%L1 = extractelement <2 x float> %Mul, i32 1
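Both the summary's icmp case and this fmul case rely on the same lane-wise identity: extracting lane i from the vector result equals applying the scalar operation to lane i of each operand. A small C++ check of that identity, with std::array standing in for IR vectors; laneWise and extractCommutes are hypothetical helpers, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Apply a scalar binary operation to every lane, like a vector IR instruction.
// The result lane type follows the operation (bool for a compare, mirroring
// icmp on <N x i32> producing <N x i1>).
template <typename T, std::size_t N, typename Op>
auto laneWise(const std::array<T, N> &A, const std::array<T, N> &B, Op F) {
  std::array<decltype(F(A[0], B[0])), N> R{};
  for (std::size_t L = 0; L < N; ++L)
    R[L] = F(A[L], B[L]);
  return R;
}

// True if "vector op, then extract lane" matches "extract lane, then scalar
// op". Uses ==, so NaN results would compare unequal even when bit-identical.
template <typename T, std::size_t N, typename Op>
bool extractCommutes(const std::array<T, N> &A, const std::array<T, N> &B,
                     Op F, std::size_t Lane) {
  assert(Lane < N && "lane index out of range");
  return laneWise(A, B, F)[Lane] == F(A[Lane], B[Lane]);
}
```

Multiplying lanes by a splat of 2.0 mirrors the fmul example; using integer lanes with == mirrors the icmp in the summary.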

In D30686#1795400, @RKSimon wrote:
@ABataev Would it be worth resurrecting this or starting again? We have a similar test case in https://bugs.llvm.org/show_bug.cgi?id=44008

Will try to rework it.

In D30686#758530, @ABataev wrote:

Michael, I tried to implement it in InstCombiner, but we need cost analysis. Later passes are not suitable for this, because we need the transformed code to be available to InstCombiner. So I reopened it.

Use vectorcombine now?

@spatel

In D30686#2456753, @xbolva00 wrote:

Use vectorcombine now?

@spatel

I think so: https://godbolt.org/z/YW6rMc

Revision Contents

Path                                            Size
lib/Transforms/Vectorize/SLPVectorizer.cpp      549 lines
test/Transforms/SLPVectorizer/X86/vector.ll     14 lines

Diff 90817

lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ ... @@ unsigned getMinVecRegSize() const {
     return MinVecRegSize;
   }

   /// \brief Check if ArrayType or StructType is isomorphic to some VectorType.
   ///
   /// \returns number of elements in vector if isomorphism exists, 0 otherwise.
   unsigned canMapToVector(Type *T, const DataLayout &DL) const;

+  /// Try to convert instructions with extractelement operands into a vector
+  /// form with a single extractelement instruction.
+  bool tryToWidenExtractElementInsts(ArrayRef<WeakVH> ExtractInsts);
+
   /// \returns True if the VectorizableTree is both tiny and not fully
   /// vectorizable. We do not vectorize such trees.
   bool isTreeTinyAndNotFullyVectorizable();

 private:
   struct TreeEntry;

+  /// Calculates the cost of the transformation of \p VL instructions from
+  /// scalar to vector form. \returns None if \p Opcode is not supported.
+  Optional<int> getCost(unsigned Opcode, ArrayRef<Value *> VL, Type *ScalarTy,
+                        Type *VecTy) const;
+
   /// \returns the cost of the vectorizable entry.
   int getEntryCost(TreeEntry *E);

   /// This is the recursive part of buildTree.
   void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth);

   /// \returns True if the ExtractElement/ExtractValue instructions in VL can
   /// be vectorized to use the original vector (or aggregate "bitcast" to a vector).
@@ ... @@ if (ST) {
     // Check that struct is homogeneous.
     for (const auto *Ty : ST->elements())
       if (Ty != EltTy)
         return 0;
   }
   return N;
 }

+bool BoUpSLP::tryToWidenExtractElementInsts(ArrayRef<WeakVH> ExtractInsts) {
+  bool Changed = false;
+  // Store the extractelement instruction + tree height.
+  SmallVector<std::pair<WeakVH, unsigned>, 4> Insts;
+  Insts.reserve(ExtractInsts.size());
+  for (auto &V : ExtractInsts)
+    Insts.emplace_back(V, 0);
+  for (unsigned Idx = 0, E = Insts.size(); Idx < E; ++Idx) {
+    // The handle may be null if the instruction was erased.
+    auto *EE = dyn_cast_or_null<ExtractElementInst>(Insts[Idx].first);
+    // Skip analysis of already deleted extractelements or instruction trees
+    // with height >= RecursionMaxDepth.
+    if (!EE || Insts[Idx].second == RecursionMaxDepth)
+      continue;
+    unsigned NE = EE->getVectorOperandType()->getNumElements();
+    auto *EIdx = EE->getIndexOperand();
+    for (auto *U : EE->users()) {
+      auto *I = dyn_cast<Instruction>(U);
+      // Check if user instruction is vectorizable.
+      if (!I || !isValidElementType(I->getType()) || I->mayHaveSideEffects() ||
+          EphValues.count(I) > 0)
+        continue;
+      DEBUG(dbgs() << "SLP: trying to widen instruction " << *I << "\n");
+      Optional<int> Cost = getCost(I->getOpcode(), I, I->getType(),
+                                   VectorType::get(I->getType(), NE));
+      if (!Cost)
+        continue;
+      // Check that all operands of the user instruction are extractelements
+      // from vectors of the same size and from the same lane.
+      if (!std::all_of(I->op_begin(), I->op_end(), [NE, EIdx](const Value *V) {
+            auto *EEI = dyn_cast<ExtractElementInst>(V);
+            return EEI && EEI->getVectorOperandType()->getNumElements() == NE &&
+                   EEI->getIndexOperand() == EIdx;
+          }))
+        continue;
+      int EIdxVal = -1;
+      if (auto *EIdxC = dyn_cast<ConstantInt>(EIdx))
+        if (EIdxC->getValue().isNonNegative())
+          EIdxVal = EIdxC->getZExtValue();
+      // Estimate scalar cost of instructions to be transformed into a vector
+      // form.
+      int ScalarCost = 0;
+      DenseSet<ExtractElementInst *> EEWithCost;
+      EEWithCost.reserve(I->getNumOperands());
+      for (auto *Op : I->operand_values()) {
+        auto *EEOp = cast<ExtractElementInst>(Op);
+        const Instruction *UserLast = EEOp->user_back();
+        // If the only user of the extractelement instruction is the
+        // to-be-vectorized user instruction, include the cost of this
+        // extractelement in the scalar cost (it can safely be removed during
+        // vectorization).
+        // EEWithCost is used to count the cost of each extractelement
+        // instruction only once.
+        if (EEWithCost.insert(EEOp).second &&
+            (EEOp->hasOneUse() ||
+             std::all_of(EEOp->user_begin(), EEOp->user_end(),
+                         [UserLast](User *U) { return U == UserLast; }))) {
+          ScalarCost +=
+              TTI->getVectorInstrCost(Instruction::ExtractElement,
+                                      EEOp->getVectorOperandType(), EIdxVal);
+        }
+      }
+      // Get the vector cost of the new vectorized code: vectorized user
+      // instruction + extractelement <vec_user_instruction>, i32 EIdx.
+      int VecCost =
+          TTI->getVectorInstrCost(Instruction::ExtractElement,
+                                  VectorType::get(I->getType(), NE), EIdxVal);
+      int ResCost = Cost.getValue() + VecCost - ScalarCost;
+      if (ResCost >= -SLPCostThreshold)
+        continue;
+      DEBUG(dbgs() << "SLP: Decided to widen cost=" << ResCost << "\n");
+      // Generate vector code instead of the scalar one.
+      Builder.SetInsertPoint(I->getParent(), ++I->getIterator());
+      Builder.SetCurrentDebugLocation(I->getDebugLoc());
+      // Create vectorized version of the user instruction.
+      Instruction *NewI = I->clone();
+      NewI->mutateType(VectorType::get(I->getType(), NE));
+      for (unsigned OpIdx = 0, NumOps = NewI->getNumOperands(); OpIdx < NumOps;
+           ++OpIdx) {
+        auto *OpEE = cast<ExtractElementInst>(NewI->getOperand(OpIdx));
+        NewI->setOperand(OpIdx, OpEE->getVectorOperand());
+        // Remove the extractelement instruction only if this is its last use.
+        if (OpEE->hasOneUse()) {
+          OpEE->replaceAllUsesWith(UndefValue::get(OpEE->getType()));
+          eraseInstruction(OpEE);
+        }
+      }
+      Builder.Insert(NewI, "widen.vect");
+      // %widen.extract = extractelement <ty x n> %widen.vect, i32 Idx
+      Value *NewEE = Builder.CreateExtractElement(NewI, EIdx, "widen.extract");
+      // Replace uses of the scalar instruction by the %widen.extract
+      // instruction.
+      I->replaceAllUsesWith(NewEE);
+      eraseInstruction(I);
+      // Add %widen.extract to the list of the extractelement instructions for
+      // future analysis of possibly vectorizable tree.
+      Insts.emplace_back(NewEE, Insts[Idx].second + 1);
+      E = Insts.size();
+      Changed = true;
+    }
+  }
+  return Changed;
+}
+
 bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, unsigned Opcode) const {
   assert(Opcode == Instruction::ExtractElement ||
          Opcode == Instruction::ExtractValue);
   assert(Opcode == getSameOpcode(VL) && "Invalid opcode");
   // Check if all of the extracts come from the same vector and from the
   // correct offset.
   Value *VL0 = VL[0];
   Instruction *E0 = cast<Instruction>(VL0);
@@ ... @@ if (!matchExtractIndex(E, i, Opcode))
       return false;
     if (E->getOperand(0) != Vec)
       return false;
   }

   return true;
 }

-int BoUpSLP::getEntryCost(TreeEntry *E) {
-  ArrayRef<Value*> VL = E->Scalars;
-
-  Type *ScalarTy = VL[0]->getType();
-  if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
-    ScalarTy = SI->getValueOperand()->getType();
-  VectorType *VecTy = VectorType::get(ScalarTy, VL.size());
-
-  // If we have computed a smaller type for the expression, update VecTy so
-  // that the costs will be accurate.
-  if (MinBWs.count(VL[0]))
-    VecTy = VectorType::get(
-        IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());
-
-  if (E->NeedToGather) {
-    if (allConstant(VL))
-      return 0;
-    if (isSplat(VL)) {
-      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, 0);
-    }
-    return getGatherCost(E->Scalars);
-  }
-  unsigned Opcode = getSameOpcode(VL);
-  assert(Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");
-  Instruction *VL0 = cast<Instruction>(VL[0]);
+Optional<int> BoUpSLP::getCost(unsigned Opcode, ArrayRef<Value *> VL,
+                               Type *ScalarTy, Type *VecTy) const {
+  assert(ScalarTy && VecTy &&
+         "both ScalarTy/VectorTy parameters must be specified.");
+  assert(Opcode && "Expected non-null opcode.");
+  auto *VL0 = cast<Instruction>(VL[0]);
+  int VecCost;
+  int ScalarCost;
   switch (Opcode) {
-  case Instruction::PHI: {
-    return 0;
-  }
-  case Instruction::ExtractValue:
-  case Instruction::ExtractElement: {
-    if (canReuseExtract(VL, Opcode)) {
-      int DeadCost = 0;
-      for (unsigned i = 0, e = VL.size(); i < e; ++i) {
-        Instruction *E = cast<Instruction>(VL[i]);
-        // If all users are going to be vectorized, instruction can be
-        // considered as dead.
-        // The same, if have only one user, it will be vectorized for sure.
-        if (E->hasOneUse() ||
-            std::all_of(E->user_begin(), E->user_end(), [this](User *U) {
-              return ScalarToTreeEntry.count(U) > 0;
-            }))
-          // Take credit for instruction that will become dead.
-          DeadCost +=
-              TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, i);
-      }
-      return -DeadCost;
-    }
-    return getGatherCost(VecTy);
-  }
   case Instruction::ZExt:
   case Instruction::SExt:
   case Instruction::FPToUI:
   case Instruction::FPToSI:
   case Instruction::FPExt:
   case Instruction::PtrToInt:
   case Instruction::IntToPtr:
   case Instruction::SIToFP:
   case Instruction::UIToFP:
   case Instruction::Trunc:
   case Instruction::FPTrunc:
   case Instruction::BitCast: {
     Type *SrcTy = VL0->getOperand(0)->getType();
+    VecCost = TTI->getCastInstrCost(
+        Opcode, VecTy, VectorType::get(SrcTy, VecTy->getVectorNumElements()));

     // Calculate the cost of this instruction.
-    int ScalarCost = VL.size() * TTI->getCastInstrCost(VL0->getOpcode(),
-                                                       VL0->getType(), SrcTy);
-
-    VectorType *SrcVecTy = VectorType::get(SrcTy, VL.size());
-    int VecCost = TTI->getCastInstrCost(VL0->getOpcode(), VecTy, SrcVecTy);
-    return VecCost - ScalarCost;
+    ScalarCost = VL.size() * TTI->getCastInstrCost(Opcode, ScalarTy, SrcTy);
+    break;
   }
   case Instruction::FCmp:
   case Instruction::ICmp:
   case Instruction::Select: {
     // Calculate the cost of this instruction.
-    VectorType *MaskTy = VectorType::get(Builder.getInt1Ty(), VL.size());
-    int ScalarCost = VecTy->getNumElements() *
-        TTI->getCmpSelInstrCost(Opcode, ScalarTy, Builder.getInt1Ty());
-    int VecCost = TTI->getCmpSelInstrCost(Opcode, VecTy, MaskTy);
-    return VecCost - ScalarCost;
+    VecCost = TTI->getCmpSelInstrCost(
+        Opcode, VecTy,
+        VectorType::get(Type::getInt1Ty(VL0->getContext()),
+                        VecTy->getVectorNumElements()));
+    ScalarCost =
+        VL.size() * TTI->getCmpSelInstrCost(Opcode, ScalarTy,
+                                            Type::getInt1Ty(VL0->getContext()));
+    break;
   }
   case Instruction::Add:
   case Instruction::FAdd:
   case Instruction::Sub:
   case Instruction::FSub:
   case Instruction::Mul:
   case Instruction::FMul:
   case Instruction::UDiv:
   case Instruction::SDiv:
   case Instruction::FDiv:
   case Instruction::URem:
   case Instruction::SRem:
   case Instruction::FRem:
   case Instruction::Shl:
   case Instruction::LShr:
   case Instruction::AShr:
   case Instruction::And:
   case Instruction::Or:
   case Instruction::Xor: {
     // Certain instructions can be cheaper to vectorize if they have a
     // constant second vector operand.
     TargetTransformInfo::OperandValueKind Op1VK =
         TargetTransformInfo::OK_AnyValue;
     TargetTransformInfo::OperandValueKind Op2VK =
         TargetTransformInfo::OK_UniformConstantValue;
     TargetTransformInfo::OperandValueProperties Op1VP =
         TargetTransformInfo::OP_None;
     TargetTransformInfo::OperandValueProperties Op2VP =
         TargetTransformInfo::OP_None;

     // If all operands are exactly the same ConstantInt then set the
     // operand kind to OK_UniformConstantValue.
     // If instead not all operands are constants, then set the operand kind
     // to OK_AnyValue. If all operands are constants but not the same,
     // then set the operand kind to OK_NonUniformConstantValue.
     ConstantInt *CInt = nullptr;
     for (unsigned i = 0; i < VL.size(); ++i) {
       const Instruction *I = cast<Instruction>(VL[i]);
       if (!isa<ConstantInt>(I->getOperand(1))) {
         Op2VK = TargetTransformInfo::OK_AnyValue;
         break;
       }
       if (i == 0) {
         CInt = cast<ConstantInt>(I->getOperand(1));
         continue;
       }
       if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&
           CInt != cast<ConstantInt>(I->getOperand(1)))
         Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
     }
     // FIXME: Currently cost of model modification for division by power of
     // 2 is handled for X86 and AArch64. Add support for other targets.
     if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&
         CInt->getValue().isPowerOf2())
       Op2VP = TargetTransformInfo::OP_PowerOf2;

-    int ScalarCost = VecTy->getNumElements() *
-                     TTI->getArithmeticInstrCost(Opcode, ScalarTy, Op1VK,
-                                                 Op2VK, Op1VP, Op2VP);
-    int VecCost = TTI->getArithmeticInstrCost(Opcode, VecTy, Op1VK, Op2VK,
-                                              Op1VP, Op2VP);
-    return VecCost - ScalarCost;
+    VecCost =
+        TTI->getArithmeticInstrCost(Opcode, VecTy, Op1VK, Op2VK, Op1VP, Op2VP);
+    ScalarCost = VL.size() * TTI->getArithmeticInstrCost(
+                                 Opcode, ScalarTy, Op1VK, Op2VK, Op1VP, Op2VP);
+    break;
   }
   case Instruction::GetElementPtr: {
     TargetTransformInfo::OperandValueKind Op1VK =
         TargetTransformInfo::OK_AnyValue;
     TargetTransformInfo::OperandValueKind Op2VK =
         TargetTransformInfo::OK_UniformConstantValue;

-    int ScalarCost =
-        VecTy->getNumElements() *
-        TTI->getArithmeticInstrCost(Instruction::Add, ScalarTy, Op1VK, Op2VK);
-    int VecCost =
+    VecCost =
         TTI->getArithmeticInstrCost(Instruction::Add, VecTy, Op1VK, Op2VK);
-
-    return VecCost - ScalarCost;
+    ScalarCost = VL.size() * TTI->getArithmeticInstrCost(
+                                 Instruction::Add, ScalarTy, Op1VK, Op2VK);
+    break;
   }
   case Instruction::Load: {
     // Cost of wide load - cost of scalar loads.
-    unsigned alignment = dyn_cast<LoadInst>(VL0)->getAlignment();
-    int ScalarLdCost = VecTy->getNumElements() *
-        TTI->getMemoryOpCost(Instruction::Load, ScalarTy, alignment, 0);
-    int VecLdCost = TTI->getMemoryOpCost(Instruction::Load,
-                                         VecTy, alignment, 0);
-    if (E->NeedToShuffle) {
-      VecLdCost += TTI->getShuffleCost(
-          TargetTransformInfo::SK_PermuteSingleSrc, VecTy, 0);
-    }
-    return VecLdCost - ScalarLdCost;
+    unsigned Alignment = cast<LoadInst>(VL0)->getAlignment();
+    VecCost = TTI->getMemoryOpCost(Instruction::Load, VecTy, Alignment,
+                                   /*AddressSpace=*/0);
+    ScalarCost =
+        VL.size() * TTI->getMemoryOpCost(Instruction::Load, ScalarTy, Alignment,
+                                         /*AddressSpace=*/0);
+    break;
   }
   case Instruction::Store: {
     // We know that we can merge the stores. Calculate the cost.
-    unsigned alignment = dyn_cast<StoreInst>(VL0)->getAlignment();
-    int ScalarStCost = VecTy->getNumElements() *
-        TTI->getMemoryOpCost(Instruction::Store, ScalarTy, alignment, 0);
-    int VecStCost = TTI->getMemoryOpCost(Instruction::Store,
-                                         VecTy, alignment, 0);
-    return VecStCost - ScalarStCost;
+    auto *SI = cast<StoreInst>(VL0);
+    unsigned Alignment = SI->getAlignment();
+    VecCost = TTI->getMemoryOpCost(Instruction::Store, VecTy, Alignment,
+                                   /*AddressSpace=*/0);
+    ScalarCost =
+        VL.size() * TTI->getMemoryOpCost(Instruction::Store, ScalarTy,
+                                         Alignment, /*AddressSpace=*/0);
+    break;
   }
   case Instruction::Call: {
     CallInst *CI = cast<CallInst>(VL0);
     Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

+    FastMathFlags FMF;
+    if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
+      FMF = FPMO->getFastMathFlags();
+
     // Calculate the cost of the scalar and vector calls.
-    SmallVector<Type*, 4> ScalarTys, VecTys;
-    for (unsigned op = 0, opc = CI->getNumArgOperands(); op!= opc; ++op) {
-      ScalarTys.push_back(CI->getArgOperand(op)->getType());
+    SmallVector<Type *, 4> VecTys;
+    for (unsigned op = 0, opc = CI->getNumArgOperands(); op != opc; ++op) {
       VecTys.push_back(VectorType::get(CI->getArgOperand(op)->getType(),
-                                       VecTy->getNumElements()));
+                                       VecTy->getVectorNumElements()));
     }

-    FastMathFlags FMF;
-    if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
-      FMF = FPMO->getFastMathFlags();
-
-    int ScalarCallCost = VecTy->getNumElements() *
-        TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys, FMF);
-
-    int VecCallCost = TTI->getIntrinsicInstrCost(ID, VecTy, VecTys, FMF);
-
-    DEBUG(dbgs() << "SLP: Call cost "<< VecCallCost - ScalarCallCost
-          << " (" << VecCallCost << "-" << ScalarCallCost << ")"
-          << " for " << *CI << "\n");
-
-    return VecCallCost - ScalarCallCost;
+    VecCost = TTI->getIntrinsicInstrCost(ID, VecTy, VecTys, FMF);
+    ScalarCost =
+        VL.size() * TTI->getIntrinsicInstrCost(
+                        ID, ScalarTy, CI->getFunctionType()->params(), FMF);
+    DEBUG(dbgs() << "SLP: Call cost " << VecCost - ScalarCost << " (" << VecCost
+                 << "-" << ScalarCost << ")"
+                 << " for " << *CI << "\n");
+
+    break;
   }
+  default:
+    return None;
+  }
+  return VecCost - ScalarCost;
+}
+
+int BoUpSLP::getEntryCost(TreeEntry *E) {
+  ArrayRef<Value *> VL = E->Scalars;
+
+  Type *ScalarTy = VL[0]->getType();
+  if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
+    ScalarTy = SI->getValueOperand()->getType();
+  VectorType *VecTy = VectorType::get(ScalarTy, VL.size());
+
+  // If we have computed a smaller type for the expression, update VecTy so
+  // that the costs will be accurate.
+  if (MinBWs.count(VL[0]))
+    VecTy = VectorType::get(
+        IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());
+
+  if (E->NeedToGather) {
+    if (allConstant(VL))
+      return 0;
+    if (isSplat(VL)) {
+      return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, 0);
+    }
+    return getGatherCost(E->Scalars);
+  }
+  unsigned Opcode = getSameOpcode(VL);
+  assert(Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");
+  switch (Opcode) {
+  case Instruction::PHI:
+    return 0;
+  case Instruction::ExtractValue:
+  case Instruction::ExtractElement:
+    if (canReuseExtract(VL, Opcode)) {
+      int DeadCost = 0;
+      for (unsigned i = 0, e = VL.size(); i < e; ++i) {
+        Instruction *E = cast<Instruction>(VL[i]);
+        // If all users are going to be vectorized, instruction can be
+        // considered as dead.
+        // The same, if have only one user, it will be vectorized for sure.
+        if (E->hasOneUse() ||
+            std::all_of(E->user_begin(), E->user_end(), [this](User *U) {
+              return ScalarToTreeEntry.count(U) > 0;
+            }))
+          // Take credit for instruction that will become dead.
+          DeadCost +=
+              TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, i);
+      }
+      return -DeadCost;
+    }
+    return getGatherCost(VecTy);
   case Instruction::ShuffleVector: {
     TargetTransformInfo::OperandValueKind Op1VK =
         TargetTransformInfo::OK_AnyValue;
     TargetTransformInfo::OperandValueKind Op2VK =
         TargetTransformInfo::OK_AnyValue;
     int ScalarCost = 0;
     int VecCost = 0;
     for (Value *i : VL) {
       Instruction *I = cast<Instruction>(i);
       if (!I)
         break;
       ScalarCost +=
           TTI->getArithmeticInstrCost(I->getOpcode(), ScalarTy, Op1VK, Op2VK);
     }
     // VecCost is equal to sum of the cost of creating 2 vectors
     // and the cost of creating shuffle.
     Instruction *I0 = cast<Instruction>(VL[0]);
-    VecCost =
-        TTI->getArithmeticInstrCost(I0->getOpcode(), VecTy, Op1VK, Op2VK);
+    VecCost = TTI->getArithmeticInstrCost(I0->getOpcode(), VecTy, Op1VK, Op2VK);
     Instruction *I1 = cast<Instruction>(VL[1]);
     VecCost +=
         TTI->getArithmeticInstrCost(I1->getOpcode(), VecTy, Op1VK, Op2VK);
-    VecCost +=
-        TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);
+    VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);
     return VecCost - ScalarCost;
   }
+  case Instruction::Load: {
+    int Cost = getCost(Opcode, VL, ScalarTy, VecTy).getValue();
+    if (E->NeedToShuffle) {
+      Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
+                                  VecTy, 0);
+    }
+    return Cost;
+  }
   default:
-    llvm_unreachable("Unknown instruction");
+    if (Optional<int> Cost = getCost(Opcode, VL, ScalarTy, VecTy))
+      return Cost.getValue();
+    break;
   }
+  llvm_unreachable("Unknown instruction");
 }

 bool BoUpSLP::isFullyVectorizableTinyTree() {
   DEBUG(dbgs() << "SLP: Check whether the tree with height " <<
         VectorizableTree.size() << " is fully vectorizable .\n");

   // We only handle trees of heights 1 and 2.
   if (VectorizableTree.size() == 1 && !VectorizableTree[0].NeedToGather)
@@ ... @@ for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),

       // Start over at the next instruction of a different type (or the end).
       IncIt = SameTypeIt;
     }
   }

   VisitedInstrs.clear();

+  SmallVector<WeakVH, 4> ExtractInsts;
   for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; it++) {
     // We may go through BB multiple times so skip the one we have checked.
     if (!VisitedInstrs.insert(&*it).second)
       continue;

     if (isa<DbgInfoIntrinsic>(it))
       continue;

@@ ... @@ if (StoreInst *SI = dyn_cast<StoreInst>(it)) {
             Changed = true;
             it = BB->begin();
             e = BB->end();
           }
           continue;
         }
       }
     }
+
+    if (auto *EE = dyn_cast<ExtractElementInst>(it)) {
+      ExtractInsts.push_back(EE);
+      continue;
+    }
   }

+  Changed |= R.tryToWidenExtractElementInsts(ExtractInsts);
+
   return Changed;
 }

 bool SLPVectorizerPass::vectorizeGEPIndices(BasicBlock *BB, BoUpSLP &R) {
   auto Changed = false;
   for (auto &Entry : GEPs) {

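The outer loop of tryToWidenExtractElementInsts() in the diff is a growing worklist: each %widen.extract it creates is appended with its parent's height plus one, and entries that reach RecursionMaxDepth are skipped, which bounds chains of widenings. A standalone C++ sketch of only that control flow, assuming integer IDs in place of instructions and a hypothetical WidenUsers callback in place of the per-user cost check and rewrite:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Returns the number of widenings performed. WidenUsers(Item) is a
// hypothetical callback that widens the profitable users of Item and returns
// the IDs of the new result extracts it created.
unsigned runWidenWorklist(
    const std::vector<int> &Seeds, unsigned MaxDepth,
    const std::function<std::vector<int>(int)> &WidenUsers) {
  // Each entry is (extract ID, height of the widening chain that produced it).
  std::vector<std::pair<int, unsigned>> Work;
  Work.reserve(Seeds.size());
  for (int S : Seeds)
    Work.emplace_back(S, 0u);
  unsigned Widened = 0;
  // Work.size() is re-read on every iteration because new extracts are
  // appended while the loop runs (the patch refreshes E for the same reason).
  for (std::size_t Idx = 0; Idx < Work.size(); ++Idx) {
    // Copy the entry: emplace_back below may reallocate Work.
    const std::pair<int, unsigned> Entry = Work[Idx];
    if (Entry.second == MaxDepth)
      continue;
    for (int NewExtract : WidenUsers(Entry.first)) {
      Work.emplace_back(NewExtract, Entry.second + 1);
      ++Widened;
    }
  }
  return Widened;
}
```

With a callback that always produces one new extract, a single seed is widened MaxDepth times before the height cap stops the chain.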
test/Transforms/SLPVectorizer/X86/vector.ll

@@ ... @@
 ;
   %k = icmp eq <4 x i32> %in, %in2
   ret void
 }

 define i1 @cmpv2f32(<2 x i32> %x, <2 x i32> %y) {
 ; CHECK-LABEL: @cmpv2f32(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x i32> [[X:%.*]], i32 0
-; CHECK-NEXT: [[Y0:%.*]] = extractelement <2 x i32> [[Y:%.*]], i32 0
-; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[X0]], [[Y0]]
-; CHECK-NEXT: br i1 [[CMP0]], label [[IF:%.*]], label [[ENDIF:%.*]]
+; CHECK-NEXT: [[WIDEN_VECT1:%.*]] = icmp eq <2 x i32> [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[WIDEN_EXTRACT2:%.*]] = extractelement <2 x i1> [[WIDEN_VECT1]], i32 0
+; CHECK-NEXT: br i1 [[WIDEN_EXTRACT2]], label [[IF:%.*]], label [[ENDIF:%.*]]
 ; CHECK: if:
-; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x i32> [[X]], i32 1
-; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x i32> [[Y]], i32 1
-; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[X1]], [[Y1]]
+; CHECK-NEXT: [[WIDEN_VECT:%.*]] = icmp eq <2 x i32> [[X]], [[Y]]
+; CHECK-NEXT: [[WIDEN_EXTRACT:%.*]] = extractelement <2 x i1> [[WIDEN_VECT]], i32 1
 ; CHECK-NEXT: br label [[ENDIF]]
 ; CHECK: endif:
-; CHECK-NEXT: [[AND_OF_CMPS:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[CMP1]], [[IF]] ]
+; CHECK-NEXT: [[AND_OF_CMPS:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[WIDEN_EXTRACT]], [[IF]] ]
 ; CHECK-NEXT: ret i1 [[AND_OF_CMPS]]
 ;
 entry:
   %x0 = extractelement <2 x i32> %x, i32 0
   %y0 = extractelement <2 x i32> %y, i32 0
   %cmp0 = icmp eq i32 %x0, %y0
   br i1 %cmp0, label %if, label %endif