This is an archive of the discontinued LLVM Phabricator instance.

refactor / improve getScalarizationOverhead()
Closed, Public

Authored by jonpa on Jan 23 2017, 4:36 AM.


Details

Reviewers

RKSimon
mkuper
javed.absar
hfinkel

Summary

getScalarizationOverhead() was duplicated in three different places. It also did not check the number of unique operands, so extract costs could be computed twice for two operands that use the same Value. In addition, LoopVectorizer could account for the extracts of only one operand of e.g. an Add instruction (this is true for all arithmetic instructions).

This patch improves on this:

There is now a single function definition in BasicTTIImpl. It is public so that it can be called through TargetTransformInfo, which gets a method with the same name so that LoopVectorizer can access it.
Removed the duplicated X86 method and the useless declarations of the method in the AArch64 and ARM derivations.
Removed the duplicated LoopVectorizer method, so that TTI is queried instead. I assumed that the check for void Type is only relevant for RetTy.
Added a new method, getOperandsScalarizationOverhead(), which takes a list of operands and computes the cost of the extract operations needed for the unique Values among them. The default implementation of getArithmeticInstrCost() now uses this if operands are provided. If not, it keeps the current behavior by assuming just one operand.
LoopVectorizer now uses the new TTI method, which removes one of the static wrapper functions it used before. It should now also account for the right number of extracts in getInstructionCost() for arithmetic instructions.
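
The duplicate-operand problem described above can be illustrated with a small standalone sketch. It does not use LLVM: `Value`, `extractCost`, `naiveOperandsOverhead`, and `uniqueOperandsOverhead` are hypothetical stand-ins, and the flat cost of one unit per extracted lane is an assumption made only for illustration.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::Value; only pointer identity matters here.
struct Value {
  int Id;
};

// Assumed cost model: extracting all lanes of one operand costs one unit
// per lane. Real targets report this through getVectorInstrCost().
unsigned extractCost(unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  return VF;
}

// Charges extracts for every operand, even when the same Value appears twice.
unsigned naiveOperandsOverhead(const std::vector<const Value *> &Args,
                               unsigned VF) {
  unsigned Cost = 0;
  for (const Value *A : Args) {
    (void)A; // the operand identity is ignored on purpose
    Cost += extractCost(VF);
  }
  return Cost;
}

// Charges extracts once per unique Value, in the spirit of
// getOperandsScalarizationOverhead() as described in the summary.
unsigned uniqueOperandsOverhead(const std::vector<const Value *> &Args,
                                unsigned VF) {
  unsigned Cost = 0;
  std::unordered_set<const Value *> Seen;
  for (const Value *A : Args)
    if (Seen.insert(A).second) // true only the first time A is seen
      Cost += extractCost(VF);
  return Cost;
}
```

Under this assumed cost model, an instruction that uses the same Value for both operands at VF = 4 is charged 8 by the naive version and 4 by the de-duplicating version. An empty operand list costs 0 in both.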

Even in just LoopVectorize.cpp, there are more places to go over and see what would work best. I am however happy at this point to ask for feedback and suggestions. Does this seem to be going in the right direction?

Discussion on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109382.html

Simon, apart from duplicate operands, you mentioned extracted immediates. Do you have any
example or reference of how to best treat that case?

Diff Detail

Event Timeline

jonpa created this revision. Jan 23 2017, 4:36 AM

Herald added subscribers: mzolotukhin, aemerson. Jan 23 2017, 4:36 AM

Patch uploaded this time with all files - sorry.

Herald added a reviewer: javed.absar. Jan 23 2017, 4:46 AM

mssimpso added a subscriber: mssimpso. Jan 23 2017, 5:51 AM

RKSimon added a reviewer: mkuper. Jan 23 2017, 11:16 AM

Please remove the assert and add a comment as noted below. Otherwise, LGTM.

include/llvm/CodeGen/BasicTTIImpl.h
307	I don't see the point in asserting here. If there are no arguments, then there's no cost, so you'll just return zero. That seems useful to avoid unnecessary special cases in potential callers.
366	Please add a comment here that, when no information on arguments is provided, as a heuristic, we add the cost associated with one argument.
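
As a rough illustration of the heuristic discussed in the comment on line 366, the following toy model adds up the three parts of a scalarized arithmetic cost: inserts for the result, the scalar operation repeated once per lane, and extracts for the operands. The function name, its parameters, and the unit insert and extract costs are assumptions for this sketch and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Assumed per-lane costs; real targets would supply these values.
constexpr unsigned InsertCostPerLane = 1;
constexpr unsigned ExtractCostPerLane = 1;

// Toy model of the cost of a scalarized vector arithmetic instruction.
// NumUniqueOperands == 0 means "no operand information was provided".
unsigned scalarizedArithCost(unsigned NumLanes, unsigned ScalarOpCost,
                             std::size_t NumUniqueOperands) {
  // Inserts for the result plus one scalar operation per lane.
  unsigned Total = NumLanes * InsertCostPerLane + NumLanes * ScalarOpCost;

  // Heuristic: with no operand information, charge the extracts of a
  // single operand. Otherwise charge once per unique operand.
  std::size_t Counted = NumUniqueOperands ? NumUniqueOperands : 1;
  Total += static_cast<unsigned>(Counted) * NumLanes * ExtractCostPerLane;
  return Total;
}
```

With four lanes and a scalar operation cost of 1, two unique operands give 16, while both the one-operand case and the no-information fallback give 12.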

This revision is now accepted and ready to land. Jan 25 2017, 4:49 PM

Pushed as r293155. I updated the patch per the review, and I also removed a line break in a comment near the end of getArithmeticInstrCost().

@simon: Extracted immediates have not yet been handled, but it should be simple to do in getOperandsScalarizationOverhead(). I don't have a test case for this, so I would appreciate your help if you think it is needed.

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	17 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	7 lines
include/llvm/CodeGen/BasicTTIImpl.h	58 lines
lib/Analysis/TargetTransformInfo.cpp	11 lines
lib/Target/AArch64/AArch64TargetTransformInfo.h	4 lines
lib/Target/ARM/ARMTargetTransformInfo.h	4 lines
lib/Target/X86/X86TargetTransformInfo.h	2 lines
lib/Target/X86/X86TargetTransformInfo.cpp	14 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	48 lines

Diff 85356

include/llvm/Analysis/TargetTransformInfo.h

[...]	public:
/// \brief Return true if switches should be turned into lookup tables for the		/// \brief Return true if switches should be turned into lookup tables for the
/// target.		/// target.
bool shouldBuildLookupTables() const;		bool shouldBuildLookupTables() const;

/// \brief Return true if switches should be turned into lookup tables		/// \brief Return true if switches should be turned into lookup tables
/// containing this constant value for the target.		/// containing this constant value for the target.
bool shouldBuildLookupTablesForConstant(Constant *C) const;		bool shouldBuildLookupTablesForConstant(Constant *C) const;

		unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;

		unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
		unsigned VF) const;

/// \brief Don't restrict interleaved unrolling to small loops.		/// \brief Don't restrict interleaved unrolling to small loops.
bool enableAggressiveInterleaving(bool LoopHasReductions) const;		bool enableAggressiveInterleaving(bool LoopHasReductions) const;

/// \brief Enable matching of interleaved access groups.		/// \brief Enable matching of interleaved access groups.
bool enableInterleavedAccessVectorization() const;		bool enableInterleavedAccessVectorization() const;

/// \brief Indicate that it is potentially unsafe to automatically vectorize		/// \brief Indicate that it is potentially unsafe to automatically vectorize
/// floating-point operations because the semantics of vector and scalar		/// floating-point operations because the semantics of vector and scalar
[...]	public:
virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) = 0;		virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) = 0;
virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;		virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
virtual bool isProfitableToHoist(Instruction *I) = 0;		virtual bool isProfitableToHoist(Instruction *I) = 0;
virtual bool isTypeLegal(Type *Ty) = 0;		virtual bool isTypeLegal(Type *Ty) = 0;
virtual unsigned getJumpBufAlignment() = 0;		virtual unsigned getJumpBufAlignment() = 0;
virtual unsigned getJumpBufSize() = 0;		virtual unsigned getJumpBufSize() = 0;
virtual bool shouldBuildLookupTables() = 0;		virtual bool shouldBuildLookupTables() = 0;
virtual bool shouldBuildLookupTablesForConstant(Constant *C) = 0;		virtual bool shouldBuildLookupTablesForConstant(Constant *C) = 0;
		virtual unsigned
		getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) = 0;
		virtual unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
		unsigned VF) = 0;
virtual bool enableAggressiveInterleaving(bool LoopHasReductions) = 0;		virtual bool enableAggressiveInterleaving(bool LoopHasReductions) = 0;
virtual bool enableInterleavedAccessVectorization() = 0;		virtual bool enableInterleavedAccessVectorization() = 0;
virtual bool isFPVectorizationPotentiallyUnsafe() = 0;		virtual bool isFPVectorizationPotentiallyUnsafe() = 0;
virtual bool allowsMisalignedMemoryAccesses(LLVMContext &Context,		virtual bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
unsigned BitWidth,		unsigned BitWidth,
unsigned AddressSpace,		unsigned AddressSpace,
unsigned Alignment,		unsigned Alignment,
bool *Fast) = 0;		bool *Fast) = 0;
[...]	public:
unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); }		unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); }
unsigned getJumpBufSize() override { return Impl.getJumpBufSize(); }		unsigned getJumpBufSize() override { return Impl.getJumpBufSize(); }
bool shouldBuildLookupTables() override {		bool shouldBuildLookupTables() override {
return Impl.shouldBuildLookupTables();		return Impl.shouldBuildLookupTables();
}		}
bool shouldBuildLookupTablesForConstant(Constant *C) override {		bool shouldBuildLookupTablesForConstant(Constant *C) override {
return Impl.shouldBuildLookupTablesForConstant(C);		return Impl.shouldBuildLookupTablesForConstant(C);
}		}
		unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
		return Impl.getScalarizationOverhead(Ty, Insert, Extract);
		}
		unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
		unsigned VF) {
		return Impl.getOperandsScalarizationOverhead(Args, VF);
		}

bool enableAggressiveInterleaving(bool LoopHasReductions) override {		bool enableAggressiveInterleaving(bool LoopHasReductions) override {
return Impl.enableAggressiveInterleaving(LoopHasReductions);		return Impl.enableAggressiveInterleaving(LoopHasReductions);
}		}
bool enableInterleavedAccessVectorization() override {		bool enableInterleavedAccessVectorization() override {
return Impl.enableInterleavedAccessVectorization();		return Impl.enableInterleavedAccessVectorization();
}		}
bool isFPVectorizationPotentiallyUnsafe() override {		bool isFPVectorizationPotentiallyUnsafe() override {
return Impl.isFPVectorizationPotentiallyUnsafe();		return Impl.isFPVectorizationPotentiallyUnsafe();

include/llvm/Analysis/TargetTransformInfoImpl.h

[...]	public:

unsigned getJumpBufAlignment() { return 0; }		unsigned getJumpBufAlignment() { return 0; }

unsigned getJumpBufSize() { return 0; }		unsigned getJumpBufSize() { return 0; }

bool shouldBuildLookupTables() { return true; }		bool shouldBuildLookupTables() { return true; }
bool shouldBuildLookupTablesForConstant(Constant *C) { return true; }		bool shouldBuildLookupTablesForConstant(Constant *C) { return true; }

		unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
		return 0;
		}

		unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
		unsigned VF) { return 0; }

bool enableAggressiveInterleaving(bool LoopHasReductions) { return false; }		bool enableAggressiveInterleaving(bool LoopHasReductions) { return false; }

bool enableInterleavedAccessVectorization() { return false; }		bool enableInterleavedAccessVectorization() { return false; }

bool isFPVectorizationPotentiallyUnsafe() { return false; }		bool isFPVectorizationPotentiallyUnsafe() { return false; }

bool allowsMisalignedMemoryAccesses(LLVMContext &Context,		bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
unsigned BitWidth,		unsigned BitWidth,

include/llvm/CodeGen/BasicTTIImpl.h

[...]
/// We need these methods implemented in the derived class so that this class		/// We need these methods implemented in the derived class so that this class
/// doesn't have to duplicate storage for them.		/// doesn't have to duplicate storage for them.
template <typename T>		template <typename T>
class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {		class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
private:		private:
typedef TargetTransformInfoImplCRTPBase<T> BaseT;		typedef TargetTransformInfoImplCRTPBase<T> BaseT;
typedef TargetTransformInfo TTI;		typedef TargetTransformInfo TTI;

/// Estimate the overhead of scalarizing an instruction. Insert and Extract
/// are set if the result needs to be inserted and/or extracted from vectors.
unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
assert(Ty->isVectorTy() && "Can only scalarize vectors");
unsigned Cost = 0;

for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
if (Insert)
Cost += static_cast<T *>(this)
->getVectorInstrCost(Instruction::InsertElement, Ty, i);
if (Extract)
Cost += static_cast<T *>(this)
->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
}

return Cost;
}

/// Estimate a cost of shuffle as a sequence of extract and insert		/// Estimate a cost of shuffle as a sequence of extract and insert
/// operations.		/// operations.
unsigned getPermuteShuffleOverhead(Type *Ty) {		unsigned getPermuteShuffleOverhead(Type *Ty) {
assert(Ty->isVectorTy() && "Can only shuffle vectors");		assert(Ty->isVectorTy() && "Can only shuffle vectors");
unsigned Cost = 0;		unsigned Cost = 0;
// Shuffle cost is equal to the cost of extracting element from its argument		// Shuffle cost is equal to the cost of extracting element from its argument
// plus the cost of inserting them onto the result vector.		// plus the cost of inserting them onto the result vector.

[...]	public:

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector) { return Vector ? 0 : 1; }		unsigned getNumberOfRegisters(bool Vector) { return Vector ? 0 : 1; }

unsigned getRegisterBitWidth(bool Vector) { return 32; }		unsigned getRegisterBitWidth(bool Vector) { return 32; }

		/// Estimate the overhead of scalarizing an instruction. Insert and Extract
		/// are set if the result needs to be inserted and/or extracted from vectors.
		unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
		assert(Ty->isVectorTy() && "Can only scalarize vectors");
		unsigned Cost = 0;

		for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
		if (Insert)
		Cost += static_cast<T *>(this)
		->getVectorInstrCost(Instruction::InsertElement, Ty, i);
		if (Extract)
		Cost += static_cast<T *>(this)
		->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
		}

		return Cost;
		}

		/// Estimate the overhead of scalarizing an instructions unique operands.
		unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
		unsigned VF) {
		assert (!Args.empty() && "Should only be called with existing arguments");
		hfinkel: I don't see the point in asserting here. If there are no arguments, then there's no cost, so you'll just return zero. That seems useful to avoid unnecessary special cases in potential callers.
		unsigned Cost = 0;
		SmallPtrSet<const Value*, 4> UniqueOperands;
		for (const Value *A : Args) {
		if (UniqueOperands.insert(A).second)
		Cost += getScalarizationOverhead(VectorType::get(A->getType(), VF),
		false, true);
		}
		return Cost;
		}

unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }		unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }

unsigned getArithmeticInstrCost(		unsigned getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
[...]	unsigned getArithmeticInstrCost(
// similarly to what getCastInstrCost() does.		// similarly to what getCastInstrCost() does.
if (Ty->isVectorTy()) {		if (Ty->isVectorTy()) {
unsigned Num = Ty->getVectorNumElements();		unsigned Num = Ty->getVectorNumElements();
unsigned Cost = static_cast<T *>(this)		unsigned Cost = static_cast<T *>(this)
->getArithmeticInstrCost(Opcode, Ty->getScalarType());		->getArithmeticInstrCost(Opcode, Ty->getScalarType());
// return the cost of multiple scalar invocation plus the cost of		// return the cost of multiple scalar invocation plus the cost of
// inserting		// inserting
// and extracting the values.		// and extracting the values.
return getScalarizationOverhead(Ty, true, true) + Num * Cost;
		unsigned TotCost = getScalarizationOverhead(Ty, true, false) + Num * Cost;
		if (!Args.empty())
		TotCost += getOperandsScalarizationOverhead(Args, Num);
		else
		TotCost += getScalarizationOverhead(Ty, false, true);
		hfinkel: Please add a comment here that, when no information on arguments is provided, as a heuristic, we add the cost associated with one argument.
		return TotCost;
}		}

// We don't know anything about this scalar instruction.		// We don't know anything about this scalar instruction.
return OpCost;		return OpCost;
}		}

unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {

lib/Analysis/TargetTransformInfo.cpp

	[...]

	bool TargetTransformInfo::shouldBuildLookupTables() const {			bool TargetTransformInfo::shouldBuildLookupTables() const {
	return TTIImpl->shouldBuildLookupTables();			return TTIImpl->shouldBuildLookupTables();
	}			}
	bool TargetTransformInfo::shouldBuildLookupTablesForConstant(Constant *C) const {			bool TargetTransformInfo::shouldBuildLookupTablesForConstant(Constant *C) const {
	return TTIImpl->shouldBuildLookupTablesForConstant(C);			return TTIImpl->shouldBuildLookupTablesForConstant(C);
	}			}

				unsigned TargetTransformInfo::
				getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const {
				return TTIImpl->getScalarizationOverhead(Ty, Insert, Extract);
				}

				unsigned TargetTransformInfo::
				getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
				unsigned VF) const {
				return TTIImpl->getOperandsScalarizationOverhead(Args, VF);
				}

	bool TargetTransformInfo::enableAggressiveInterleaving(bool LoopHasReductions) const {			bool TargetTransformInfo::enableAggressiveInterleaving(bool LoopHasReductions) const {
	return TTIImpl->enableAggressiveInterleaving(LoopHasReductions);			return TTIImpl->enableAggressiveInterleaving(LoopHasReductions);
	}			}

	bool TargetTransformInfo::enableInterleavedAccessVectorization() const {			bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
	return TTIImpl->enableInterleavedAccessVectorization();			return TTIImpl->enableInterleavedAccessVectorization();
	}			}


lib/Target/AArch64/AArch64TargetTransformInfo.h

	[...]
	class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {			class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
	typedef BasicTTIImplBase<AArch64TTIImpl> BaseT;			typedef BasicTTIImplBase<AArch64TTIImpl> BaseT;
	typedef TargetTransformInfo TTI;			typedef TargetTransformInfo TTI;
	friend BaseT;			friend BaseT;

	const AArch64Subtarget *ST;			const AArch64Subtarget *ST;
	const AArch64TargetLowering *TLI;			const AArch64TargetLowering *TLI;

	/// Estimate the overhead of scalarizing an instruction. Insert and Extract
	/// are set if the result needs to be inserted and/or extracted from vectors.
	unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract);

	const AArch64Subtarget *getST() const { return ST; }			const AArch64Subtarget *getST() const { return ST; }
	const AArch64TargetLowering *getTLI() const { return TLI; }			const AArch64TargetLowering *getTLI() const { return TLI; }

	enum MemIntrinsicType {			enum MemIntrinsicType {
	VECTOR_LDST_TWO_ELEMENTS,			VECTOR_LDST_TWO_ELEMENTS,
	VECTOR_LDST_THREE_ELEMENTS,			VECTOR_LDST_THREE_ELEMENTS,
	VECTOR_LDST_FOUR_ELEMENTS			VECTOR_LDST_FOUR_ELEMENTS
	};			};

lib/Target/ARM/ARMTargetTransformInfo.h

	[...]
	class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {			class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {
	typedef BasicTTIImplBase<ARMTTIImpl> BaseT;			typedef BasicTTIImplBase<ARMTTIImpl> BaseT;
	typedef TargetTransformInfo TTI;			typedef TargetTransformInfo TTI;
	friend BaseT;			friend BaseT;

	const ARMSubtarget *ST;			const ARMSubtarget *ST;
	const ARMTargetLowering *TLI;			const ARMTargetLowering *TLI;

	/// Estimate the overhead of scalarizing an instruction. Insert and Extract
	/// are set if the result needs to be inserted and/or extracted from vectors.
	unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract);

	const ARMSubtarget *getST() const { return ST; }			const ARMSubtarget *getST() const { return ST; }
	const ARMTargetLowering *getTLI() const { return TLI; }			const ARMTargetLowering *getTLI() const { return TLI; }

	public:			public:
	explicit ARMTTIImpl(const ARMBaseTargetMachine *TM, const Function &F)			explicit ARMTTIImpl(const ARMBaseTargetMachine *TM, const Function &F)
	: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),			: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
	TLI(ST->getTargetLowering()) {}			TLI(ST->getTargetLowering()) {}


lib/Target/X86/X86TargetTransformInfo.h

	[...]
	class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {			class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {
	typedef BasicTTIImplBase<X86TTIImpl> BaseT;			typedef BasicTTIImplBase<X86TTIImpl> BaseT;
	typedef TargetTransformInfo TTI;			typedef TargetTransformInfo TTI;
	friend BaseT;			friend BaseT;

	const X86Subtarget *ST;			const X86Subtarget *ST;
	const X86TargetLowering *TLI;			const X86TargetLowering *TLI;

	int getScalarizationOverhead(Type *Ty, bool Insert, bool Extract);

	const X86Subtarget *getST() const { return ST; }			const X86Subtarget *getST() const { return ST; }
	const X86TargetLowering *getTLI() const { return TLI; }			const X86TargetLowering *getTLI() const { return TLI; }

	public:			public:
	explicit X86TTIImpl(const X86TargetMachine *TM, const Function &F)			explicit X86TTIImpl(const X86TargetMachine *TM, const Function &F)
	: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),			: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
	TLI(ST->getTargetLowering()) {}			TLI(ST->getTargetLowering()) {}


lib/Target/X86/X86TargetTransformInfo.cpp

[...]	int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
// destined to be moved to and used in the integer register file.		// destined to be moved to and used in the integer register file.
int RegisterFileMoveCost = 0;		int RegisterFileMoveCost = 0;
if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())		if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())
RegisterFileMoveCost = 1;		RegisterFileMoveCost = 1;

return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;		return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;
}		}

int X86TTIImpl::getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
assert (Ty->isVectorTy() && "Can only scalarize vectors");
int Cost = 0;

for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
if (Insert)
Cost += getVectorInstrCost(Instruction::InsertElement, Ty, i);
if (Extract)
Cost += getVectorInstrCost(Instruction::ExtractElement, Ty, i);
}

return Cost;
}

int X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,		int X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
unsigned AddressSpace) {		unsigned AddressSpace) {
// Handle non-power-of-two vectors such as <3 x float>		// Handle non-power-of-two vectors such as <3 x float>
if (VectorType *VTy = dyn_cast<VectorType>(Src)) {		if (VectorType *VTy = dyn_cast<VectorType>(Src)) {
unsigned NumElem = VTy->getVectorNumElements();		unsigned NumElem = VTy->getVectorNumElements();

// Handle a few common cases:		// Handle a few common cases:
// <3 x float>		// <3 x float>

lib/Transforms/Vectorize/LoopVectorize.cpp


[...]	static Value *addFastMathFlag(Value *V) {
if (isa<FPMathOperator>(V)) {		if (isa<FPMathOperator>(V)) {
FastMathFlags Flags;		FastMathFlags Flags;
Flags.setUnsafeAlgebra();		Flags.setUnsafeAlgebra();
cast<Instruction>(V)->setFastMathFlags(Flags);		cast<Instruction>(V)->setFastMathFlags(Flags);
}		}
return V;		return V;
}		}

/// \brief Estimate the overhead of scalarizing a value based on its type.
/// Insert and Extract are set if the result needs to be inserted and/or
/// extracted from vectors.
static unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract,
const TargetTransformInfo &TTI) {
if (Ty->isVoidTy())
return 0;

assert(Ty->isVectorTy() && "Can only scalarize vectors");
unsigned Cost = 0;

for (unsigned I = 0, E = Ty->getVectorNumElements(); I < E; ++I) {
if (Extract)
Cost += TTI.getVectorInstrCost(Instruction::ExtractElement, Ty, I);
if (Insert)
Cost += TTI.getVectorInstrCost(Instruction::InsertElement, Ty, I);
}

return Cost;
}

/// \brief Estimate the overhead of scalarizing an Instruction based on the		/// \brief Estimate the overhead of scalarizing an Instruction based on the
/// types of its operands and return value.		/// types of its operands and return value.
static unsigned getScalarizationOverhead(SmallVectorImpl<Type *> &OpTys,		static unsigned getScalarizationOverhead(SmallVectorImpl<Type *> &OpTys,
Type *RetTy,		Type *RetTy,
const TargetTransformInfo &TTI) {		const TargetTransformInfo &TTI) {
unsigned ScalarizationCost =		unsigned ScalarizationCost = 0;
getScalarizationOverhead(RetTy, true, false, TTI);
		if (!RetTy->isVoidTy())
		ScalarizationCost += TTI.getScalarizationOverhead(RetTy, true, false);

for (Type *Ty : OpTys)		for (Type *Ty : OpTys)
ScalarizationCost += getScalarizationOverhead(Ty, false, true, TTI);		ScalarizationCost += TTI.getScalarizationOverhead(Ty, false, true);

return ScalarizationCost;		return ScalarizationCost;
}		}

/// \brief Estimate the overhead of scalarizing an instruction. This is a		/// \brief Estimate the overhead of scalarizing an instruction. This is a
/// convenience wrapper for the type-based getScalarizationOverhead API.		/// convenience wrapper for the type-based getScalarizationOverhead API.
static unsigned getScalarizationOverhead(Instruction *I, unsigned VF,		static unsigned getScalarizationOverhead(Instruction *I, unsigned VF,
const TargetTransformInfo &TTI) {		const TargetTransformInfo &TTI) {
if (VF == 1)		if (VF == 1)
return 0;		return 0;

		unsigned Cost = 0;
Type *RetTy = ToVectorTy(I->getType(), VF);		Type *RetTy = ToVectorTy(I->getType(), VF);
		if (!RetTy->isVoidTy())
		Cost += TTI.getScalarizationOverhead(RetTy, true, false);

SmallVector<Type *, 4> OpTys;		SmallVector<const Value *, 4> Operands(I->operand_values());
unsigned OperandsNum = I->getNumOperands();		Cost += TTI.getOperandsScalarizationOverhead(Operands, VF);
for (unsigned OpInd = 0; OpInd < OperandsNum; ++OpInd)
OpTys.push_back(ToVectorTy(I->getOperand(OpInd)->getType(), VF));

unsigned Cost = getScalarizationOverhead(OpTys, RetTy, TTI);

// if (supportsVectorElementAccess() &&		// if (supportsVectorElementAccess() &&
if (isa<LoadInst>(I) || isa<StoreInst>(I)) {		if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
assert (Cost > 1);		assert (Cost > 1);
Cost -= 1;		Cost -= 1;
}		}

return Cost;		return Cost;
[...]	while (!Worklist.empty()) {
// the instruction as if it wasn't if-converted and instead remained in the		// the instruction as if it wasn't if-converted and instead remained in the
// predicated block. We will scale this cost by block probability after		// predicated block. We will scale this cost by block probability after
// computing the scalarization overhead.		// computing the scalarization overhead.
unsigned ScalarCost = VF * getInstructionCost(I, 1).first;		unsigned ScalarCost = VF * getInstructionCost(I, 1).first;

// Compute the scalarization overhead of needed insertelement instructions		// Compute the scalarization overhead of needed insertelement instructions
// and phi nodes.		// and phi nodes.
if (Legal->isScalarWithPredication(I) && !I->getType()->isVoidTy()) {		if (Legal->isScalarWithPredication(I) && !I->getType()->isVoidTy()) {
ScalarCost += getScalarizationOverhead(ToVectorTy(I->getType(), VF), true,		ScalarCost += TTI.getScalarizationOverhead(ToVectorTy(I->getType(), VF),
false, TTI);		true, false);
ScalarCost += VF * TTI.getCFInstrCost(Instruction::PHI);		ScalarCost += VF * TTI.getCFInstrCost(Instruction::PHI);
}		}

// Compute the scalarization overhead of needed extractelement		// Compute the scalarization overhead of needed extractelement
// instructions. For each of the instruction's operands, if the operand can		// instructions. For each of the instruction's operands, if the operand can
// be scalarized, add it to the worklist; otherwise, account for the		// be scalarized, add it to the worklist; otherwise, account for the
// overhead.		// overhead.
for (Use &U : I->operands())		for (Use &U : I->operands())
if (auto *J = dyn_cast<Instruction>(U.get())) {		if (auto *J = dyn_cast<Instruction>(U.get())) {
assert(VectorType::isValidElementType(J->getType()) &&		assert(VectorType::isValidElementType(J->getType()) &&
"Instruction has non-scalar type");		"Instruction has non-scalar type");
if (canBeScalarized(J))		if (canBeScalarized(J))
Worklist.push_back(J);		Worklist.push_back(J);
else if (needsExtract(J))		else if (needsExtract(J))
ScalarCost += getScalarizationOverhead(ToVectorTy(J->getType(), VF),		ScalarCost += TTI.getScalarizationOverhead(
false, true, TTI);		ToVectorTy(J->getType(),VF), false, true);
}		}

// Scale the total scalar cost by block probability.		// Scale the total scalar cost by block probability.
ScalarCost /= getReciprocalPredBlockProb();		ScalarCost /= getReciprocalPredBlockProb();

// Compute the discount. A non-negative discount means the vector version		// Compute the discount. A non-negative discount means the vector version
// of the instruction costs more, and scalarizing would be beneficial.		// of the instruction costs more, and scalarizing would be beneficial.
Discount += VectorCost - ScalarCost;		Discount += VectorCost - ScalarCost;