This is an archive of the discontinued LLVM Phabricator instance.

lib/Analysis/LoopAccessAnalysis.cpp
1735	I don't think that this is obvious. If IsAnnotatedParallel is true, then this is fine. If not, I don't think that vector-variants implies anything about the memory-referencing behavior of the variant (and it might be some external thing we can't analyze). I imagine that, at least most of the time, when locally defined, these will be argmemonly functions (but I don't think that even that is guaranteed, unfortunately). Also, if we've been told to vectorize, and assured that it is safe, we should do this even if there are calls without vector variants (we'll just scalarize in that case). I see no reason that a function with inapplicable vector variants should be special in this sense. You might want to do that and then rebase on top of that change.
lib/Analysis/VectorUtils.cpp
601	Please make this comment more formal. In part, you should explain that the ABI for _Complex is platform dependent (including at the IR level).
lib/Transforms/Vectorize/LoopVectorize.cpp
4731	the only -> The only
4736	the only -> The only
4965	SimdVariant might be nullptr; don't segfault in the DEBUG statement in that case.
5082	Don't do this. VecClone needs to be able to detect when it doesn't need to generate a new function.
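
The comment on line 4965 above concerns dereferencing a possibly-null match inside a debug print. A minimal standalone sketch of the guard pattern follows. `Variant` and `describeMatch` are hypothetical stand-ins invented for this illustration; they are not the patch's `VectorVariant` class or any LLVM API.

```cpp
#include <string>

// Hypothetical stand-in for the patch's VectorVariant; only encode() is modeled.
struct Variant {
  std::string Prefix;
  std::string encode() const { return Prefix; }
};

// Build the debug message without dereferencing a null match.
std::string describeMatch(const Variant *Match) {
  if (!Match)
    return "Matched Variant: <none>";
  return "Matched Variant: " + Match->encode();
}
```

In the patch itself the same idea would presumably amount to checking SimdVariant before calling encode(), and falling back to scalarization when no variant matched.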

huntergr added a subscriber: huntergr. Jan 16 2018, 2:15 AM

Hi Matt, this is very nice. I have a couple of comments to add to Hal's.

I would like to see some description of which parts of the patch the tests are verifying. For example, I can see that you have two similar tests with masked and unmasked functions, but it is not clear from the IR or from the opt invocation why one test generates a masked call and the other an unmasked one. It would be great if you could also add, as a comment, the original C code you intended to vectorize with this patch.
Would it be possible to reduce the tests to a minimal size? For example, you have generated "vector-variants"="_ZGVbN8vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN8vlu_dowork,_ZGVbM8vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM8vlu_dowork", but it should also work with vector-variants consisting only of the single variant you want to use in the vectorizer.
You are testing linear and uniform. Could you also add tests for simple cases like the following?

#pragma omp declare simd
double f(double x) {
  // ...
}

void loop(double *x, double *y, int N) {
  for (int i = 0; i < N; ++i) {
    y[i] = f(x[i]);
  }
}
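
For readers unfamiliar with the caller-side transformation being requested here, the following is a rough sketch of what vectorizing such a loop means conceptually: the main loop issues one call to a vector variant per VF elements, and a scalar remainder loop handles the tail. The body of `f`, the factor `VF = 4`, and the helper `f_vec` are all assumptions made for this illustration; a real vector variant would be generated by the compiler and operate on SIMD registers.

```cpp
#include <array>

constexpr int VF = 4; // assumed vectorization factor

// Scalar function; the body is a placeholder chosen for this sketch.
double f(double x) { return 2.0 * x + 1.0; }

// Hypothetical stand-in for a compiler-generated vector variant of f.
std::array<double, VF> f_vec(const std::array<double, VF> &x) {
  std::array<double, VF> r{};
  for (int lane = 0; lane < VF; ++lane)
    r[lane] = f(x[lane]);
  return r;
}

void loop(const double *x, double *y, int N) {
  int i = 0;
  // Main vector loop: one variant call replaces VF scalar calls.
  for (; i + VF <= N; i += VF) {
    std::array<double, VF> in{};
    for (int lane = 0; lane < VF; ++lane)
      in[lane] = x[i + lane];
    std::array<double, VF> out = f_vec(in);
    for (int lane = 0; lane < VF; ++lane)
      y[i + lane] = out[lane];
  }
  // Scalar remainder loop for the last N % VF elements.
  for (; i < N; ++i)
    y[i] = f(x[i]);
}
```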

include/llvm/Analysis/VectorUtils.h
182	Given that this is likely to change for different architectures, I wonder whether it is worth redirecting it to an overloaded method of TargetTransformInfo.
test/Transforms/LoopVectorize/masked_simd_func.ll
91	Could you unit test this by adding only the _ZGVdM8vlu_dowork variant to the attribute?
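
As background for the variant names discussed above: in the x86 vector function ABI, a name such as _ZGVdM8vlu_dowork encodes an ISA letter (b, c, d, e for SSE, AVX, AVX2, AVX-512), a mask letter (M masked, N unmasked), the vector length, one kind per parameter, and the scalar name. The sketch below decodes that layout in standalone C++17. `ParsedVariant` and `parseVariant` are hypothetical names for this illustration, and only the single-letter kinds v, l, and u are handled; strides, alignment, and other modifiers allowed by the ABI are deliberately left out.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Simplified, hypothetical model of one decoded variant name.
struct ParsedVariant {
  char Isa;               // 'b', 'c', 'd', or 'e' on x86
  bool Masked;            // 'M' = masked, 'N' = unmasked
  unsigned Vlen;          // number of lanes
  std::string ParamKinds; // one letter per parameter: v, l, or u
  std::string ScalarName; // name of the scalar function
};

std::optional<ParsedVariant> parseVariant(const std::string &Name) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 2)
    return std::nullopt;

  ParsedVariant P{};
  P.Isa = Name[Pos++];
  char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return std::nullopt;
  P.Masked = (Mask == 'M');

  // Decimal vector length.
  std::size_t DigitsBegin = Pos;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos])))
    P.Vlen = P.Vlen * 10 + static_cast<unsigned>(Name[Pos++] - '0');
  if (Pos == DigitsBegin)
    return std::nullopt;

  // Parameter kinds up to the '_' separator.
  while (Pos < Name.size() && Name[Pos] != '_') {
    char K = Name[Pos++];
    if (K != 'v' && K != 'l' && K != 'u')
      return std::nullopt; // modifier not covered by this sketch
    P.ParamKinds.push_back(K);
  }
  if (Pos >= Name.size())
    return std::nullopt; // no '_' separator found
  P.ScalarName = Name.substr(Pos + 1);
  return P;
}
```

With this reading, the variant requested for the masked test is the AVX2, masked, 8-lane version of dowork whose three parameters are vector, linear, and uniform.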

dcaballe added a subscriber: dcaballe. Jan 19 2018, 11:37 AM

kmitropo added a subscriber: kmitropo. Jan 24 2018, 5:40 PM

a.elovikov added a subscriber: a.elovikov. Jan 27 2018, 9:53 AM

joel_k_jones added a subscriber: joel_k_jones. Aug 20 2018, 4:39 PM

vchuravy added a subscriber: vchuravy. Oct 11 2018, 4:09 AM

Revision Contents

Path	Size
include/llvm/Analysis/VectorUtils.h	5 lines
lib/Analysis/LoopAccessAnalysis.cpp	4 lines
lib/Analysis/VectorUtils.cpp	45 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	160 lines
test/Transforms/LoopVectorize/masked_simd_func.ll	107 lines
test/Transforms/LoopVectorize/simd_func.ll	99 lines
test/Transforms/LoopVectorize/simd_func_scalar.ll	111 lines

Diff 124624

include/llvm/Analysis/VectorUtils.h

	(22 lines not shown)
	template <typename T> class ArrayRef;			template <typename T> class ArrayRef;
	class DemandedBits;			class DemandedBits;
	class GetElementPtrInst;			class GetElementPtrInst;
	class Loop;			class Loop;
	class ScalarEvolution;			class ScalarEvolution;
	class TargetTransformInfo;			class TargetTransformInfo;
	class Type;			class Type;
	class Value;			class Value;
				class VectorVariant;

	namespace Intrinsic {			namespace Intrinsic {
	enum ID : unsigned;			enum ID : unsigned;
	}			}

	/// \brief Identify if the intrinsic is trivially vectorizable.			/// \brief Identify if the intrinsic is trivially vectorizable.
	/// This method returns true if the intrinsic's argument types are all			/// This method returns true if the intrinsic's argument types are all
	/// scalars for the scalar form of the intrinsic and all vectors for			/// scalars for the scalar form of the intrinsic and all vectors for
	(132 lines not shown)
	///			///
	/// This function generates code that concatenate the vectors in \p Vecs into a			/// This function generates code that concatenate the vectors in \p Vecs into a
	/// single large vector. The number of vectors should be greater than one, and			/// single large vector. The number of vectors should be greater than one, and
	/// their element types should be the same. The number of elements in the			/// their element types should be the same. The number of elements in the
	/// vectors should also be the same; however, if the last vector has fewer			/// vectors should also be the same; however, if the last vector has fewer
	/// elements, it will be padded with undefs.			/// elements, it will be padded with undefs.
	Value *concatenateVectors(IRBuilder<> &Builder, ArrayRef<Value *> Vecs);			Value *concatenateVectors(IRBuilder<> &Builder, ArrayRef<Value *> Vecs);

				/// \brief Determine the characteristic type of the vector function as specified
				/// according to the vector function ABI.
				Type* calcCharacteristicType(Function& F, VectorVariant& Variant);
				fpetrogalli (inline comment): Given that this is likely to change for different architectures, I wonder whether it is worth redirecting it to an overloaded method of TargetTransformInfo.

	} // llvm namespace			} // llvm namespace

	#endif			#endif

lib/Analysis/LoopAccessAnalysis.cpp

(1,724 lines not shown)	for (Instruction &I : *BB) {
continue;		continue;

// If the function has an explicit vectorized counterpart, we can safely		// If the function has an explicit vectorized counterpart, we can safely
// assume that it can be vectorized.		// assume that it can be vectorized.
if (Call && !Call->isNoBuiltin() && Call->getCalledFunction() &&		if (Call && !Call->isNoBuiltin() && Call->getCalledFunction() &&
TLI->isFunctionVectorizable(Call->getCalledFunction()->getName()))		TLI->isFunctionVectorizable(Call->getCalledFunction()->getName()))
continue;		continue;

		if (Call &&
		Call->getCalledFunction()->hasFnAttribute("vector-variants"))
		continue;
		hfinkel (inline comment): I don't think that this is obvious. If IsAnnotatedParallel is true, then this is fine. If not, I don't think that vector-variants implies anything about the memory-referencing behavior of the variant (and it might be some external thing we can't analyze). I imagine that, at least most of the time, when locally defined, these will be argmemonly functions (but I don't think that even that is guaranteed, unfortunately). Also, if we've been told to vectorize, and assured that it is safe, we should do this even if there are calls without vector variants (we'll just scalarize in that case). I see no reason that a function with inapplicable vector variants should be special in this sense. You might want to do that and then rebase on top of that change.

auto *Ld = dyn_cast<LoadInst>(&I);		auto *Ld = dyn_cast<LoadInst>(&I);
if (!Ld || (!Ld->isSimple() && !IsAnnotatedParallel)) {		if (!Ld || (!Ld->isSimple() && !IsAnnotatedParallel)) {
recordAnalysis("NonSimpleLoad", Ld)		recordAnalysis("NonSimpleLoad", Ld)
<< "read with atomic ordering or volatile read";		<< "read with atomic ordering or volatile read";
DEBUG(dbgs() << "LAA: Found a non-simple load.\n");		DEBUG(dbgs() << "LAA: Found a non-simple load.\n");
CanVecMem = false;		CanVecMem = false;
return;		return;
}		}
(521 lines not shown)

lib/Analysis/VectorUtils.cpp

(13 lines not shown)
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/ADT/EquivalenceClasses.h"		#include "llvm/ADT/EquivalenceClasses.h"
#include "llvm/Analysis/DemandedBits.h"		#include "llvm/Analysis/DemandedBits.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorVariant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;
(539 lines not shown)	if (NumVecs % 2 != 0)
TmpList.push_back(ResList[NumVecs - 1]);		TmpList.push_back(ResList[NumVecs - 1]);

ResList = TmpList;		ResList = TmpList;
NumVecs = ResList.size();		NumVecs = ResList.size();
} while (NumVecs > 1);		} while (NumVecs > 1);

return ResList[0];		return ResList[0];
}		}

		Type *llvm::calcCharacteristicType(Function &F, VectorVariant &Variant) {
		Type *ReturnType = F.getReturnType();
		Type *CharacteristicDataType = nullptr;

		if (!ReturnType->isVoidTy())
		CharacteristicDataType = ReturnType;

		if (!CharacteristicDataType) {

		std::vector<VectorKind> &ParmKinds = Variant.getParameters();
		Function::const_arg_iterator ArgIt = F.arg_begin();
		Function::const_arg_iterator ArgEnd = F.arg_end();
		std::vector<VectorKind>::iterator VKIt = ParmKinds.begin();

		for (; ArgIt != ArgEnd; ++ArgIt, ++VKIt) {
		if (VKIt->isVector()) {
		CharacteristicDataType = (*ArgIt).getType();
		break;
		}
		}
		}

		// TODO except Clang's ComplexType
		hfinkel (inline comment): Please make this comment more formal. In part, you should explain that the ABI for _Complex is platform dependent (including at the IR level).
		if (!CharacteristicDataType || CharacteristicDataType->isStructTy()) {
		CharacteristicDataType = Type::getInt32Ty(F.getContext());
		}

		// Legalize the characteristic type based on target requirements.
		CharacteristicDataType =
		Variant.promoteToSupportedType(CharacteristicDataType);

		if (CharacteristicDataType->isPointerTy()) {
		// For such cases as 'int* foo(int x)', where x is a non-vector type, the
		// characteristic type at this point will be i32*. If we use the DataLayout
		// to query the supported pointer size, then a promotion to i64* is
		// incorrect because the mask element type will mismatch the element type
		// of the characteristic type.
		PointerType *PointerTy = cast<PointerType>(CharacteristicDataType);
		CharacteristicDataType = PointerTy->getElementType();
		}

		return CharacteristicDataType;
		}
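
The selection order in calcCharacteristicType above can be summarized in a small standalone model: use the return type if there is one, otherwise the type of the first vector parameter, fall back to a 32-bit integer when nothing was chosen or a struct was chosen, and finally replace a pointer by its pointee. The sketch below is a deliberately simplified illustration. The enums `Ty` and `ParamKind`, the struct `Param`, and the function `characteristicType` are hypothetical, and the target-specific promotion step (promoteToSupportedType in the patch) is omitted.

```cpp
#include <vector>

// Hypothetical, highly simplified type model for this sketch.
enum class Ty { Void, I32, I64, F32, F64, Struct, PtrToI32 };
enum class ParamKind { Vector, Linear, Uniform };

struct Param {
  Ty Type;
  ParamKind Kind;
};

Ty characteristicType(Ty Ret, const std::vector<Param> &Params) {
  Ty Result = Ty::Void; // Void doubles as "not yet chosen"

  if (Ret != Ty::Void) {
    Result = Ret; // rule 1: a non-void return type wins
  } else {
    for (const Param &P : Params) {
      if (P.Kind == ParamKind::Vector) {
        Result = P.Type; // rule 2: first vector parameter
        break;
      }
    }
  }

  // Rule 3: nothing chosen, or a struct was chosen: fall back to i32.
  if (Result == Ty::Void || Result == Ty::Struct)
    Result = Ty::I32;

  // Rule 4: for a pointer, use the pointee type, mirroring the patch.
  if (Result == Ty::PtrToI32)
    Result = Ty::I32;

  return Result;
}
```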

lib/Transforms/Vectorize/LoopVectorize.cpp

(77 lines not shown)
#include "llvm/Analysis/LoopIterator.h"		#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
		#include "llvm/Analysis/VectorVariant.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
(3,758 lines not shown)	static unsigned getVectorCallCost(CallInst *CI, unsigned VF,
// packing the return values to a vector.		// packing the return values to a vector.
unsigned ScalarizationCost = getScalarizationOverhead(CI, VF, TTI);		unsigned ScalarizationCost = getScalarizationOverhead(CI, VF, TTI);

unsigned Cost = ScalarCallCost * VF + ScalarizationCost;		unsigned Cost = ScalarCallCost * VF + ScalarizationCost;

// If we can't emit a vector call for this function, then the currently found		// If we can't emit a vector call for this function, then the currently found
// cost is the cost we need to return.		// cost is the cost we need to return.
NeedToScalarize = true;		NeedToScalarize = true;
if (!TLI || !TLI->isFunctionVectorizable(FnName, VF) || CI->isNoBuiltin())		if ((!TLI || !TLI->isFunctionVectorizable(FnName, VF) || CI->isNoBuiltin()) &&
		!CI->getCalledFunction()->hasFnAttribute("vector-variants"))
return Cost;		return Cost;

// If the corresponding vector cost is cheaper, return its cost.		// If the corresponding vector cost is cheaper, return its cost.
unsigned VectorCallCost = TTI.getCallInstrCost(nullptr, RetTy, Tys);		unsigned VectorCallCost = TTI.getCallInstrCost(nullptr, RetTy, Tys);
if (VectorCallCost < Cost) {		if (VectorCallCost < Cost) {
NeedToScalarize = false;		NeedToScalarize = false;
return VectorCallCost;		return VectorCallCost;
}		}
(818 lines not shown)	assert((I.getOpcode() == Instruction::UDiv ||
I.getOpcode() == Instruction::URem ||		I.getOpcode() == Instruction::URem ||
I.getOpcode() == Instruction::SRem) &&		I.getOpcode() == Instruction::SRem) &&
"Unexpected instruction");		"Unexpected instruction");
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}

		static VectorVariant* matchVectorVariant(Function *CalledFunc, unsigned VF,
		bool IsMasked,
		const TargetTransformInfo *TTI) {

		DEBUG(dbgs() << "\nCall VF: " << VF << "\n");
		unsigned TargetMaxRegWidth = TTI->getRegisterBitWidth(true);
		DEBUG(dbgs() << "Target Max Register Width: " << TargetMaxRegWidth << "\n");

		TargetTransformInfo::ISAClass TargetIsaClass =
		TTI->getISAClassForMaxVecRegSize();
		DEBUG(dbgs() << "Target ISA Class: "
		<< TTI->ISAClassToString(TargetIsaClass) << "\n\n");

		Attribute Attr = CalledFunc->getFnAttribute("vector-variants");
		StringRef VariantsStr = Attr.getValueAsString();
		SmallVector<StringRef, 4> Variants;
		VariantsStr.split(Variants, ",");
		for (unsigned i = 0; i < Variants.size(); i++) {
		VectorVariant *Variant = new VectorVariant(Variants[i], TTI);
		TargetTransformInfo::ISAClass VariantIsaClass = Variant->getISA();
		DEBUG(dbgs() << "Variant ISA Class: "
		<< TTI->ISAClassToString(VariantIsaClass) << "\n");
		unsigned IsaClassMaxRegWidth =
		TTI->ISAClassMaxRegisterWidth(VariantIsaClass);
		DEBUG(dbgs() << "Isa Class Max Vector Register Width: "
		<< IsaClassMaxRegWidth << "\n");
		unsigned FuncVF = Variant->getVlen();
		DEBUG(dbgs() << "Func VF: " << FuncVF << "\n\n");

		// Pick candidate functions based on the target ISA class, masked property,
		// and loop VF == Variant VF. For now, matching is limited to exact matching
		// on VF. If no match exists based on these criteria, the calls will be
		// scalarized. This could be extended in the future for when:
		//
		// 1) the only available simd function variants have a VF that is less than
		hfinkel (inline comment): the only -> The only
		// the loop VF. In this case, multiple calls can be made to the simd
		// function. Currently, however, LV only keeps a 1-1 scalar -> vector
		// Value mapping.
		//
		// 2) the only available simd function variants have a VF that is greater
		hfinkel (inline comment): the only -> The only
		// than the loop VF. In this case, we can make the call to the simd
		// function and effectively mask off the unused vector parts.
		//
		if (VariantIsaClass == TargetIsaClass && Variant->isMasked() == IsMasked &&
		VF == Variant->getVlen()) {
		return Variant;
		}
		}

		return nullptr;
		}
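
The matching rule in matchVectorVariant above is an exact match on three properties: ISA class, masked or unmasked, and vector length. A minimal standalone sketch of that rule in C++17 follows. `Candidate` and `matchVariant` are hypothetical names for this illustration, and the ISA class is reduced to a single letter rather than the TargetTransformInfo::ISAClass used by the patch.

```cpp
#include <optional>
#include <vector>

// Hypothetical, simplified description of one simd variant.
struct Candidate {
  char Isa;      // ISA class letter
  bool Masked;   // whether the variant takes a mask argument
  unsigned Vlen; // number of lanes
};

// Return the first candidate whose ISA, masking, and lane count all match
// exactly; an empty result means the caller falls back to scalarized calls.
std::optional<Candidate> matchVariant(const std::vector<Candidate> &Variants,
                                      char TargetIsa, bool NeedMask,
                                      unsigned VF) {
  for (const Candidate &C : Variants)
    if (C.Isa == TargetIsa && C.Masked == NeedMask && C.Vlen == VF)
      return C;
  return std::nullopt;
}
```

Returning the match by value is one way to sidestep the ownership questions that come with handing back a raw pointer allocated with new; whether that fits the real VectorVariant class is not something this sketch can settle.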

void InnerLoopVectorizer::widenInstruction(Instruction &I) {		void InnerLoopVectorizer::widenInstruction(Instruction &I) {
switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Br:		case Instruction::Br:
case Instruction::PHI:		case Instruction::PHI:
llvm_unreachable("This instruction is handled by a different recipe.");		llvm_unreachable("This instruction is handled by a different recipe.");
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// Construct a vector GEP by widening the operands of the scalar GEP as		// Construct a vector GEP by widening the operands of the scalar GEP as
// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP		// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP
(187 lines not shown)	void InnerLoopVectorizer::widenInstruction(Instruction &I) {
case Instruction::Call: {		case Instruction::Call: {
// Ignore dbg intrinsics.		// Ignore dbg intrinsics.
if (isa<DbgInfoIntrinsic>(I))		if (isa<DbgInfoIntrinsic>(I))
break;		break;
setDebugLocFromInst(Builder, &I);		setDebugLocFromInst(Builder, &I);

Module *M = I.getParent()->getParent()->getParent();		Module *M = I.getParent()->getParent()->getParent();
auto *CI = cast<CallInst>(&I);		auto *CI = cast<CallInst>(&I);

StringRef FnName = CI->getCalledFunction()->getName();		StringRef FnName = CI->getCalledFunction()->getName();
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();

		// Find the appropriate simd function match and generate the mask if this
		// is a masked simd function.
		bool UseSimdFunction = F->hasFnAttribute("vector-variants") ? true : false;
		std::vector<VectorKind> SimdFuncParms;
		VectorVariant *SimdVariant = nullptr;
		bool IsMasked = Legal->isMaskRequired(&I);
		Type *CharacteristicTy = nullptr;
		VectorParts Mask;
		if (UseSimdFunction) {
		SimdVariant = matchVectorVariant(F, VF, IsMasked, TTI);
		DEBUG(dbgs() << "Matched Variant: " << SimdVariant->encode() << "\n");
		hfinkel (inline comment): SimdVariant might be nullptr; don't segfault in the DEBUG statement in that case.
		SimdFuncParms = SimdVariant->getParameters();
		CharacteristicTy = calcCharacteristicType(F, SimdVariant);
		if (IsMasked) {
		Mask = createBlockInMask(CI->getParent());
		}
		}

Type *RetTy = ToVectorTy(CI->getType(), VF);		Type *RetTy = ToVectorTy(CI->getType(), VF);
SmallVector<Type *, 4> Tys;		SmallVector<Type *, 4> Tys;
for (Value *ArgOperand : CI->arg_operands())		for (unsigned i = 0; i < CI->getNumArgOperands(); i++) {
		Value *ArgOperand = CI->getArgOperand(i);
		if (!UseSimdFunction || SimdFuncParms[i].isVector())
Tys.push_back(ToVectorTy(ArgOperand->getType(), VF));		Tys.push_back(ToVectorTy(ArgOperand->getType(), VF));
		else
		Tys.push_back(ArgOperand->getType());
		}

		// Masked simd functions need an extra mask parameter, so add its type to
		// the Tys list.
		if (UseSimdFunction && IsMasked) {
		Tys.push_back(ToVectorTy(CharacteristicTy, VF));
		}

Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// The flag shows whether we use Intrinsic or a usual Call for vectorized		// The flag shows whether we use Intrinsic or a usual Call for vectorized
// version of the instruction.		// version of the instruction.
// Is it beneficial to perform intrinsic call compared to lib call?		// Is it beneficial to perform intrinsic call compared to lib call?
bool NeedToScalarize;		bool NeedToScalarize;
unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
bool UseVectorIntrinsic =		bool UseVectorIntrinsic =
ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;
assert((UseVectorIntrinsic || !NeedToScalarize) &&		assert((UseVectorIntrinsic || !NeedToScalarize) &&
"Instruction should be scalarized elsewhere.");		"Instruction should be scalarized elsewhere.");

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Value *, 4> Args;		SmallVector<Value *, 4> Args;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
Value *Arg = CI->getArgOperand(i);		Value *Arg = CI->getArgOperand(i);
// Some intrinsics have a scalar argument - don't replace it with a		// Some intrinsics have a scalar argument - don't replace it with a
// vector.		// vector. Likewise, linear and uniform parameters for simd functions
if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))		// are passed as scalars, so don't vectorize those either.
		bool IsScalarSimdArg = UseSimdFunction &&
		(SimdFuncParms[i].isLinear() || SimdFuncParms[i].isUniform());
		if (IsScalarSimdArg)
		Arg = getOrCreateScalarValue(Arg, {Part, 0});
		else if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))
Arg = getOrCreateVectorValue(CI->getArgOperand(i), Part);		Arg = getOrCreateVectorValue(CI->getArgOperand(i), Part);
Args.push_back(Arg);		Args.push_back(Arg);
}		}

		// Promote mask of <VF x i1> mask to <VF x characteristic type> and add
		// it to the Args list.
		if (UseSimdFunction && IsMasked) {
		unsigned CharacteristicTySize =
		CharacteristicTy->getPrimitiveSizeInBits();
		// Mask is vector of i1. Promote it to an integer type that has the
		// same size as the characteristic type.
		Type *ScalarToType = IntegerType::get(CharacteristicTy->getContext(),
		CharacteristicTySize);
		VectorType *VecToType = VectorType::get(ScalarToType, VF);
		Value *MaskExt = Builder.CreateSExt(Mask[Part], VecToType, "mask.ext");

		// Bitcast if the promoted type is not the same as the characteristic
		// type.
		if (ScalarToType != CharacteristicTy) {
		Type *MaskCastTy = VectorType::get(CharacteristicTy, VF);
		Value *MaskCast = Builder.CreateBitCast(MaskExt, MaskCastTy,
		"mask.cast");
		Args.push_back(MaskCast);
		} else {
		Args.push_back(MaskExt);
		}
		}

Function *VectorF;		Function *VectorF;
if (UseVectorIntrinsic) {		if (UseVectorIntrinsic) {
// Use vector version of the intrinsic.		// Use vector version of the intrinsic.
Type *TysForDecl[] = {CI->getType()};		Type *TysForDecl[] = {CI->getType()};
if (VF > 1)		if (VF > 1)
TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);		TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);
VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);		VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);
} else {		} else {
// Use vector version of the library call.		// Use vector version of the library call or a simd function.
StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);		StringRef VFnName;
		if (UseSimdFunction) {
		std::string VariantName = SimdVariant->encode() + FnName.str();
		VFnName = VariantName;
		} else
		VFnName = TLI->getVectorizedFunction(FnName, VF);
assert(!VFnName.empty() && "Vector function name is empty.");		assert(!VFnName.empty() && "Vector function name is empty.");
VectorF = M->getFunction(VFnName);		VectorF = M->getFunction(VFnName);
if (!VectorF) {		if (!VectorF) {
// Generate a declaration		// Generate a declaration
FunctionType *FTy = FunctionType::get(RetTy, Tys, false);		FunctionType *FTy = FunctionType::get(RetTy, Tys, false);
VectorF =		VectorF =
Function::Create(FTy, Function::ExternalLinkage, VFnName, M);		Function::Create(FTy, Function::ExternalLinkage, VFnName, M);
VectorF->copyAttributesFrom(F);		VectorF->copyAttributesFrom(F);
}		}
}		}
assert(VectorF && "Can't create vector function.");		assert(VectorF && "Can't create vector function.");

SmallVector<OperandBundleDef, 1> OpBundles;		SmallVector<OperandBundleDef, 1> OpBundles;
CI->getOperandBundlesAsDefs(OpBundles);		CI->getOperandBundlesAsDefs(OpBundles);
CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);		CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);

if (isa<FPMathOperator>(V))		if (isa<FPMathOperator>(V))
V->copyFastMathFlags(CI);		V->copyFastMathFlags(CI);

VectorLoopValueMap.setVectorValue(&I, Part, V);		VectorLoopValueMap.setVectorValue(&I, Part, V);
addMetadata(V, &I);		addMetadata(V, &I);
}		}

		if (UseSimdFunction) {
		// Remove simd function attributes from the original function just
		// in case VecClone runs again.
		F->removeFnAttr("vector-variants");
		hfinkel (inline comment): Don't do this. VecClone needs to be able to detect when it doesn't need to generate a new function.
		delete SimdVariant;
		}

break;		break;
}		}

default:		default:
// All other instructions are scalarized.		// All other instructions are scalarized.
DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I);		DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I);
llvm_unreachable("Unhandled instruction!");		llvm_unreachable("Unhandled instruction!");
} // end of switch.		} // end of switch.
(382 lines not shown)	for (Instruction &I : *BB) {
// We handle calls that:		// We handle calls that:
// * Are debug info intrinsics.		// * Are debug info intrinsics.
// * Have a mapping to an IR intrinsic.		// * Have a mapping to an IR intrinsic.
// * Have a vector version available.		// * Have a vector version available.
auto *CI = dyn_cast<CallInst>(&I);		auto *CI = dyn_cast<CallInst>(&I);
if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&		if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
!isa<DbgInfoIntrinsic>(CI) &&		!isa<DbgInfoIntrinsic>(CI) &&
!(CI->getCalledFunction() && TLI &&		!(CI->getCalledFunction() && TLI &&
TLI->isFunctionVectorizable(CI->getCalledFunction()->getName()))) {		TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())) &&
		!(CI->getCalledFunction()->hasFnAttribute("vector-variants"))) {
ORE->emit(createMissedAnalysis("CantVectorizeCall", CI)		ORE->emit(createMissedAnalysis("CantVectorizeCall", CI)
<< "call instruction cannot be vectorized");		<< "call instruction cannot be vectorized");
DEBUG(dbgs() << "LV: Found a non-intrinsic, non-libfunc callsite.\n");		DEBUG(dbgs() << "LV: Found a non-intrinsic, non-libfunc callsite.\n");
return false;		return false;
}		}

// Intrinsics such as powi,cttz and ctlz are legal to vectorize if the		// Intrinsics such as powi,cttz and ctlz are legal to vectorize if the
// second argument is the same (i.e. loop invariant)		// second argument is the same (i.e. loop invariant)
(511 lines not shown)	bool LoopVectorizationLegality::blockCanBePredicated(

for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
// Check that we don't have a constant expression that can trap as operand.		// Check that we don't have a constant expression that can trap as operand.
for (Value *Operand : I.operands()) {		for (Value *Operand : I.operands()) {
if (auto *C = dyn_cast<Constant>(Operand))		if (auto *C = dyn_cast<Constant>(Operand))
if (C->canTrap())		if (C->canTrap())
return false;		return false;
}		}
		// We will need a mask for masked SIMD functions.
		auto *CI = dyn_cast<CallInst>(&I);
		if (CI && CI->getCalledFunction()->hasFnAttribute("vector-variants")) {
		MaskedOp.insert(CI);
		continue;
		}
// We might be able to hoist the load.		// We might be able to hoist the load.
if (I.mayReadFromMemory()) {		if (I.mayReadFromMemory()) {
auto *LI = dyn_cast<LoadInst>(&I);		auto *LI = dyn_cast<LoadInst>(&I);
if (!LI)		if (!LI)
return false;		return false;
if (!SafePtrs.count(LI->getPointerOperand())) {		if (!SafePtrs.count(LI->getPointerOperand())) {
if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||		if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||
isLegalMaskedGather(LI->getType())) {		isLegalMaskedGather(LI->getType())) {
(2,249 lines not shown)	if (CallInst *CI = dyn_cast<CallInst>(I)) {
// The following case may be scalarized depending on the VF.		// The following case may be scalarized depending on the VF.
// The flag shows whether we use Intrinsic or a usual Call for vectorized		// The flag shows whether we use Intrinsic or a usual Call for vectorized
// version of the instruction.		// version of the instruction.
// Is it beneficial to perform intrinsic call compared to lib call?		// Is it beneficial to perform intrinsic call compared to lib call?
bool NeedToScalarize;		bool NeedToScalarize;
unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
bool UseVectorIntrinsic =		bool UseVectorIntrinsic =
ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;

		// If a match for the simd function was not found, then just scalarize the
		// calls to the original function.
		Function *F = CI->getCalledFunction();
		if (F->hasFnAttribute("vector-variants")) {
		bool IsMasked = Legal->isMaskRequired(I);
		VectorVariant *SimdVariant = matchVectorVariant(F, VF, IsMasked, TTI);
		if (!SimdVariant)
		return false;
		}

return UseVectorIntrinsic || !NeedToScalarize;		return UseVectorIntrinsic || !NeedToScalarize;
}		}
if (isa<LoadInst>(I) || isa<StoreInst>(I)) {		if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
CM.getWideningDecision(I, VF);		CM.getWideningDecision(I, VF);
assert(Decision != LoopVectorizationCostModel::CM_Unknown &&		assert(Decision != LoopVectorizationCostModel::CM_Unknown &&
"CM decision should be taken at this point.");		"CM decision should be taken at this point.");
assert(Decision != LoopVectorizationCostModel::CM_Interleave &&		assert(Decision != LoopVectorizationCostModel::CM_Interleave &&
(672 lines not shown)
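
The mask handling in the Call case of widenInstruction above sign-extends each i1 mask lane to an integer as wide as the characteristic type, so an active lane becomes all ones and an inactive lane becomes zero. The standalone sketch below shows that step for a 32-bit characteristic type. `VF = 8` and the helper name `extendMask` are assumptions for this illustration. The patch's follow-up bitcast for non-integer characteristic types is not modeled.

```cpp
#include <array>
#include <cstdint>

constexpr int VF = 8; // assumed vectorization factor

// Sign-extend a per-lane boolean mask to 32-bit lanes:
// true -> 0xFFFFFFFF (that is, -1), false -> 0.
std::array<std::int32_t, VF> extendMask(const std::array<bool, VF> &Mask) {
  std::array<std::int32_t, VF> Ext{};
  for (int Lane = 0; Lane < VF; ++Lane)
    Ext[Lane] = Mask[Lane] ? -1 : 0;
  return Ext;
}
```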

test/Transforms/LoopVectorize/masked_simd_func.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: call <8 x i32> @_ZGVdM8vlu_dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv39 = phi i64 [ 0, %entry ], [ %indvars.iv.next40, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv39
				%2 = trunc i64 %indvars.iv39 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next40 = add nuw nsw i64 %indvars.iv39, 1
				%exitcond41 = icmp eq i64 %indvars.iv.next40, 4096
				br i1 %exitcond41, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.inc, %for.end
				%indvars.iv36 = phi i64 [ 0, %for.end ], [ %indvars.iv.next37, %omp.inner.for.inc ]
				%4 = trunc i64 %indvars.iv36 to i32
				%rem = and i32 %4, 1
				%tobool = icmp eq i32 %rem, 0
				br i1 %tobool, label %omp.inner.for.inc, label %if.then

				if.then: ; preds = %omp.inner.for.body
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv36
				%5 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%call = tail call i32 @dowork(i32 %5, i32 %4, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv36
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				br label %omp.inner.for.inc

				omp.inner.for.inc: ; preds = %omp.inner.for.body, %if.then
				%indvars.iv.next37 = add nuw nsw i64 %indvars.iv36, 1
				%exitcond38 = icmp eq i64 %indvars.iv.next37, 4096
				br i1 %exitcond38, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.inc
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN16vlu_dowork,_ZGVbM4vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM16vlu_dowork" }
				fpetrogalli (inline comment): Could you unit test this by adding only the _ZGVdM8vlu_dowork variant to the attribute?
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7}
				!7 = !{!"llvm.loop.vectorize.enable", i1 true}
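The review asks for the source the test was derived from. The original is not part of this page, so the following is only a hypothetical reconstruction read back from the IR above: `dowork` adds its three arguments, the `vlu` parameter kinds suggest a vector `b`, a linear `k`, and a uniform `c`, and the call sits under an odd-index condition. The function name `kernel` and the pragma clauses are assumptions.

```cpp
// Hypothetical reconstruction; not the source used to generate the test.
// The pragmas only take effect when OpenMP SIMD support is enabled
// (for example with -fopenmp-simd); otherwise they are ignored.
#pragma omp declare simd linear(k) uniform(c)
int dowork(int b, int k, int c) { return b + k + c; }

constexpr int N = 4096;

// Initializes b[i] = i, then stores dowork(b[i], i, b[3]) into a[i]
// for odd i only. Even elements of `a` are left untouched.
void kernel(int *a, int *b) {
  for (int i = 0; i < N; ++i)
    b[i] = i;
  int c = b[3];
#pragma omp simd
  for (int i = 0; i < N; ++i) {
    // The call is conditional, so a vectorized call needs a mask.
    if (i & 1)
      a[i] = dowork(b[i], i, c);
  }
}
```

Under this reading, dropping the `if` would give the unconditional call seen in simd_func.ll, which would explain why that test expects the unmasked `_ZGVdN8vlu_dowork` instead.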

test/Transforms/LoopVectorize/simd_func.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: call <8 x i32> @_ZGVdN8vlu_dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv38 = phi i64 [ 0, %entry ], [ %indvars.iv.next39, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv38
				%2 = trunc i64 %indvars.iv38 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
				%exitcond40 = icmp eq i64 %indvars.iv.next39, 4096
				br i1 %exitcond40, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.body, %for.end
				%indvars.iv35 = phi i64 [ 0, %for.end ], [ %indvars.iv.next36, %omp.inner.for.body ]
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv35
				%4 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%5 = trunc i64 %indvars.iv35 to i32
				%call = tail call i32 @dowork(i32 %4, i32 %5, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv35
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%exitcond37 = icmp eq i64 %indvars.iv.next36, 4096
				br i1 %exitcond37, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.body
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN16vlu_dowork,_ZGVbM4vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM16vlu_dowork" }
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7}
				!7 = !{!"llvm.loop.vectorize.enable", i1 true}

test/Transforms/LoopVectorize/simd_func_scalar.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv38 = phi i64 [ 0, %entry ], [ %indvars.iv.next39, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv38
				%2 = trunc i64 %indvars.iv38 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
				%exitcond40 = icmp eq i64 %indvars.iv.next39, 4096
				br i1 %exitcond40, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.body, %for.end
				%indvars.iv35 = phi i64 [ 0, %for.end ], [ %indvars.iv.next36, %omp.inner.for.body ]
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv35
				%4 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%5 = trunc i64 %indvars.iv35 to i32
				%call = tail call i32 @dowork(i32 %4, i32 %5, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv35
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%exitcond37 = icmp eq i64 %indvars.iv.next36, 4096
				br i1 %exitcond37, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.body
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN8vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN8vlu_dowork,_ZGVbM8vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM8vlu_dowork" }
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7, !8}
				!7 = !{!"llvm.loop.vectorize.width", i32 4}
				!8 = !{!"llvm.loop.vectorize.enable", i1 true}
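In simd_func_scalar.ll the loop metadata requests a vectorization width of 4, while every entry in the "vector-variants" attribute has 8 lanes, so presumably no variant matches and the call is scalarized. The CHECK lines expect two `extractelement` instructions followed by a scalar `call i32 @dowork`, four times over. The sketch below emulates that shape in plain C++; `scalarizedCall` and the value names in the comments are hypothetical, and the assumption that the third argument stays scalar comes from it being loop-invariant in the IR.

```cpp
#include <array>

int dowork(int b, int k, int c) { return b + k + c; }

// Emulates a scalarized call at VF = 4: each lane of the two vector
// operands is extracted, the scalar function is called once per lane,
// and the results are collected into a vector-shaped value.
std::array<int, 4> scalarizedCall(const std::array<int, 4> &B,
                                  const std::array<int, 4> &K, int C) {
  std::array<int, 4> Result{};
  for (unsigned Lane = 0; Lane < 4; ++Lane) {
    int BLane = B[Lane];  // like: extractelement <4 x i32> %b.vec, Lane
    int KLane = K[Lane];  // like: extractelement <4 x i32> %k.vec, Lane
    Result[Lane] = dowork(BLane, KLane, C);  // like: call i32 @dowork
  }
  return Result;
}
```

For instance, calling `scalarizedCall({0, 1, 2, 3}, {0, 1, 2, 3}, 3)` performs four scalar calls, one per lane, which mirrors the four `call i32 @dowork` checks in the test.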


LoopVectorize support for simd functions (Needs Review, Public)

Details

Diff Detail

Event Timeline