This is an archive of the discontinued LLVM Phabricator instance.

AVX-512 cost calculation for interleave load/store patterns
ClosedPublic

Authored by delena on Dec 26 2016, 7:12 AM.


Details

Reviewers

RKSimon
mkuper
Farhana

Commits

rG21706cbd2488: AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
rL290810: AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.

Summary

The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers.
As a result, the scalar version is chosen in many cases. The overestimate is even worse on AVX-512, since there we have 3-src shuffles that significantly reduce the real cost.

In this patch I calculate the cost on AVX-512. This will allow the interleave pattern to be compared with gather/scatter so that the better solution can be chosen (PR31426).
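As a rough illustration of the arithmetic this patch introduces, the sketch below restates the load and store formulas from getInterleavedMemoryOpCostAVX512 (shown in the diff further down) as two free functions. The function names, and the choice to pass the per-operation costs in as plain parameters, are assumptions made for this example; in the patch those values come from getMemoryOpCost and getShuffleCost.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-alone model of the load-side arithmetic. Parameter names
// mirror the locals in the diff, but this is not the LLVM function itself.
unsigned interleavedLoadCost(unsigned NumOfMemOps, unsigned NumOfResults,
                             unsigned MemOpCost, unsigned ShuffleCost) {
  assert(NumOfMemOps >= 1 && NumOfResults >= 1);
  // Two-source shuffles are used once the data spans more than one register.
  bool TwoSrc = NumOfMemOps > 1;
  // With a single result, about half of the loads fold into shuffles.
  unsigned NumOfUnfoldedLoads =
      NumOfResults > 1 ? NumOfMemOps : NumOfMemOps / 2;
  unsigned NumOfShufflesPerResult = std::max(1u, NumOfMemOps - 1);
  // A two-source shuffle clobbers one operand, so extra moves preserve it.
  unsigned NumOfMoves = (NumOfResults > 1 && TwoSrc)
                            ? NumOfResults * NumOfShufflesPerResult / 2
                            : 0;
  return NumOfResults * NumOfShufflesPerResult * ShuffleCost +
         NumOfUnfoldedLoads * MemOpCost + NumOfMoves;
}

// Hypothetical model of the store side: every store needs Factor - 1
// two-source shuffles, and stores are never folded into shuffles.
unsigned interleavedStoreCost(unsigned NumOfMemOps, unsigned Factor,
                              unsigned MemOpCost, unsigned ShuffleCost) {
  assert(NumOfMemOps >= 1 && Factor >= 1);
  unsigned NumOfShufflesPerStore = Factor - 1;
  unsigned NumOfMoves = NumOfMemOps * NumOfShufflesPerStore / 2;
  return NumOfMemOps * (MemOpCost + NumOfShufflesPerStore * ShuffleCost) +
         NumOfMoves;
}
```

Under the assumption of unit shuffle and memory costs, a group that needs three memory operations and produces three results yields interleavedLoadCost(3, 3, 1, 1) == 12, and the matching factor-3 store yields interleavedStoreCost(3, 3, 1, 1) == 12. These are outputs of the sketch only, not numbers taken from the patch's tests.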

Diff Detail

Repository: rL LLVM

Event Timeline

delena updated this revision to Diff 82501. Dec 26 2016, 7:12 AM

delena retitled this revision from to AVX-512 cost calculation for interleave load/store patterns.

delena updated this object.

delena added reviewers: mkuper, Farhana.

delena set the repository for this revision to rL LLVM.

delena added subscribers: llvm-commits, Ayal.

mkuper added inline comments. Dec 27 2016, 11:16 AM

../include/llvm/Analysis/TargetTransformInfo.h
467 ↗	(On Diff #82501)	Can you make this clearer? It's not obvious what "merge" means. Does the order matter?
469 ↗	(On Diff #82501)	Is the extra space intentional? Also, maybe "one" -> " a single"?
../include/llvm/CodeGen/BasicTTIImpl.h
65 ↗	(On Diff #82501)	Why "All Permutations"? getPermuteShuffleOverhead(), maybe? Also, I'm not sure what this has to do with permutations, especially given the example below. (TBH, It didn't have anything to do with SK_Alternate, it was just used that way. That wasn't really good either).
359 ↗	(On Diff #82501)	This code makes very little sense to me. Not your change, but the original code. Why does this special-case "alt shuffle", of all things? And why should this special-case only these specific shuffles after the change? I think this is backwards - there may be specific shuffle types that are cheap by default - e.g. broadcast makes sense. Reverse? Not so much.
../lib/Target/X86/X86TargetTransformInfo.cpp
804 ↗	(On Diff #82501)	Are you sure this is correct? I mean, this is fine for SK_Reverse, but I don't think it works for general shuffles. I mean, let's say you have a shuffle of two v256i8. Legalization will give you two sets of 4 * v64i8, but you don't end up with 4 two-input shuffles, since each of the 4 output vectors may depend on any subset of the 8 input vectors, so you may need a lot more shuffles. Am I missing something?
851 ↗	(On Diff #82501)	Same as above, I don't think this works for PermuteOneSrc either.
886 ↗	(On Diff #82501)	Are you planning on adding SSE4, AVX and AVX2 costs as well? This isn't a blocker for this patch, and should be a separate patch in any case, I'm just curious.
2131 ↗	(On Diff #82501)	This line is > 80 chars.

mssimpso added a subscriber: mssimpso. Dec 28 2016, 5:05 AM

zvi added a subscriber: zvi. Dec 28 2016, 7:34 AM

delena marked 2 inline comments as done. Dec 29 2016, 1:16 AM

delena added inline comments.

../include/llvm/Analysis/TargetTransformInfo.h
467 ↗	(On Diff #82501)	The order does not matter. I meant any permutation of elements from 2 source vectors. I've changed the comment.
../include/llvm/CodeGen/BasicTTIImpl.h
65 ↗	(On Diff #82501)	In the worst case, the shuffle is being replaced with "extracts" and "inserts". In my opinion, SK_Reverse should also have this overhead. And the SK_Broadcast is not always 1 instruction. But I don't want to fix everything in this patch.
359 ↗	(On Diff #82501)	Reverse is not cheap for all types, VPERMW, for example, appears on AVX-512-BW. And broadcast for i16/i8 was added to AVX2 ISA.
../lib/Target/X86/X86TargetTransformInfo.cpp
804 ↗	(On Diff #82501)	You are not missing anything. The cost given here is the right cost for the legal types. After a split, the cost should be (NumOfSrc*2 - 1) * Entry->Cost. I took this into account in the AVX-512 calculations. I'll fix it.
851 ↗	(On Diff #82501)	I fixed and added a test. thanks.
886 ↗	(On Diff #82501)	It should be added, but I'm not sure that I'll be able to take it on immediately after this patch (and I have one more patch that should compare "interleave" with gather/scatter). I'll try to find an example where a high "interleaving" cost on AVX2 prevents vectorization and file a PR. But on AVX2, where we do not have any gather (or at least do not consider it as an option), we should compare "interleave" with strided-scalar. Mohamed is working on reducing the scalar cost for strided access. We'll see what happens on AVX2 and earlier ISAs after his patch.
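The exchange above about splitting wide two-source shuffles can be checked with a small calculation. The sketch below is a hypothetical helper, not LLVM code: it assumes each operand legalizes into NumParts registers, that each destination register may depend on all 2 * NumParts source registers, and that one two-source shuffle merges two registers at a time.

```cpp
#include <cassert>

// Hypothetical helper (not part of LLVM): number of two-source shuffles
// needed when each operand of a two-source permute is split into NumParts
// legal registers.
unsigned twoSrcShufflesAfterSplit(unsigned NumParts) {
  assert(NumParts >= 1);
  // One destination register per legal part of the result.
  unsigned NumOfDests = NumParts;
  // Merging 2 * NumParts sources pairwise takes 2 * NumParts - 1 shuffles.
  unsigned NumOfShufflesPerDest = NumParts * 2 - 1;
  return NumOfDests * NumOfShufflesPerDest;
}
```

For the v256i8 case raised in the review (four v64i8 registers per operand), this gives 4 * 7 = 28 shuffles rather than 4, which is the same product the committed SK_PermuteTwoSrc code forms from LT.first.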

Some fixes in shuffle cost calculation after Michael's comments. Added 2 more tests for the shuffle cost model.

D27811 is working on much the same code, but I'm fine with you getting these changes in first and I'll carry on the refactor afterwards.

../lib/Analysis/CostModel.cpp
115 ↗	(On Diff #82656)	I've added a ShuffleVectorInst::isSplat helper in D27811 for the same purpose. It's probably worth you merging that part (inc the CodeGenPrepare.cpp change as well) here.

delena added inline comments. Dec 29 2016, 3:31 AM

../lib/Analysis/CostModel.cpp
115 ↗	(On Diff #82656)	The "broadcast" was not the subject of my patch and I have not really investigated it; I just added it for completeness. And the broadcast test is not complete. I'd rather remove these changes altogether and let you proceed.

mkuper added inline comments. Dec 29 2016, 9:57 AM

../include/llvm/CodeGen/BasicTTIImpl.h
65 ↗	(On Diff #82501)	I didn't mean to imply you should, just trying to figure out how we can at least move this in the right direction - even if just in terms of naming/documentation. :-)
359 ↗	(On Diff #82501)	Sure. And this isn't even X86-specific, this is just trying to give sane defaults for all targets, and I don't think the current code does that. Could you please add a FIXME here?
../lib/Analysis/CostModel.cpp
112 ↗	(On Diff #82656)	Not 100% related, but I would expect us to canonicalize shuffles to avoid the "all elements come from the second input" case. (I'm not saying you should remove the check, since the cost is fairly minor... but I'm curious whether this is the case)
../lib/Target/X86/X86TargetTransformInfo.cpp
886 ↗	(On Diff #82501)	SGTM.
2207 ↗	(On Diff #82656)	Could you add an assert here?
../test/Analysis/CostModel/X86/shuffle-reverse.ll
126 ↗	(On Diff #82656)	I thought this patch was not supposed to touch the costs of SK_Reverse shuffles. Did we end up considering this shuffle SK_PermuteSingleSrc?
../test/Analysis/CostModel/X86/shuffle-two-src.ll
4 ↗	(On Diff #82656)	1 -> 2

delena marked 2 inline comments as done. Jan 1 2017, 4:36 AM

delena added inline comments.

../test/Analysis/CostModel/X86/shuffle-reverse.ll
126 ↗	(On Diff #82656)	In some cases "reverse" is cheaper and in some it is not, but it is definitely not more expensive than SK_PermuteSingleSrc. I fixed one line in the "reverse" cost to make it consistent. But in order to redirect SK_Reverse to SK_PermuteSingleSrc I need to refactor the whole function. I don't want to do this in this patch. I'm also taking into account one more pending patch to the same place: https://reviews.llvm.org/D27811

Some minor changes after Michael's comments.

RKSimon added a reviewer: RKSimon. Jan 1 2017, 3:02 PM

zvi added a subscriber: magabari. Jan 1 2017, 11:49 PM

This LGTM, but please also wait for Simon's approval (or disapproval :-) ), just so your patches stay in sync.

../lib/Analysis/CostModel.cpp
115 ↗	(On Diff #82656)	So, if this goes in first, it'll get removed by Simon's clean-up, right?

LGTM, with a few minors. I can do the NFC refactor on the shuffle kinds after that and then finish off the broadcast patch.

I wonder whether we should consider adding a version of getShuffleCost that takes a raw shuffle mask instead of a shuffle kind enum - similar to getIntrinsicInstrCost. There are always going to be cases where the target code can identify more cheap shuffle cases. But that is a problem for another day - we would still need the SK_PermuteTwoSrc/SK_PermuteSingleSrc enums.
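To make the mask-versus-kind distinction concrete, here is a minimal sketch of how a raw mask maps onto the kinds discussed in this review. It loosely follows the order of the checks added to CostModel.cpp in this diff, but MaskKind and classifyMask are hypothetical names, and the SK_Alternate check is left out for brevity. Negative mask entries are treated as undef, and indices >= N refer to the second operand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the TTI::ShuffleKind values used in the review.
enum class MaskKind { Reverse, Broadcast, SingleSrc, TwoSrc };

// Classify a shufflevector mask over two N-element operands.
MaskKind classifyMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  bool IsReverse = true;
  bool IsBroadcast = true;
  bool UsesVec0 = false;
  bool UsesVec1 = false;
  for (int I = 0; I < N; ++I) {
    const int M = Mask[I];
    if (M < 0)
      continue; // Undef lanes match every pattern.
    if (M != N - 1 - I)
      IsReverse = false;
    if (M != 0)
      IsBroadcast = false;
    if (M >= N)
      UsesVec1 = true;
    else
      UsesVec0 = true;
  }
  if (IsReverse)
    return MaskKind::Reverse;
  if (IsBroadcast)
    return MaskKind::Broadcast;
  if (!(UsesVec0 && UsesVec1))
    return MaskKind::SingleSrc;
  return MaskKind::TwoSrc;
}
```

A target hook that received the mask itself could refine the last two buckets further, which is the flexibility the comment above is asking about; the enum-based interface only ever sees the bucket.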

../lib/Analysis/CostModel.cpp
115 ↗	(On Diff #82656)	Yes, but that's not a problem.
../lib/Target/X86/X86TargetTransformInfo.cpp
617 ↗	(On Diff #82783)	I missed this when I did the earlier work on Reverse - better just to commit this (and the fixed cost test) separately.

This revision is now accepted and ready to land. Jan 2 2017, 1:35 AM

Closed by commit rL290810: AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns. (authored by delena). Jan 2 2017, 2:49 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h	6 lines
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h	10 lines
llvm/trunk/lib/Analysis/CostModel.cpp	32 lines
llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h	7 lines
llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp	252 lines
llvm/trunk/test/Analysis/CostModel/X86/interleave-load-i32.ll	85 lines
llvm/trunk/test/Analysis/CostModel/X86/interleave-store-i32.ll	85 lines
llvm/trunk/test/Analysis/CostModel/X86/shuffle-broadcast.ll	18 lines
llvm/trunk/test/Analysis/CostModel/X86/shuffle-single-src.ll	94 lines

68 lines

113 lines

110 lines

81 lines

117 lines

Diff 82798

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

[461 lines not shown]	public:
/// @{		/// @{

/// \brief The various kinds of shuffle patterns for vector queries.		/// \brief The various kinds of shuffle patterns for vector queries.
enum ShuffleKind {		enum ShuffleKind {
SK_Broadcast, ///< Broadcast element 0 to all other elements.		SK_Broadcast, ///< Broadcast element 0 to all other elements.
SK_Reverse, ///< Reverse the order of the vector.		SK_Reverse, ///< Reverse the order of the vector.
SK_Alternate, ///< Choose alternate elements from vector.		SK_Alternate, ///< Choose alternate elements from vector.
SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.		SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.
SK_ExtractSubvector ///< ExtractSubvector Index indicates start offset.		SK_ExtractSubvector,///< ExtractSubvector Index indicates start offset.
		SK_PermuteTwoSrc, ///< Merge elements from two source vectors into one
		///< with any shuffle mask.
		SK_PermuteSingleSrc ///< Shuffle elements of single source vector with any
		///< shuffle mask.
};		};

/// \brief Additional information about an operand's possible values.		/// \brief Additional information about an operand's possible values.
enum OperandValueKind {		enum OperandValueKind {
OK_AnyValue, // Operand can have any value.		OK_AnyValue, // Operand can have any value.
OK_UniformValue, // Operand is uniform (splat of a value).		OK_UniformValue, // Operand is uniform (splat of a value).
OK_UniformConstantValue, // Operand is uniform constant.		OK_UniformConstantValue, // Operand is uniform constant.
OK_NonUniformConstantValue // Operand is a non uniform constant value.		OK_NonUniformConstantValue // Operand is a non uniform constant value.
[715 lines not shown]

llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

[54 lines not shown]	for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
if (Extract)		if (Extract)
Cost += static_cast<T *>(this)		Cost += static_cast<T *>(this)
->getVectorInstrCost(Instruction::ExtractElement, Ty, i);		->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
}		}

return Cost;		return Cost;
}		}

/// Estimate the cost overhead of SK_Alternate shuffle.		/// Estimate a cost of shuffle as a sequence of extract and insert
unsigned getAltShuffleOverhead(Type *Ty) {		/// operations.
		unsigned getPermuteShuffleOverhead(Type *Ty) {
assert(Ty->isVectorTy() && "Can only shuffle vectors");		assert(Ty->isVectorTy() && "Can only shuffle vectors");
unsigned Cost = 0;		unsigned Cost = 0;
// Shuffle cost is equal to the cost of extracting element from its argument		// Shuffle cost is equal to the cost of extracting element from its argument
// plus the cost of inserting them onto the result vector.		// plus the cost of inserting them onto the result vector.

// e.g. <4 x float> has a mask of <0,5,2,7> i.e we need to extract from		// e.g. <4 x float> has a mask of <0,5,2,7> i.e we need to extract from
// index 0 of first vector, index 1 of second vector,index 2 of first		// index 0 of first vector, index 1 of second vector,index 2 of first
// vector and finally index 3 of second vector and insert them at index		// vector and finally index 3 of second vector and insert them at index
[273 lines not shown]	unsigned getArithmeticInstrCost(
}		}

// We don't know anything about this scalar instruction.		// We don't know anything about this scalar instruction.
return OpCost;		return OpCost;
}		}

unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
if (Kind == TTI::SK_Alternate) {		if (Kind == TTI::SK_Alternate || Kind == TTI::SK_PermuteTwoSrc ||
return getAltShuffleOverhead(Tp);		Kind == TTI::SK_PermuteSingleSrc) {
		return getPermuteShuffleOverhead(Tp);
}		}
return 1;		return 1;
}		}

unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {
const TargetLoweringBase *TLI = getTLI();		const TargetLoweringBase *TLI = getTLI();
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");
[658 lines not shown]

llvm/trunk/lib/Analysis/CostModel.cpp

[91 lines not shown]

static bool isReverseVectorMask(ArrayRef<int> Mask) {		static bool isReverseVectorMask(ArrayRef<int> Mask) {
for (unsigned i = 0, MaskSize = Mask.size(); i < MaskSize; ++i)		for (unsigned i = 0, MaskSize = Mask.size(); i < MaskSize; ++i)
if (Mask[i] >= 0 && Mask[i] != (int)(MaskSize - 1 - i))		if (Mask[i] >= 0 && Mask[i] != (int)(MaskSize - 1 - i))
return false;		return false;
return true;		return true;
}		}

		static bool isSingleSourceVectorMask(ArrayRef<int> Mask) {
		bool Vec0 = false;
		bool Vec1 = false;
		for (unsigned i = 0, NumVecElts = Mask.size(); i < NumVecElts; ++i) {
		if (Mask[i] >= 0) {
		if ((unsigned)Mask[i] >= NumVecElts)
		Vec1 = true;
		else
		Vec0 = true;
		}
		}
		return !(Vec0 && Vec1);
		}

		static bool isZeroEltBroadcastVectorMask(ArrayRef<int> Mask) {
		for (unsigned i = 0; i < Mask.size(); ++i)
		if (Mask[i] > 0)
		return false;
		return true;
		}

static bool isAlternateVectorMask(ArrayRef<int> Mask) {		static bool isAlternateVectorMask(ArrayRef<int> Mask) {
bool isAlternate = true;		bool isAlternate = true;
unsigned MaskSize = Mask.size();		unsigned MaskSize = Mask.size();

// Example: shufflevector A, B, <0,5,2,7>		// Example: shufflevector A, B, <0,5,2,7>
for (unsigned i = 0; i < MaskSize && isAlternate; ++i) {		for (unsigned i = 0; i < MaskSize && isAlternate; ++i) {
if (Mask[i] < 0)		if (Mask[i] < 0)
continue;		continue;
[388 lines not shown]	case Instruction::ShuffleVector: {

if (NumVecElems == Mask.size()) {		if (NumVecElems == Mask.size()) {
if (isReverseVectorMask(Mask))		if (isReverseVectorMask(Mask))
return TTI->getShuffleCost(TargetTransformInfo::SK_Reverse, VecTypOp0,		return TTI->getShuffleCost(TargetTransformInfo::SK_Reverse, VecTypOp0,
0, nullptr);		0, nullptr);
if (isAlternateVectorMask(Mask))		if (isAlternateVectorMask(Mask))
return TTI->getShuffleCost(TargetTransformInfo::SK_Alternate,		return TTI->getShuffleCost(TargetTransformInfo::SK_Alternate,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);

		if (isZeroEltBroadcastVectorMask(Mask))
		return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast,
		VecTypOp0, 0, nullptr);

		if (isSingleSourceVectorMask(Mask))
		return TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
		VecTypOp0, 0, nullptr);

		return TTI->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc,
		VecTypOp0, 0, nullptr);
}		}

return -1;		return -1;
}		}
case Instruction::Call:		case Instruction::Call:
if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
SmallVector<Value *, 4> Args;		SmallVector<Value *, 4> Args;
for (unsigned J = 0, JE = II->getNumArgOperands(); J != JE; ++J)		for (unsigned J = 0, JE = II->getNumArgOperands(); J != JE; ++J)
[32 lines not shown]

llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h

[74 lines not shown]	public:

int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF);		ArrayRef<Type *> Tys, FastMathFlags FMF);
int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF);		ArrayRef<Value *> Args, FastMathFlags FMF);

int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm);		int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm);

		int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
		unsigned Factor, ArrayRef<unsigned> Indices,
		unsigned Alignment, unsigned AddressSpace);
		int getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,
		unsigned Factor, ArrayRef<unsigned> Indices,
		unsigned Alignment, unsigned AddressSpace);

int getIntImmCost(int64_t);		int getIntImmCost(int64_t);

int getIntImmCost(const APInt &Imm, Type *Ty);		int getIntImmCost(const APInt &Imm, Type *Ty);

int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);		int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);
int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty);		Type *Ty);
bool isLegalMaskedLoad(Type *DataType);		bool isLegalMaskedLoad(Type *DataType);
[19 lines not shown]

llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp

[592 lines not shown]	if (const auto *Entry = CostTableLookup(SSE1FloatCostTable, ISD,
LT.second))		LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);		return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);
}		}

int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
// We only estimate the cost of reverse and alternate shuffles.
if (Kind != TTI::SK_Reverse && Kind != TTI::SK_Alternate)
return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);

if (Kind == TTI::SK_Reverse) {		if (Kind == TTI::SK_Reverse) {
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);

static const CostTblEntry AVX512VBMIShuffleTbl[] = {		static const CostTblEntry AVX512VBMIShuffleTbl[] = {
{ ISD::VECTOR_SHUFFLE, MVT::v64i8, 1 }, // vpermb		{ ISD::VECTOR_SHUFFLE, MVT::v64i8, 1 }, // vpermb
{ ISD::VECTOR_SHUFFLE, MVT::v32i8, 1 } // vpermb		{ ISD::VECTOR_SHUFFLE, MVT::v32i8, 1 } // vpermb
};		};
[83 lines not shown]	if (Kind == TTI::SK_Reverse) {
static const CostTblEntry SSE1ShuffleTbl[] = {		static const CostTblEntry SSE1ShuffleTbl[] = {
{ ISD::VECTOR_SHUFFLE, MVT::v4f32, 1 }, // shufps		{ ISD::VECTOR_SHUFFLE, MVT::v4f32, 1 }, // shufps
};		};

if (ST->hasSSE1())		if (ST->hasSSE1())
if (const auto *Entry =		if (const auto *Entry =
CostTableLookup(SSE1ShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second))		CostTableLookup(SSE1ShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}

if (Kind == TTI::SK_Alternate) {		} else if (Kind == TTI::SK_Alternate) {
// 64-bit packed float vectors (v2f32) are widened to type v4f32.		// 64-bit packed float vectors (v2f32) are widened to type v4f32.
// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.		// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);

// The backend knows how to generate a single VEX.256 version of		// The backend knows how to generate a single VEX.256 version of
// instruction VPBLENDW if the target supports AVX2.		// instruction VPBLENDW if the target supports AVX2.
if (ST->hasAVX2() && LT.second == MVT::v16i16)		if (ST->hasAVX2() && LT.second == MVT::v16i16)
return LT.first;		return LT.first;
[73 lines not shown]	static const CostTblEntry SSEAltShuffleTbl[] = {
// 8 x (pinsrw + pextrw + and + movb + movzb + or)		// 8 x (pinsrw + pextrw + and + movb + movzb + or)
{ISD::VECTOR_SHUFFLE, MVT::v16i8, 48}		{ISD::VECTOR_SHUFFLE, MVT::v16i8, 48}
};		};

// Fall-back (SSE3 and SSE2).		// Fall-back (SSE3 and SSE2).
if (const auto *Entry = CostTableLookup(SSEAltShuffleTbl,		if (const auto *Entry = CostTableLookup(SSEAltShuffleTbl,
ISD::VECTOR_SHUFFLE, LT.second))		ISD::VECTOR_SHUFFLE, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
		} else if (Kind == TTI::SK_PermuteTwoSrc) {
		// We assume that source and destination have the same vector type.
		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
		int NumOfDests = LT.first;
		int NumOfShufflesPerDest = LT.first * 2 - 1;
		int NumOfShuffles = NumOfDests * NumOfShufflesPerDest;

		static const CostTblEntry AVX512VBMIShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v64i8, 1}, // vpermt2b
		{ISD::VECTOR_SHUFFLE, MVT::v32i8, 1}, // vpermt2b
		{ISD::VECTOR_SHUFFLE, MVT::v16i8, 1} // vpermt2b
		};

		if (ST->hasVBMI())
		if (const auto *Entry = CostTableLookup(AVX512VBMIShuffleTbl,
		ISD::VECTOR_SHUFFLE, LT.second))
		return NumOfShuffles * Entry->Cost;

		static const CostTblEntry AVX512BWShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v32i16, 1}, // vpermt2w
		{ISD::VECTOR_SHUFFLE, MVT::v16i16, 1}, // vpermt2w
		{ISD::VECTOR_SHUFFLE, MVT::v8i16, 1}, // vpermt2w
		{ISD::VECTOR_SHUFFLE, MVT::v32i8, 3}, // zext + vpermt2w + trunc
		{ISD::VECTOR_SHUFFLE, MVT::v64i8, 19}, // 6 * v32i8 + 1
		{ISD::VECTOR_SHUFFLE, MVT::v16i8, 3} // zext + vpermt2w + trunc
		};

		if (ST->hasBWI())
		if (const auto *Entry = CostTableLookup(AVX512BWShuffleTbl,
		ISD::VECTOR_SHUFFLE, LT.second))
		return NumOfShuffles * Entry->Cost;

		static const CostTblEntry AVX512ShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v8f64, 1}, // vpermt2pd
		{ISD::VECTOR_SHUFFLE, MVT::v16f32, 1}, // vpermt2ps
		{ISD::VECTOR_SHUFFLE, MVT::v8i64, 1}, // vpermt2q
		{ISD::VECTOR_SHUFFLE, MVT::v16i32, 1}, // vpermt2d
		{ISD::VECTOR_SHUFFLE, MVT::v4f64, 1}, // vpermt2pd
		{ISD::VECTOR_SHUFFLE, MVT::v8f32, 1}, // vpermt2ps
		{ISD::VECTOR_SHUFFLE, MVT::v4i64, 1}, // vpermt2q
		{ISD::VECTOR_SHUFFLE, MVT::v8i32, 1}, // vpermt2d
		{ISD::VECTOR_SHUFFLE, MVT::v2f64, 1}, // vpermt2pd
		{ISD::VECTOR_SHUFFLE, MVT::v4f32, 1}, // vpermt2ps
		{ISD::VECTOR_SHUFFLE, MVT::v2i64, 1}, // vpermt2q
		{ISD::VECTOR_SHUFFLE, MVT::v4i32, 1} // vpermt2d
		};

		if (ST->hasAVX512())
		if (const auto *Entry =
		CostTableLookup(AVX512ShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second))
		return NumOfShuffles * Entry->Cost;

		} else if (Kind == TTI::SK_PermuteSingleSrc) {
		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
		if (LT.first == 1) {

		static const CostTblEntry AVX512VBMIShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v64i8, 1}, // vpermb
		{ISD::VECTOR_SHUFFLE, MVT::v32i8, 1} // vpermb
		};

		if (ST->hasVBMI())
		if (const auto *Entry = CostTableLookup(AVX512VBMIShuffleTbl,
		ISD::VECTOR_SHUFFLE, LT.second))
		return Entry->Cost;

		static const CostTblEntry AVX512BWShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v32i16, 1}, // vpermw
		{ISD::VECTOR_SHUFFLE, MVT::v16i16, 1}, // vpermw
		{ISD::VECTOR_SHUFFLE, MVT::v8i16, 1}, // vpermw
		{ISD::VECTOR_SHUFFLE, MVT::v64i8, 8}, // extend to v32i16
		{ISD::VECTOR_SHUFFLE, MVT::v32i8, 3} // vpermw + zext/trunc
		};

		if (ST->hasBWI())
		if (const auto *Entry = CostTableLookup(AVX512BWShuffleTbl,
		ISD::VECTOR_SHUFFLE, LT.second))
		return Entry->Cost;

		static const CostTblEntry AVX512ShuffleTbl[] = {
		{ISD::VECTOR_SHUFFLE, MVT::v8f64, 1}, // vpermpd
		{ISD::VECTOR_SHUFFLE, MVT::v4f64, 1}, // vpermpd
		{ISD::VECTOR_SHUFFLE, MVT::v2f64, 1}, // vpermpd
		{ISD::VECTOR_SHUFFLE, MVT::v16f32, 1}, // vpermps
		{ISD::VECTOR_SHUFFLE, MVT::v8f32, 1}, // vpermps
		{ISD::VECTOR_SHUFFLE, MVT::v4f32, 1}, // vpermps
		{ISD::VECTOR_SHUFFLE, MVT::v8i64, 1}, // vpermq
		{ISD::VECTOR_SHUFFLE, MVT::v4i64, 1}, // vpermq
		{ISD::VECTOR_SHUFFLE, MVT::v2i64, 1}, // vpermq
		{ISD::VECTOR_SHUFFLE, MVT::v16i32, 1}, // vpermd
		{ISD::VECTOR_SHUFFLE, MVT::v8i32, 1}, // vpermd
		{ISD::VECTOR_SHUFFLE, MVT::v4i32, 1}, // vpermd
		{ISD::VECTOR_SHUFFLE, MVT::v16i8, 1} // pshufb
		};

		if (ST->hasAVX512())
		if (const auto *Entry =
		CostTableLookup(AVX512ShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second))
		return Entry->Cost;

		} else {
		// We are going to permute multiple sources and the result will be in
		// multiple destinations. Providing an accurate cost only for splits where
		// the element type remains the same.

		MVT LegalVT = LT.second;
		if (LegalVT.getVectorElementType().getSizeInBits() ==
		Tp->getVectorElementType()->getPrimitiveSizeInBits() &&
		LegalVT.getVectorNumElements() < Tp->getVectorNumElements()) {

		unsigned VecTySize = DL.getTypeStoreSize(Tp);
		unsigned LegalVTSize = LegalVT.getStoreSize();
		// Number of source vectors after legalization:
		unsigned NumOfSrcs = (VecTySize + LegalVTSize - 1) / LegalVTSize;
		// Number of destination vectors after legalization:
		unsigned NumOfDests = LT.first;

		Type *SingleOpTy = VectorType::get(Tp->getVectorElementType(),
		LegalVT.getVectorNumElements());

		unsigned NumOfShuffles = (NumOfSrcs - 1) * NumOfDests;
		return NumOfShuffles *
		getShuffleCost(TTI::SK_PermuteTwoSrc, SingleOpTy, 0, nullptr);
		}
		}
}		}

return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);		return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
}		}

int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {		int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");
[1,133 lines not shown]	int X86TTIImpl::getGatherScatterOpCost(unsigned Opcode, Type *SrcVTy,
bool Scalarize = false;		bool Scalarize = false;
if ((Opcode == Instruction::Load && !isLegalMaskedGather(SrcVTy)) ||		if ((Opcode == Instruction::Load && !isLegalMaskedGather(SrcVTy)) ||
(Opcode == Instruction::Store && !isLegalMaskedScatter(SrcVTy)))		(Opcode == Instruction::Store && !isLegalMaskedScatter(SrcVTy)))
Scalarize = true;		Scalarize = true;
// Gather / Scatter for vector 2 is not profitable on KNL / SKX		// Gather / Scatter for vector 2 is not profitable on KNL / SKX
// Vector-4 of gather/scatter instruction does not exist on KNL.		// Vector-4 of gather/scatter instruction does not exist on KNL.
// We can extend it to 8 elements, but zeroing upper bits of		// We can extend it to 8 elements, but zeroing upper bits of
// the mask vector will add more instructions. Right now we give the scalar		// the mask vector will add more instructions. Right now we give the scalar
// cost of vector-4 for KNL. TODO: Check, maybe the gather/scatter instruction is		// cost of vector-4 for KNL. TODO: Check, maybe the gather/scatter instruction
// better in the VariableMask case.		// is better in the VariableMask case.
if (VF == 2 || (VF == 4 && !ST->hasVLX()))		if (VF == 2 || (VF == 4 && !ST->hasVLX()))
Scalarize = true;		Scalarize = true;

if (Scalarize)		if (Scalarize)
return getGSScalarCost(Opcode, SrcVTy, VariableMask, Alignment, AddressSpace);		return getGSScalarCost(Opcode, SrcVTy, VariableMask, Alignment,
		AddressSpace);

return getGSVectorCost(Opcode, SrcVTy, Ptr, Alignment, AddressSpace);		return getGSVectorCost(Opcode, SrcVTy, Ptr, Alignment, AddressSpace);
}		}

bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy) {		bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy) {
Type *ScalarTy = DataTy->getScalarType();		Type *ScalarTy = DataTy->getScalarType();
int DataWidth = isa<PointerType>(ScalarTy) ?		int DataWidth = isa<PointerType>(ScalarTy) ?
DL.getPointerSizeInBits() : ScalarTy->getPrimitiveSizeInBits();		DL.getPointerSizeInBits() : ScalarTy->getPrimitiveSizeInBits();
[48 lines not shown]
}		}

bool X86TTIImpl::enableInterleavedAccessVectorization() {		bool X86TTIImpl::enableInterleavedAccessVectorization() {
// TODO: We expect this to be beneficial regardless of arch,		// TODO: We expect this to be beneficial regardless of arch,
// but there are currently some unexplained performance artifacts on Atom.		// but there are currently some unexplained performance artifacts on Atom.
// As a temporary solution, disable on Atom.		// As a temporary solution, disable on Atom.
return !(ST->isAtom() || ST->isSLM());		return !(ST->isAtom() || ST->isSLM());
}		}

		// Get estimation for interleaved load/store operations and strided load.
		// \p Indices contains indices for strided load.
		// \p Factor - the factor of interleaving.
		// AVX-512 provides 3-src shuffles that significantly reduce the cost.
		int X86TTIImpl::getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,
		unsigned Factor,
		ArrayRef<unsigned> Indices,
		unsigned Alignment,
		unsigned AddressSpace) {

		// VecTy for interleave memop is <VF*Factor x Elt>.
		// So, for VF=4, Interleave Factor = 3, Element type = i32 we have
		// VecTy = <12 x i32>.

		// Calculate the number of memory operations (NumOfMemOps), required
		// for load/store the VecTy.
		MVT LegalVT = getTLI()->getTypeLegalizationCost(DL, VecTy).second;
		unsigned VecTySize = DL.getTypeStoreSize(VecTy);
		unsigned LegalVTSize = LegalVT.getStoreSize();
		unsigned NumOfMemOps = (VecTySize + LegalVTSize - 1) / LegalVTSize;

		// Get the cost of one memory operation.
		Type *SingleMemOpTy = VectorType::get(VecTy->getVectorElementType(),
		LegalVT.getVectorNumElements());
		unsigned MemOpCost =
		getMemoryOpCost(Opcode, SingleMemOpTy, Alignment, AddressSpace);

		if (Opcode == Instruction::Load) {
		// Kind of shuffle depends on number of loaded values.
		// If we load the entire data in one register, we can use a 1-src shuffle.
		// Otherwise, we'll merge 2 sources in each operation.
		TTI::ShuffleKind ShuffleKind =
		(NumOfMemOps > 1) ? TTI::SK_PermuteTwoSrc : TTI::SK_PermuteSingleSrc;

		unsigned ShuffleCost =
		getShuffleCost(ShuffleKind, SingleMemOpTy, 0, nullptr);

		unsigned NumOfLoadsInInterleaveGrp =
		Indices.size() ? Indices.size() : Factor;
		Type *ResultTy = VectorType::get(VecTy->getVectorElementType(),
		VecTy->getVectorNumElements() / Factor);
		unsigned NumOfResults =
		getTLI()->getTypeLegalizationCost(DL, ResultTy).first *
		NumOfLoadsInInterleaveGrp;

		// About a half of the loads may be folded in shuffles when we have only
		// one result. If we have more than one result, we do not fold loads at all.
		unsigned NumOfUnfoldedLoads =
		NumOfResults > 1 ? NumOfMemOps : NumOfMemOps / 2;

		// Get a number of shuffle operations per result.
		unsigned NumOfShufflesPerResult =
		std::max((unsigned)1, (unsigned)(NumOfMemOps - 1));

		// The SK_PermuteTwoSrc shuffle clobbers one of its source operands.
		// When we have more than one destination, we need additional instructions
		// to preserve the sources.
		unsigned NumOfMoves = 0;
		if (NumOfResults > 1 && ShuffleKind == TTI::SK_PermuteTwoSrc)
		NumOfMoves = NumOfResults * NumOfShufflesPerResult / 2;

		int Cost = NumOfResults * NumOfShufflesPerResult * ShuffleCost +
		NumOfUnfoldedLoads * MemOpCost + NumOfMoves;

		return Cost;
		}

		// Store.
		assert(Opcode == Instruction::Store &&
		"Expected Store Instruction at this point");

		// There are no strided stores at the moment, and a store can't be folded
		// into a shuffle.
		unsigned NumOfSources = Factor; // The number of values to be merged.
		unsigned ShuffleCost =
		getShuffleCost(TTI::SK_PermuteTwoSrc, SingleMemOpTy, 0, nullptr);
		unsigned NumOfShufflesPerStore = NumOfSources - 1;

		// The SK_PermuteTwoSrc shuffle clobbers one of its source operands.
		// We need additional instructions to preserve the sources.
		unsigned NumOfMoves = NumOfMemOps * NumOfShufflesPerStore / 2;
		int Cost = NumOfMemOps * (MemOpCost + NumOfShufflesPerStore * ShuffleCost) +
		NumOfMoves;
		return Cost;
		}

		int X86TTIImpl::getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
		unsigned Factor,
		ArrayRef<unsigned> Indices,
		unsigned Alignment,
		unsigned AddressSpace) {
		auto isSupportedOnAVX512 = [](Type *VecTy, bool &RequiresBW) {
		RequiresBW = false;
		Type *EltTy = VecTy->getVectorElementType();
		if (EltTy->isFloatTy() || EltTy->isDoubleTy() || EltTy->isIntegerTy(64) ||
		EltTy->isIntegerTy(32) || EltTy->isPointerTy())
		return true;
		if (EltTy->isIntegerTy(16) || EltTy->isIntegerTy(8)) {
		RequiresBW = true;
		return true;
		}
		return false;
		};
		bool RequiresBW;
		bool HasAVX512Solution = isSupportedOnAVX512(VecTy, RequiresBW);
		if (ST->hasAVX512() && HasAVX512Solution && (!RequiresBW || ST->hasBWI()))
		return getInterleavedMemoryOpCostAVX512(Opcode, VecTy, Factor, Indices,
		Alignment, AddressSpace);
		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
		Alignment, AddressSpace);
		}
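
As a reading aid, the dispatch rule in getInterleavedMemoryOpCost can be modelled outside LLVM. The sketch below is not LLVM code: `EltKind`, `SubtargetModel`, and `usesAVX512InterleaveCost` are hypothetical names, and the two booleans merely stand in for `ST->hasAVX512()` and `ST->hasBWI()`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the element types the lambda distinguishes.
enum class EltKind { F32, F64, I8, I16, I32, I64, Ptr, Other };

// Hypothetical stand-in for the two subtarget queries used by the patch.
struct SubtargetModel {
  bool HasAVX512; // models ST->hasAVX512()
  bool HasBWI;    // models ST->hasBWI()
};

// Returns true when the AVX-512 specific cost would be used, false when the
// call would fall through to the target-independent BaseT implementation.
bool usesAVX512InterleaveCost(EltKind Elt, SubtargetModel ST) {
  // Assumption of this model: BWI is never enabled without AVX-512F.
  assert((!ST.HasBWI || ST.HasAVX512) && "BWI without AVX-512F is not modelled");

  bool RequiresBW = false;
  switch (Elt) {
  case EltKind::F32:
  case EltKind::F64:
  case EltKind::I32:
  case EltKind::I64:
  case EltKind::Ptr:
    break; // 32-bit and 64-bit elements need only AVX-512F.
  case EltKind::I8:
  case EltKind::I16:
    RequiresBW = true; // byte and word elements additionally need BWI.
    break;
  case EltKind::Other:
    return false; // no AVX-512 solution for any other element type.
  }
  return ST.HasAVX512 && (!RequiresBW || ST.HasBWI);
}
```

Under this model an i16 group on a subtarget with AVX-512F only falls back to the target-independent cost, while the same group with BWI enabled takes the AVX-512 path.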

llvm/trunk/test/Analysis/CostModel/X86/interleave-load-i32.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i32] zeroinitializer, align 16
				@B = global [10240 x i32] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @load_i32_interleave4() {
				;CHECK-LABEL: load_i32_interleave4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %0 = load
				;CHECK: Found an estimated cost of 5 for VF 2 For instruction: %0 = load
				;CHECK: Found an estimated cost of 5 for VF 4 For instruction: %0 = load
				;CHECK: Found an estimated cost of 8 for VF 8 For instruction: %0 = load
				;CHECK: Found an estimated cost of 22 for VF 16 For instruction: %0 = load
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 16
				%1 = or i64 %indvars.iv, 1
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %1
				%2 = load i32, i32* %arrayidx2, align 4
				%add3 = add nsw i32 %2, %0
				%3 = or i64 %indvars.iv, 2
				%arrayidx6 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %3
				%4 = load i32, i32* %arrayidx6, align 8
				%add7 = add nsw i32 %add3, %4
				%5 = or i64 %indvars.iv, 3
				%arrayidx10 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %5
				%6 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %add7, %6
				%arrayidx13 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %add11, i32* %arrayidx13, align 16
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4
				%cmp = icmp slt i64 %indvars.iv.next, 1024
				br i1 %cmp, label %for.body, label %for.cond.cleanup
				}

				define void @load_i32_interleave5() {
				;CHECK-LABEL: load_i32_interleave5
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %0 = load
				;CHECK: Found an estimated cost of 6 for VF 2 For instruction: %0 = load
				;CHECK: Found an estimated cost of 9 for VF 4 For instruction: %0 = load
				;CHECK: Found an estimated cost of 18 for VF 8 For instruction: %0 = load
				;CHECK: Found an estimated cost of 35 for VF 16 For instruction: %0 = load
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%1 = add nuw nsw i64 %indvars.iv, 1
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %1
				%2 = load i32, i32* %arrayidx2, align 4
				%add3 = add nsw i32 %2, %0
				%3 = add nuw nsw i64 %indvars.iv, 2
				%arrayidx6 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %3
				%4 = load i32, i32* %arrayidx6, align 4
				%add7 = add nsw i32 %add3, %4
				%5 = add nuw nsw i64 %indvars.iv, 3
				%arrayidx10 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %5
				%6 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %add7, %6
				%7 = add nuw nsw i64 %indvars.iv, 4
				%arrayidx14 = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %7
				%8 = load i32, i32* %arrayidx14, align 4
				%add15 = add nsw i32 %add11, %8
				%arrayidx17 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %add15, i32* %arrayidx17, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 5
				%cmp = icmp slt i64 %indvars.iv.next, 1024
				br i1 %cmp, label %for.body, label %for.cond.cleanup
				}
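
The VF 4, VF 8, and VF 16 costs checked in @load_i32_interleave4, and the VF 16 cost in @load_i32_interleave5, can be reproduced by hand from the load branch of getInterleavedMemoryOpCostAVX512. The following is a minimal stand-alone sketch, not LLVM code: `LoadGroupModel` and `interleavedLoadCost` are hypothetical names, and the inputs are plain integers that the real function obtains from TTI queries. It assumes a 64-byte legal register, a cost of 1 per legal-width load, and a cost of 1 per permute; these unit costs are assumptions chosen to be consistent with the CHECK lines, not values stated in the patch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-alone model of the load branch; not LLVM API.
struct LoadGroupModel {
  unsigned VecTyBytes;   // store size of the wide <VF*Factor x Elt> vector
  unsigned LegalVTBytes; // store size of one legal register (assumed 64)
  unsigned NumOfResults; // legalized result vectors that are actually used
  unsigned MemOpCost;    // assumed cost of one legal-width load
  unsigned ShuffleCost;  // assumed cost of one permute of the chosen kind
};

unsigned interleavedLoadCost(const LoadGroupModel &G) {
  assert(G.VecTyBytes > 0 && G.LegalVTBytes > 0 && "sizes must be non-zero");

  // Round up: legal-width loads needed to cover the wide vector.
  unsigned NumOfMemOps = (G.VecTyBytes + G.LegalVTBytes - 1) / G.LegalVTBytes;

  // More than one load means each shuffle has to merge two sources.
  bool IsTwoSrc = NumOfMemOps > 1;

  // Loads are only assumed foldable into shuffles for a single result.
  unsigned NumOfUnfoldedLoads =
      G.NumOfResults > 1 ? NumOfMemOps : NumOfMemOps / 2;

  unsigned NumOfShufflesPerResult = std::max(1u, NumOfMemOps - 1);

  // A 2-src permute overwrites one source, so extra moves keep sources alive.
  unsigned NumOfMoves = 0;
  if (G.NumOfResults > 1 && IsTwoSrc)
    NumOfMoves = G.NumOfResults * NumOfShufflesPerResult / 2;

  return G.NumOfResults * NumOfShufflesPerResult * G.ShuffleCost +
         NumOfUnfoldedLoads * G.MemOpCost + NumOfMoves;
}
```

For VF 16 with Factor 4 the wide vector is <64 x i32>, or 256 bytes, which gives 4 loads, 4 results with 3 shuffles each, and 6 moves: 12 + 4 + 6 = 22. Passing `{256, 64, 4, 1, 1}` to the sketch yields that value under the stated assumptions.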

llvm/trunk/test/Analysis/CostModel/X86/interleave-store-i32.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i32] zeroinitializer, align 16
				@B = global [10240 x i32] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @store_i32_interleave4() {
				;CHECK-LABEL: store_i32_interleave4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add16
				;CHECK: Found an estimated cost of 5 for VF 2 For instruction: store i32 %add16
				;CHECK: Found an estimated cost of 5 for VF 4 For instruction: store i32 %add16
				;CHECK: Found an estimated cost of 11 for VF 8 For instruction: store i32 %add16
				;CHECK: Found an estimated cost of 22 for VF 16 For instruction: store i32 %add16
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 16
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %0, i32* %arrayidx2, align 16
				%add = add nsw i32 %0, 1
				%1 = or i64 %indvars.iv, 1
				%arrayidx7 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %1
				store i32 %add, i32* %arrayidx7, align 4
				%add10 = add nsw i32 %0, 2
				%2 = or i64 %indvars.iv, 2
				%arrayidx13 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %2
				store i32 %add10, i32* %arrayidx13, align 8
				%add16 = add nsw i32 %0, 3
				%3 = or i64 %indvars.iv, 3
				%arrayidx19 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %3
				store i32 %add16, i32* %arrayidx19, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4
				%cmp = icmp slt i64 %indvars.iv.next, 1024
				br i1 %cmp, label %for.body, label %for.cond.cleanup
				}

				define void @store_i32_interleave5() {
				;CHECK-LABEL: store_i32_interleave5
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add22
				;CHECK: Found an estimated cost of 7 for VF 2 For instruction: store i32 %add22
				;CHECK: Found an estimated cost of 14 for VF 4 For instruction: store i32 %add22
				;CHECK: Found an estimated cost of 21 for VF 8 For instruction: store i32 %add22
				;CHECK: Found an estimated cost of 35 for VF 16 For instruction: store i32 %add22
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %0, i32* %arrayidx2, align 4
				%add = add nsw i32 %0, 1
				%1 = add nuw nsw i64 %indvars.iv, 1
				%arrayidx7 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %1
				store i32 %add, i32* %arrayidx7, align 4
				%add10 = add nsw i32 %0, 2
				%2 = add nuw nsw i64 %indvars.iv, 2
				%arrayidx13 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %2
				store i32 %add10, i32* %arrayidx13, align 4
				%add16 = add nsw i32 %0, 3
				%3 = add nuw nsw i64 %indvars.iv, 3
				%arrayidx19 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %3
				store i32 %add16, i32* %arrayidx19, align 4
				%add22 = add nsw i32 %0, 4
				%4 = add nuw nsw i64 %indvars.iv, 4
				%arrayidx25 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %4
				store i32 %add22, i32* %arrayidx25, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 5
				%cmp = icmp slt i64 %indvars.iv.next, 1024
				br i1 %cmp, label %for.body, label %for.cond.cleanup
				}
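
The store costs checked in this file can likewise be traced through the store branch of getInterleavedMemoryOpCostAVX512. This is a minimal stand-alone sketch rather than LLVM code: `interleavedStoreCost` is a hypothetical name, its parameters are plain integers, and it assumes a 64-byte legal register together with a cost of 1 per store and 1 per 2-src permute (assumptions chosen to be consistent with the CHECK lines, not values stated in the patch).

```cpp
#include <cassert>

// Hypothetical stand-alone model of the store branch; not LLVM API.
unsigned interleavedStoreCost(unsigned VecTyBytes, unsigned LegalVTBytes,
                              unsigned Factor, unsigned MemOpCost,
                              unsigned ShuffleCost) {
  assert(VecTyBytes > 0 && LegalVTBytes > 0 && "sizes must be non-zero");
  assert(Factor > 0 && "interleave factor must be positive");

  // Round up: legal-width stores needed to cover the wide vector.
  unsigned NumOfMemOps = (VecTyBytes + LegalVTBytes - 1) / LegalVTBytes;

  // Every stored register is built by merging Factor sources pairwise.
  unsigned NumOfShufflesPerStore = Factor - 1;

  // A 2-src permute overwrites one source, so extra moves keep sources alive.
  unsigned NumOfMoves = NumOfMemOps * NumOfShufflesPerStore / 2;

  return NumOfMemOps * (MemOpCost + NumOfShufflesPerStore * ShuffleCost) +
         NumOfMoves;
}
```

For @store_i32_interleave4 at VF 8 the wide vector is <32 x i32>, or 128 bytes, so there are 2 stores with 3 shuffles each and 3 moves: 2 * (1 + 3) + 3 = 11, which agrees with the CHECK line for VF 8.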

llvm/trunk/test/Analysis/CostModel/X86/shuffle-broadcast.ll

	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSE2			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSE2
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+ssse3 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSSE3			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+ssse3 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSSE3
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.2 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSE42			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.2 | FileCheck %s -check-prefix=CHECK -check-prefix=SSE -check-prefix=SSE42
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s -check-prefix=CHECK -check-prefix=AVX -check-prefix=AVX1			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s -check-prefix=CHECK -check-prefix=AVX -check-prefix=AVX1
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s -check-prefix=CHECK -check-prefix=AVX -check-prefix=AVX2			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s -check-prefix=CHECK -check-prefix=AVX -check-prefix=AVX2
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512 --check-prefix=AVX512F			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512 --check-prefix=AVX512F
	; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512 --check-prefix=AVX512BW			; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512 --check-prefix=AVX512BW

	;			;
	; Verify the cost model for broadcast shuffles.			; Verify the cost model for broadcast shuffles.
	;			;

	; CHECK-LABEL: 'test_vXf64'			; CHECK-LABEL: 'test_vXf64'
	define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512) {			define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512) {
	; SSE: Unknown cost {{.*}} %V128 = shufflevector			; SSE: cost of 1 {{.*}} %V128 = shufflevector
	; AVX: Unknown cost {{.*}} %V128 = shufflevector			; AVX: cost of 1 {{.*}} %V128 = shufflevector
	; AVX512: Unknown cost {{.*}} %V128 = shufflevector			; AVX512: cost of 1 {{.*}} %V128 = shufflevector
	%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> zeroinitializer			%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> zeroinitializer

	; SSE: Unknown cost {{.*}} %V256 = shufflevector			; SSE: cost of 1 {{.*}} %V256 = shufflevector
	; AVX: Unknown cost {{.*}} %V256 = shufflevector			; AVX: cost of 1 {{.*}} %V256 = shufflevector
	; AVX512: Unknown cost {{.*}} %V256 = shufflevector			; AVX512: cost of 1 {{.*}} %V256 = shufflevector
	%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> zeroinitializer			%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> zeroinitializer

	; SSE: Unknown cost {{.*}} %V512 = shufflevector			; SSE: cost of 1 {{.*}} %V512 = shufflevector
	; AVX: Unknown cost {{.*}} %V512 = shufflevector			; AVX: cost of 1 {{.*}} %V512 = shufflevector
	; AVX512: Unknown cost {{.*}} %V512 = shufflevector			; AVX512: cost of 1 {{.*}} %V512 = shufflevector
	%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> zeroinitializer			%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> zeroinitializer

	ret void			ret void
	}			}

llvm/trunk/test/Analysis/CostModel/X86/shuffle-single-src.ll

				; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s --check-prefix=SKX

				;
				; Verify the cost model for 1 src shuffles
				;

				; SKX-LABEL: 'test_vXf64'
				define void @test_vXf64(<4 x double> %src256, <8 x double> %src512, <16 x double> %src1024) {
				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 2 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

				; SKX-LABEL: 'test_vXi64'
				define void @test_vXi64(<4 x i64> %src256, <8 x i64> %src512) {

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

				; SKX-LABEL: 'test_vXf32'
				define void @test_vXf32(<4 x float> %src128, <8 x float> %src256, <16 x float> %src512) {

				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

				; SKX-LABEL: 'test_vXi32'
				define void @test_vXi32(<4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512, <32 x i32> %src1024) {

				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 2 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				ret void
				}

				; SKX-LABEL: 'test_vXi16'
				define void @test_vXi16(<8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512, <64 x i16> %src1024) {

				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 2 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				ret void
				}

				; SKX-LABEL: 'test_vXi8'
				define void @test_vXi8(<16 x i8> %src128, <32 x i8> %src256, <64 x i8> %src512) {
				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 3 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 8 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

llvm/trunk/test/Analysis/CostModel/X86/shuffle-two-src.ll

				; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s --check-prefix=SKX

				;
				; Verify the cost model for 2 src shuffles
				;

				; SKX-LABEL: 'test_vXf64'
				define void @test_vXf64(<4 x double> %src256, <8 x double> %src512, <16 x double> %src1024, <4 x double> %src256_1, <8 x double> %src512_1, <16 x double> %src1024_1) {
				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <4 x double> %src256, <4 x double> %src256_1, <4 x i32> <i32 3, i32 3, i32 7, i32 6>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <8 x double> %src512, <8 x double> %src512_1, <8 x i32> <i32 7, i32 6, i32 12, i32 4, i32 3, i32 2, i32 1, i32 15>

				; SKX: cost of 6 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <16 x double> %src1024, <16 x double> %src1024_1, <16 x i32> <i32 30, i32 14, i32 13, i32 12, i32 13, i32 10, i32 18, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

				; SKX-LABEL: 'test_vXf32'
				define void @test_vXf32(<4 x float> %src128, <8 x float> %src256, <16 x float> %src512, <32 x float> %src1024, <4 x float> %src128_1, <8 x float> %src256_1, <16 x float> %src512_1, <32 x float> %src1024_1) {

				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <4 x float> %src128, <4 x float> %src128_1, <4 x i32> <i32 3, i32 6, i32 1, i32 5>

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <8 x float> %src256, <8 x float> %src256_1, <8 x i32> <i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 12, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <16 x float> %src512, <16 x float> %src512_1, <16 x i32> <i32 15, i32 17, i32 13, i32 20, i32 11, i32 10, i32 8, i32 8, i32 7, i32 22, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 6 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <32 x float> %src1024, <32 x float> %src1024_1, <32 x i32> <i32 31, i32 33, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 48, i32 13, i32 12, i32 11, i32 11, i32 9, i32 45, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

				; SKX-LABEL: 'test_vXi16'
				define void @test_vXi16(<8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512, <64 x i16> %src1024, <8 x i16> %src128_1, <16 x i16> %src256_1, <32 x i16> %src512_1, <64 x i16> %src1024_1) {

				; SKX: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <8 x i16> %src128, <8 x i16> %src128_1, <8 x i32> <i32 7, i32 6, i32 6, i32 8, i32 9, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <16 x i16> %src256, <16 x i16> %src256_1, <16 x i32> <i32 15, i32 14, i32 13, i32 20, i32 21, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <32 x i16> %src512, <32 x i16> %src512_1, <32 x i32> <i32 31, i32 30, i32 45, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 38, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 6 {{.*}} %V1024 = shufflevector
				%V1024 = shufflevector <64 x i16> %src1024, <64 x i16> %src1024_1, <64 x i32> <i32 63, i32 62, i32 71, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 66, i32 2, i32 1, i32 0>
				ret void
				}

				; SKX-LABEL: 'test_vXi8'
				define void @test_vXi8(<16 x i8> %src128, <32 x i8> %src256, <64 x i8> %src512, <16 x i8> %src128_1, <32 x i8> %src256_1, <64 x i8> %src512_1) {
				; SKX: cost of 3 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <16 x i8> %src128, <16 x i8> %src128_1, <16 x i32> <i32 29, i32 14, i32 28, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 3 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <32 x i8> %src256, <32 x i8> %src256_1, <32 x i32> <i32 31, i32 30, i32 45, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>

				; SKX: cost of 19 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <64 x i8> %src512, <64 x i8> %src512_1, <64 x i32> <i32 63, i32 100, i32 61, i32 96, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

				ret void
				}

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i16.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i16] zeroinitializer, align 16
				@B = global [10240 x i16] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @load_i16_stride2() {
				;CHECK-LABEL: load_i16_stride2
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 32 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 1
				%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
				%1 = load i16, i16* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
				store i16 %1, i16* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i16_stride3() {
				;CHECK-LABEL: load_i16_stride3
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 32 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 3
				%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
				%1 = load i16, i16* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
				store i16 %1, i16* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i16_stride4() {
				;CHECK-LABEL: load_i16_stride4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 5 for VF 32 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 2
				%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
				%1 = load i16, i16* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
				store i16 %1, i16* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i16_stride5() {
				;CHECK-LABEL: load_i16_stride5
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 6 for VF 32 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 5
				%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
				%1 = load i16, i16* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
				store i16 %1, i16* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i32.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i32] zeroinitializer, align 16
				@B = global [10240 x i32] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @load_int_stride2() {
				;CHECK-LABEL: load_int_stride2
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 16 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 1
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
				%1 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %1, i32* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_int_stride3() {
				;CHECK-LABEL: load_int_stride3
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 3
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
				%1 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %1, i32* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_int_stride4() {
				;CHECK-LABEL: load_int_stride4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 5 for VF 16 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 2
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
				%1 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %1, i32* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_int_stride5() {
				;CHECK-LABEL: load_int_stride5
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 6 for VF 16 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 5
				%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
				%1 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
				store i32 %1, i32* %arrayidx2, align 2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i64.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i64] zeroinitializer, align 16
				@B = global [10240 x i64] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @load_i64_stride2() {
				;CHECK-LABEL: load_i64_stride2
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 1
				%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
				%1 = load i64, i64* %arrayidx, align 16
				%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv
				store i64 %1, i64* %arrayidx2, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i64_stride3() {
				;CHECK-LABEL: load_i64_stride3
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 3
				%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
				%1 = load i64, i64* %arrayidx, align 16
				%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv
				store i64 %1, i64* %arrayidx2, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i64_stride4() {
				;CHECK-LABEL: load_i64_stride4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 5 for VF 8 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 4
				%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
				%1 = load i64, i64* %arrayidx, align 16
				%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv
				store i64 %1, i64* %arrayidx2, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i8.ll

Property	Old Value	New Value
svn:executable	null	*

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -S -mcpu=skx --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = global [10240 x i8] zeroinitializer, align 16
				@B = global [10240 x i8] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @load_i8_stride2() {
				;CHECK-LABEL: load_i8_stride2
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 8 for VF 32 For instruction: %1 = load
				;CHECK: Found an estimated cost of 20 for VF 64 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 1
				%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
				%1 = load i8, i8* %arrayidx, align 2
				%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
				store i8 %1, i8* %arrayidx2, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i8_stride3() {
				;CHECK-LABEL: load_i8_stride3
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 8 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 20 for VF 32 For instruction: %1 = load
				;CHECK: Found an estimated cost of 39 for VF 64 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 3
				%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
				%1 = load i8, i8* %arrayidx, align 2
				%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
				store i8 %1, i8* %arrayidx2, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i8_stride4() {
				;CHECK-LABEL: load_i8_stride4
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 8 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 20 for VF 32 For instruction: %1 = load
				;CHECK: Found an estimated cost of 59 for VF 64 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = shl nsw i64 %indvars.iv, 2
				%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
				%1 = load i8, i8* %arrayidx, align 2
				%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
				store i8 %1, i8* %arrayidx2, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				define void @load_i8_stride5() {
				;CHECK-LABEL: load_i8_stride5
				;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
				;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
				;CHECK: Found an estimated cost of 3 for VF 4 For instruction: %1 = load
				;CHECK: Found an estimated cost of 8 for VF 8 For instruction: %1 = load
				;CHECK: Found an estimated cost of 20 for VF 16 For instruction: %1 = load
				;CHECK: Found an estimated cost of 39 for VF 32 For instruction: %1 = load
				;CHECK: Found an estimated cost of 78 for VF 64 For instruction: %1 = load
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = mul nsw i64 %indvars.iv, 5
				%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
				%1 = load i8, i8* %arrayidx, align 2
				%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
				store i8 %1, i8* %arrayidx2, align 1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

Revision Contents

Diff 82798

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

llvm/trunk/lib/Analysis/CostModel.cpp

llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h

llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/trunk/test/Analysis/CostModel/X86/interleave-load-i32.ll

llvm/trunk/test/Analysis/CostModel/X86/interleave-store-i32.ll

llvm/trunk/test/Analysis/CostModel/X86/shuffle-broadcast.ll

llvm/trunk/test/Analysis/CostModel/X86/shuffle-single-src.ll

llvm/trunk/test/Analysis/CostModel/X86/shuffle-two-src.ll

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i16.ll

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i32.ll

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i64.ll

llvm/trunk/test/Analysis/CostModel/X86/strided-load-i8.ll
