On X86, the gather/scatter story is unfortunate. Native support appeared only in AVX2,
and even then the performance is only reasonable on Skylake and newer.
Even on Zen 3 it is rather bad. So X86 reports masked gather/scatter
as not legal (except with +avx512 or +fast-gather),
and the ScalarizeMaskedMemIntrin pass expands them.
At the same time, we can model the cost of the expanded form
of gather/scatter via X86TTIImpl::getGatherScatterOpCost(),
and most often it is lower than LV's "scalarization" cost.
But since we report the gather as illegal, LV does not even query its cost.
I think this is suboptimal. I propose adding a new TTI hook,
shouldUseMaskedGatherForVectorization(), which defaults to isLegalMaskedGather(),
but is overridden on X86 to return true iff no variable mask is needed
(i.e. the expanded gather/scatter sequence will not require branching).
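To make the proposed semantics concrete, here is a minimal self-contained sketch of the hook's decision logic. The class and member names below are illustrative only, not the actual LLVM TTI API; the point is just that the base implementation falls back to legality, while the X86 override allows the (to-be-expanded) gather whenever the mask is not variable, so the expansion stays branch-free.

```cpp
#include <cassert>

// Hypothetical sketch of the proposed TTI hook (names are illustrative).
struct TTIBase {
  virtual ~TTIBase() = default;
  virtual bool isLegalMaskedGather() const { return false; }
  // Default behavior: same answer as legality.
  virtual bool shouldUseMaskedGatherForVectorization(bool NeedsVariableMask) const {
    (void)NeedsVariableMask;
    return isLegalMaskedGather();
  }
};

struct X86TTISketch : TTIBase {
  bool HasAVX512 = false;
  bool HasFastGather = false;
  bool isLegalMaskedGather() const override {
    // Mirrors the "+avx512 || +fast-gather" exception above.
    return HasAVX512 || HasFastGather;
  }
  // X86 override: even when the gather is "illegal" (and will be expanded
  // by ScalarizeMaskedMemIntrin), still let LV cost it, as long as the
  // mask is not variable, i.e. the expansion needs no branching.
  bool shouldUseMaskedGatherForVectorization(bool NeedsVariableMask) const override {
    return !NeedsVariableMask;
  }
};
```

With this, the vectorizer would query getGatherScatterOpCost() for constant-mask gathers on all X86 targets, instead of falling back to its own scalarization cost only because legality returned false.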
If this makes sense, I can follow up with an SLP patch.
Should I pre-commit the test regeneration?