This is an archive of the discontinued LLVM Phabricator instance.

Differential D89322

[LV] Initial VPlan cost modelling
Abandoned, Public

Authored by dmgreen on Oct 13 2020, 8:44 AM.

Details

Reviewers

fhahn
Ayal
gilr
hsaito
SjoerdMeijer
rengolin
nadav

Summary

This adds the initial skeleton and cost modelling needed to cost VPlans. It replaces the current method of summing the cost of each instruction in the loop body.

It currently attempts to mimic the existing cost model fairly precisely, so as not to introduce too many regressions at once. As a result, some of the decisions it makes are not optimal, notably in how predication is handled.

The basic scheme is to call cost() on VPlans, which recurses into VPBasicBlocks and then into VPRecipes. Most cost() methods for individual recipes currently call CostModel->getInstructionCost, which will be refactored to call TTI hooks directly in future patches. To mimic the existing model, a ReciprocalPredBlockProb is added to VPBasicBlock to model the old method of reducing the scalar cost for predicated blocks. This is known to be rather inaccurate, but removing it can lead to regressions. I will hopefully improve this bit somehow.
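The recursion described in this paragraph can be pictured with a small standalone sketch. None of it is the patch's code: `ToyRecipe`, `ToyBlock`, `ToyPlan` and the cost numbers are hypothetical stand-ins. The only detail taken from the summary is that the scalar cost of a predicated block is divided by a reciprocal block probability.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VPRecipe: one fixed cost for the scalar case
// and one for any vector width.
struct ToyRecipe {
  unsigned ScalarCost; // cost when VF == 1
  unsigned VectorCost; // cost when VF > 1
  unsigned cost(unsigned VF) const { return VF == 1 ? ScalarCost : VectorCost; }
};

// Hypothetical stand-in for a VPBasicBlock.
struct ToyBlock {
  std::vector<ToyRecipe> Recipes;
  // 1 for an unconditional block; N means the block is assumed to execute
  // on one in N iterations.
  unsigned ReciprocalPredBlockProb = 1;

  unsigned cost(unsigned VF) const {
    assert(ReciprocalPredBlockProb > 0 && "probability must be non-zero");
    unsigned Sum = 0;
    for (const ToyRecipe &R : Recipes)
      Sum += R.cost(VF);
    // Mimic the legacy behaviour: only the scalar cost is scaled down.
    if (VF == 1)
      Sum /= ReciprocalPredBlockProb;
    return Sum;
  }
};

// Hypothetical stand-in for a VPlan: the plan cost is the sum of block costs.
struct ToyPlan {
  std::vector<ToyBlock> Blocks;
  unsigned cost(unsigned VF) const {
    unsigned Sum = 0;
    for (const ToyBlock &B : Blocks)
      Sum += B.cost(VF);
    return Sum;
  }
};
```

In this toy version the predicated-block discount applies only when `VF == 1`, so a vector plan pays the full cost of the block while the scalar plan does not, which is one way to see why the summary calls the scheme inaccurate.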

It passes all the LLVM tests but can still cause differences for some code, especially around loops whose scores were already close between vector factors. One common case I've seen is that the backedge cost was often over-estimated in the past. It will now correctly cost VPReduction recipes, which is nice but should only affect MVE. VPInstructions will follow in a subsequent patch, but may need to start including type information.

The patch adds an option, "cost-using-vplan", that can be used to pick between the old method and the new. The idea is to switch to the new method and remove the old code path once any regressions are addressed.

Event Timeline

dmgreen created this revision. (Oct 13 2020, 8:44 AM)

Herald added a reviewer: rengolin. (Oct 13 2020, 8:44 AM)

Herald added a project: Restricted Project.

Herald added subscribers: bmahjour, psnobl, rogfer01 and 2 others.

dmgreen requested review of this revision. (Oct 13 2020, 8:44 AM)

Herald added a subscriber: vkmr. (Oct 13 2020, 8:44 AM)

bmahjour added inline comments. (Oct 13 2020, 10:32 AM)

llvm/lib/Transforms/Vectorize/VPlan.h
372	The cost-model is conceptually the structure that knows all about calculating costs. Instead of packaging the cost-model and legality and sending them to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model, as it already has access to legality?

dmgreen added inline comments. (Oct 13 2020, 1:56 PM)

llvm/lib/Transforms/Vectorize/VPlan.h
372	The idea, as far as I understand, is that just as the execution of a vplan is separated into the recipes that make it up, so should the cost model be. All the CM.getInstructionCost(..) methods will be replaced by the code that makes them up: VPInterleaveRecipe will know how to cost interleaving groups, VPReductions will know how to cost reductions, VPWidenRecipes will be able to handle widened instructions, etc. The LoopVectorizationCostModel is the old monolithic way of doing things, which vplan is trying to move away from. This is only the first step of trying to clean that up. As for this struct specifically, yeah, it doesn't feel like the best. I was trying to follow VPTransformState, but it doesn't contain much at the moment. Plus I apparently misspelt analysis.

Update some spelling.

bmahjour added inline comments. (Oct 15 2020, 12:21 PM)

llvm/lib/Transforms/Vectorize/VPlan.h
372	Ok, if the idea is to eventually remove the `LoopVectorizationCostModel` in the future, this change makes sense as a transitional step. By the way, I find it strange that we rely on `LoopVectorizationLegality` to get information about the cost! Looks like we only need `getWidestInductionType` from it, so can we put that in the context instead of the whole class?

dmgreen added a child revision: D89323: [LV] Costing for VPInstructions. (Oct 17 2020, 1:31 AM)

dmgreen added inline comments. (Oct 17 2020, 9:18 AM)

llvm/lib/Transforms/Vectorize/VPlan.h
372	Ah, good point. Some of this code was added and removed along the way. You are right that the Legality isn't really needed much at the moment. We may need it for some things in the future, but if we do, we can address those as they come up.

Adjust VPCostContext.

dmgreen mentioned this in D88152: [VPlan][WIP] VMULH via VPRecipeBase. (Nov 1 2020, 11:08 PM)

Thank you very much for putting up the patch to get things started!

I think it would be good to make sure the first steps with the cost model won't make it harder to separate the 'decision' and the 'assign cost' steps in the current cost model, in particular moving decisions out of the legacy cost model.

I tried to get started with this by moving the initial VPlan generation earlier (D90711) and then applying/updating this patch on top of that. I'd still like to experiment a bit, but I think this patch is a good first step that would allow us to make progress in 2 directions:

- move decisions out of the cost model
- add VPlan transformations that require some costing.

hiraditya added a reviewer: nadav. (Nov 3 2020, 2:04 PM)

Hi Dave. I have a quick question. Have you considered tuning the current cost model? What are the compile time implications of using VPlans?

simoll added a subscriber: simoll. (Nov 4 2020, 2:46 AM)

steleman added a subscriber: steleman. (Nov 4 2020, 8:13 AM)

Kazhuu added a subscriber: Kazhuu. (Mar 3 2021, 8:52 PM)

Herald added a subscriber: tschuett. (Mar 3 2021, 8:52 PM)

a.elovikov added a subscriber: a.elovikov. (Mar 4 2021, 9:24 AM)

a.elovikov added inline comments.

llvm/lib/Transforms/Vectorize/VPlan.h
372	"Instead of packaging the cost-model and legality and send it to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model..." My +1 for putting CM code outside the VPlan itself. I prefer to see VPlan as a kind of IR in itself, and hence it would make sense to keep it similar to LLVM IR cost modeling via a separate class. In fact, I'd rather move the `::execute` methods out of VPlan as well. Moving the cost modeling code out of the VPlan classes might also make it easier to prototype/implement some advanced heuristics that would work at the basic-block/whole-VPlan level (i.e. some kinds of register pressure/spill/fill estimates) by just putting all the code locally, grouped by the different levels of detail we want to account for. We might also consider cost modeling for either throughput or latency depending on the trip counts. IMO, that would also be easier to code if we had a separate class.
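As a rough illustration of the alternative argued for in this comment, the sketch below keeps the plan as plain data and puts all costing in one separate class. Everything in it is hypothetical: the `Toy*` types, the per-kind costs and the `MaxLiveValues` spill penalty are invented for the example and are not taken from the patch or from LLVM.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recipe kinds; the plan itself carries no costing logic.
enum class ToyKind { Widen, Interleave, Reduction };

struct ToyRecipe { ToyKind Kind; };
struct ToyBlock { std::vector<ToyRecipe> Recipes; };
struct ToyPlan { std::vector<ToyBlock> Blocks; };

// All cost knowledge lives in one class, so block-level heuristics can sit
// next to the per-recipe costs.
class ToyPlanCostModel {
public:
  explicit ToyPlanCostModel(unsigned MaxLiveValues)
      : MaxLiveValues(MaxLiveValues) {}

  unsigned cost(const ToyPlan &Plan, unsigned VF) const {
    unsigned Sum = 0;
    for (const ToyBlock &B : Plan.Blocks)
      Sum += blockCost(B, VF);
    return Sum;
  }

private:
  // Made-up per-kind costs, for illustration only.
  unsigned recipeCost(const ToyRecipe &R, unsigned VF) const {
    switch (R.Kind) {
    case ToyKind::Widen:
      return 1;
    case ToyKind::Interleave:
      return VF;
    case ToyKind::Reduction:
      return 2;
    }
    return 0;
  }

  unsigned blockCost(const ToyBlock &B, unsigned VF) const {
    unsigned Sum = 0;
    for (const ToyRecipe &R : B.Recipes)
      Sum += recipeCost(R, VF);
    // Crude block-level penalty: one extra unit per recipe beyond the
    // assumed number of values that fit in registers.
    unsigned NumRecipes = static_cast<unsigned>(B.Recipes.size());
    if (NumRecipes > MaxLiveValues)
      Sum += NumRecipes - MaxLiveValues;
    return Sum;
  }

  unsigned MaxLiveValues;
};
```

The trade-off against per-recipe `cost()` methods is the usual one: a separate class keeps whole-block reasoning in one place, at the price of a switch that must be extended whenever a new recipe kind is added.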

Matt added a subscriber: Matt. (May 20 2021, 8:11 AM)

rui.zhang added a subscriber: rui.zhang. (Mar 8 2023, 10:02 AM)

Herald added a project: Restricted Project. (Mar 8 2023, 10:02 AM)

Herald added subscribers: pcwang-thead, StephenFan.

rengolin added inline comments. (Mar 8 2023, 10:46 AM)

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
195	This API seems a bit weird to me. I'd expect code generation decisions to be an entity of their own, not some pair (which could very well use a `std::pair` typedef). Today it is the VF we're looking at; perhaps one day we'll want to look at UF costs (fewer branches), particular options of the plans themselves (splitting/reordering outer loops in different ways), etc. So, for now, if it's just a return value wrapper, we can do with `std::pair` or `std::tuple` and use `auto [vplan, VF] = ...` to extract on call. For later, if we want to carry more info without passing it all as arguments, we should have an actual `VPlanResult` struct or something, with a back pointer to the VPlan and the parameter selection.
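A minimal sketch of the `VPlanResult` idea from this comment might look as follows. Only the struct name comes from the comment; `ToyPlan`, `ToyCandidate`, `pickBest` and the `UF` and `Cost` fields are hypothetical, added to show how a result type could grow beyond a (plan, VF) pair.

```cpp
#include <cassert>
#include <vector>

struct ToyPlan { int Id; }; // hypothetical stand-in for VPlan

// Hypothetical result type: a back pointer to the chosen plan plus the
// parameters selected for it. UF is present only to show room for growth.
struct VPlanResult {
  const ToyPlan *Plan = nullptr;
  unsigned VF = 1;
  unsigned UF = 1;
  unsigned Cost = 0;
};

// Hypothetical candidate: a plan, a non-zero VF, and the total cost at that VF.
struct ToyCandidate {
  const ToyPlan *Plan;
  unsigned VF;
  unsigned Cost;
};

// Pick the candidate with the lowest cost per lane (Cost / VF). The ratios
// are compared by cross-multiplication to stay in integer arithmetic.
inline VPlanResult pickBest(const std::vector<ToyCandidate> &Candidates) {
  VPlanResult Best;
  for (const ToyCandidate &C : Candidates) {
    // C.Cost / C.VF < Best.Cost / Best.VF
    //   <=> C.Cost * Best.VF < Best.Cost * C.VF
    if (!Best.Plan || C.Cost * Best.VF < Best.Cost * C.VF) {
      Best.Plan = C.Plan;
      Best.VF = C.VF;
      Best.Cost = C.Cost;
    }
  }
  return Best;
}
```

Callers read the fields by name (`R.Plan`, `R.VF`). With C++17, a struct like this can also be unpacked with a structured binding such as `auto [Plan, VF, UF, Cost] = pickBest(Cs);`, so the call-site convenience of `std::pair` is not lost.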
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7127	So, whatever is the first VPlan with a VF we return? Is it guaranteed to be 1?
llvm/lib/Transforms/Vectorize/VPlan.h
97	Funnily enough, here the use of `std::pair` is worse. The code that uses it has a lot of `Cost.first += C.first; Cost.second |= C.second;`, and the typedef offers no help. Here an actual struct would be more suitable: `struct VectorizationCostTy { unsigned cost; bool active; }`
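The struct proposed in this comment could be sketched as below. The field names follow the comment; the default initializers and `operator+=` are additions made for the example, not something the patch or the reviewer specified.

```cpp
#include <cassert>

// Sketch of the suggested replacement for std::pair<unsigned, bool>.
struct VectorizationCostTy {
  unsigned cost = 0;   // accumulated cost
  bool active = false; // true if any contributing operation stays a vector
                       // operation after type legalization

  // Replaces the repeated "Cost.first += C.first; Cost.second |= C.second;".
  VectorizationCostTy &operator+=(const VectorizationCostTy &C) {
    cost += C.cost;
    active |= C.active;
    return *this;
  }
};
```

With this, accumulating a block cost becomes `BlockCost += C;`, and the meaning of the boolean is visible at each use instead of hiding behind `.second`.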

This is too old to be useful now, and I don't have any plans to work on it in the near term. (It would be good to see improvements though, where the vplan is costed more directly as opposed to continuing to go through the IR instructions.)

In D89322#4188849, @dmgreen wrote:

This is too old to be useful now, and I don't have any plans to work on it in the near term. (It would be good to see improvements though, where the vplan is costed more directly as opposed to continuing to go through the IR instructions.)

I was worried the necro-bump would lead to this, but at least we have some valid reasoning here and a good seed for a future effort to resurrect this.

arcbbb mentioned this in D158716: [RFC][LV] VPlan-based cost model. (Aug 24 2023, 1:12 AM)

Revision Contents

Path (Size)

llvm/
  lib/Transforms/Vectorize/
    LoopVectorizationPlanner.h (15 lines)
    LoopVectorize.cpp (380 lines)
    VPlan.h (73 lines)
  test/
    Analysis/CostModel/X86/
      interleave-load-i32.ll (33 lines)
      interleave-store-i32.ll (33 lines)
      interleaved-load-float.ll (9 lines)
      interleaved-load-i8.ll (39 lines)
      interleaved-load-store-double.ll (9 lines)
      interleaved-load-store-i64.ll (9 lines)
      interleaved-store-i8.ll (39 lines)
      strided-load-i16.ll (75 lines)
      strided-load-i32.ll (63 lines)
      strided-load-i64.ll (39 lines)
      strided-load-i8.ll (87 lines)
    Transforms/LoopVectorize/
      AArch64/
        aarch64-predication.ll (5 lines)
        costmodel.ll (217 lines)
        extractvalue-no-scalarization-required.ll (21 lines)
        interleaved-vs-scalar.ll (9 lines)
        interleaved_cost.ll (55 lines)
        no_vector_instructions.ll (9 lines)
        predication_costs.ll (40 lines)
      ARM/
        interleaved_cost.ll (45 lines)
        mve-interleaved-cost.ll (572 lines)
        mve-shiftcost.ll (7 lines)
      SystemZ/
        branch-for-predicated-block.ll (16 lines)
        load-scalarization-cost-0.ll (13 lines)
        load-scalarization-cost-1.ll (11 lines)
        load-store-scalarization-cost.ll (14 lines)
        mem-interleaving-costs-02.ll (37 lines)
        mem-interleaving-costs.ll (20 lines)
      X86/
        fneg-cost.ll (6 lines)
        fp_to_sint8-cost-model.ll (2 lines)
        mul_slm_16bit.ll (28 lines)
        reduction-small-size.ll (87 lines)
        redundant-vf2-cost.ll (2 lines)
        uint64_to_fp64-cost-model.ll (12 lines)
        uniformshift.ll (6 lines)
        vector-scalar-select-cost.ll (4 lines)
      consecutive-ptr-uniforms.ll (13 lines)
      loop-scalars.ll (4 lines)
      phi-cost.ll (17 lines)

Diff 298832

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

@@ 180 lines hidden @@ static VectorizationFactor Disabled() {
      return {ElementCount::getFixed(1), 0};
    }

    bool operator==(const VectorizationFactor &rhs) const {
      return Width == rhs.Width && Cost == rhs.Cost;
    }
  };

+ /// A pair of VPlan and VectorizationFactor, used as the best result of costing
+ /// different VPlans.
+ struct VPlanVFPair {
+   /// The Plan
+   VPlan *Plan;
+   /// The VF/Cost from costing
+   VectorizationFactor VF;
+ };

  /// Planner drives the vectorization process after having passed
  /// Legality checks.
  class LoopVectorizationPlanner {
    /// The loop that we evaluate.
    Loop *OrigLoop;

    /// Loop Info analysis.
    LoopInfo *LI;
@@ 45 lines hidden @@ LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
                             LoopVectorizationCostModel &CM,
                             InterleavedAccessInfo &IAI,
                             PredicatedScalarEvolution &PSE)
        : OrigLoop(L), LI(LI), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM), IAI(IAI),
          PSE(PSE) {}

    /// Plan how to best vectorize, return the best VF and its cost, or None if
    /// vectorization and interleaving should be avoided up front.
-   Optional<VectorizationFactor> plan(ElementCount UserVF, unsigned UserIC);
+   Optional<VPlanVFPair> plan(ElementCount UserVF, unsigned UserIC);

    /// Use the VPlan-native path to plan how to best vectorize, return the best
    /// VF and its cost.
-   VectorizationFactor planInVPlanNativePath(ElementCount UserVF);
+   VPlanVFPair planInVPlanNativePath(ElementCount UserVF);

    /// Finalize the best decision and dispose of all other VPlans.
-   void setBestPlan(ElementCount VF, unsigned UF);
+   void setBestPlan(VPlan *Plan, ElementCount VF, unsigned UF);

    /// Generate the IR code for the body of the vectorized loop according to the
    /// best selected VPlan.
    void executePlan(InnerLoopVectorizer &LB, DominatorTree *DT);

    void printPlans(raw_ostream &O) {
      for (const auto &Plan : VPlans)
        O << *Plan;
@@ 48 lines hidden @@

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 311 lines hidden @@

  // FIXME: Remove this switch once we have divergence analysis. Currently we
  // assume divergent non-backedge branches when this switch is true.
  cl::opt<bool> EnableVPlanPredication(
      "enable-vplan-predication", cl::init(false), cl::Hidden,
      cl::desc("Enable VPlan-native vectorization path predicator with "
               "support for outer loop vectorization."));

+ cl::opt<bool> CostUsingVPlan("cost-using-vplan", cl::init(false), cl::Hidden,
+                              cl::desc("Enable VPlan based costing path. To "
+                                       "become the default in the future."));

  // This flag enables the stress testing of the VPlan H-CFG construction in the
  // VPlan-native vectorization path. It must be used in conjuction with
  // -enable-vplan-native-path. -vplan-verify-hcfg can also be used to enable the
  // verification of the H-CFGs built.
  static cl::opt<bool> VPlanBuildStressTest(
      "vplan-build-stress-test", cl::init(false), cl::Hidden,
      cl::desc(
          "Build VPlan for every supported loop nest in the function and bail "
@@ 744 lines hidden @@ public:
    bool runtimeChecksRequired();

    /// \return The most profitable vectorization factor and the cost of that VF.
    /// This method checks every power of two up to MaxVF. If UserVF is not ZERO
    /// then this vectorization factor will be selected if vectorization is
    /// possible.
    VectorizationFactor selectVectorizationFactor(unsigned MaxVF);

+   /// \return The most profitable vplan and VF from a list of VPlans.
+   VPlanVFPair
+   selectVectorizationFactorFromVPlans(SmallVectorImpl<VPlanPtr> &VPlans,
+                                       unsigned MaxVF);

    /// Setup cost-based decisions for user vectorization factor.
    void selectUserVectorizationFactor(ElementCount UserVF) {
      collectUniformsAndScalars(UserVF);
      collectInstsToScalarize(UserVF);
    }

    /// \return The size (in bits) of the smallest and widest types in the code
    /// that needs to be vectorized. We ignore values that remain scalar such as
    /// 64 bit loop indices.
    std::pair<unsigned, unsigned> getSmallestAndWidestTypes();

    /// \return The desired interleave count.
    /// If interleave count has been specified by metadata it will be returned.
    /// Otherwise, the interleave count is computed and returned. VF and LoopCost
    /// are the selected vectorization factor and the cost of the selected VF.
-   unsigned selectInterleaveCount(ElementCount VF, unsigned LoopCost);
+   unsigned selectInterleaveCount(VPlan *Plan, ElementCount VF,
+                                  unsigned LoopCost);

    /// Memory access instruction may be vectorized in more than one way.
    /// Form of instruction after vectorization depends on cost.
    /// This function takes cost-based decisions for Load/Store instructions
    /// and collects them in a map. This decisions map is used for building
    /// the lists of loop-uniform and loop-scalar instructions.
    /// The calculated cost is saved with widening decision in order to
    /// avoid redundant calculations.
@@ 327 lines hidden @@ public:

    /// Invalidates decisions already taken by the cost model.
    void invalidateCostModelingDecisions() {
      WideningDecisions.clear();
      Uniforms.clear();
      Scalars.clear();
    }

+   /// Returns the execution time cost of an instruction for a given vector
+   /// width. Vector width of one means scalar.
+   VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);

  private:
    unsigned NumPredStores = 0;

    /// \return An upper bound for the vectorization factor, a power-of-2 larger
    /// than zero. One is returned if vectorization should best be avoided due
    /// to cost.
    unsigned computeFeasibleMaxVF(unsigned ConstTripCount);

-   /// The vectorization cost is a combination of the cost itself and a boolean
-   /// indicating whether any of the contributing operations will actually
-   /// operate on
-   /// vector values after type legalization in the backend. If this latter value
-   /// is
-   /// false, then all operations will be scalarized (i.e. no vectorization has
-   /// actually taken place).
-   using VectorizationCostTy = std::pair<unsigned, bool>;

    /// Returns the expected execution cost. The unit of the cost does
    /// not matter because we use the 'cost' units to compare different
    /// vector widths. The cost that is returned is not normalized by
    /// the factor width.
    VectorizationCostTy expectedCost(ElementCount VF);

-   /// Returns the execution time cost of an instruction for a given vector
-   /// width. Vector width of one means scalar.
-   VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);

    /// The cost-computation logic from getInstructionCost which provides
    /// the vector type as an output parameter.
    unsigned getInstructionCost(Instruction *I, ElementCount VF, Type *&VectorTy);

    /// Calculate vectorization cost of memory instruction \p I.
    unsigned getMemoryInstructionCost(Instruction *I, ElementCount VF);

    /// The cost computation for scalarized memory instruction.
@@ 3,974 lines hidden @@ LLVM_DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs()
               << "LV: Vectorization seems to be not beneficial, "
               << "but was forced by a user.\n");
    LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");
    VectorizationFactor Factor = {ElementCount::getFixed(Width),
                                  (unsigned)(Width * Cost)};
    return Factor;
  }

+ VPlanVFPair LoopVectorizationCostModel::selectVectorizationFactorFromVPlans(
+     SmallVectorImpl<VPlanPtr> &VPlans, unsigned MaxVF) {
+   VPCostContext Ctx{*this, &TTI, Legal->getWidestInductionType()};
+   bool ForceVectorization =
+       Hints->getForce() == LoopVectorizeHints::FK_Enabled && MaxVF > 1;
+
+   VPlan *BestPlan = nullptr, *ScalarPlan = nullptr;
+   ElementCount BestVF = ElementCount::getNull();
+   float BestCost, ScalarCost;
+   for (const auto &Plan : VPlans) {
+     for (ElementCount VF : Plan->getVFs()) {
+
+       if (ForceVectorization && VF.isScalar()) {
+         LLVM_DEBUG(dbgs() << " Skipping due to force vectorization\n");
+         continue;
+       }
+       if (VF.getKnownMinValue() > MaxVF) {
+         LLVM_DEBUG(dbgs() << " Skipping due to MaxVF\n");
+         continue;
+       }
+
+       VectorizationCostTy Cost = Plan->cost(VF, Ctx);
+       float VectorCost = Cost.first / (float)VF.getKnownMinValue();
+       LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << VF.getKnownMinValue()
+                         << " costs: " << (int)VectorCost << ".\n");
+       if (!VF.isScalar() && !Cost.second && !ForceVectorization) {
+         LLVM_DEBUG(
+             dbgs()
+             << "LV: Not considering vector loop of width "
+             << VF.getKnownMinValue()
+             << " because it will not generate any vector instructions.\n");
+         continue;
+       }
+
+       if (!BestPlan || VectorCost < BestCost) {
+         BestPlan = &*Plan;
+         BestVF = VF;
+         BestCost = VectorCost;
+       }
+       if (!ScalarPlan && VF.isScalar()) {
+         ScalarPlan = &*Plan;
+         ScalarCost = VectorCost;
+       }
+     }
+   }
+
+   if (!EnableCondStoresVectorization && NumPredStores) {
+     reportVectorizationFailure("There are conditional stores.",
+         "store that is conditionally executed prevents vectorization",
+         "ConditionalStore", ORE, TheLoop);
+     BestPlan = ScalarPlan;
+     BestVF = ElementCount::getFixed(1);
+     BestCost = ScalarCost;
+   }
+
+   if (!BestPlan) {
+     assert(ScalarPlan);
+     BestPlan = ScalarPlan;
+     BestVF = ElementCount::getFixed(1);
+     BestCost = ScalarCost;
+   }
+
+   LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << BestVF << ".\n");
+   VectorizationFactor Factor = {
+       BestVF, (unsigned)(BestCost * BestVF.getKnownMinValue())};
+   return {BestPlan, Factor};
+ }

  std::pair<unsigned, unsigned>
  LoopVectorizationCostModel::getSmallestAndWidestTypes() {
    unsigned MinWidth = -1U;
    unsigned MaxWidth = 8;
    const DataLayout &DL = TheFunction->getParent()->getDataLayout();

    // For each block.
    for (BasicBlock *BB : TheLoop->blocks()) {
@@ 40 lines hidden @@ for (Instruction &I : BB->instructionsWithoutDebug()) {
        MaxWidth = std::max(MaxWidth,
                            (unsigned)DL.getTypeSizeInBits(T->getScalarType()));
      }
    }

    return {MinWidth, MaxWidth};
  }

- unsigned LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
+ unsigned LoopVectorizationCostModel::selectInterleaveCount(VPlan *Plan,
+                                                            ElementCount VF,
                                                             unsigned LoopCost) {
    // -- The interleave heuristics --
    // We interleave the loop in order to expose ILP and reduce the loop overhead.
    // There are many micro-architectural considerations that we can't predict
    // at this level. For example, frontend pressure (on decode or fetch) due to
    // code size, or the number and capabilities of the execution ports.
    //
    // We use the following heuristics to select the interleave count:
@@ 104 lines hidden @@ unsigned LoopVectorizationCostModel::selectInterleaveCount(VPlan *Plan,
    else
      // Make sure IC is greater than 0.
      IC = std::max(1u, IC);

    assert(IC > 0 && "Interleave count must be greater than 0.");

    // If we did not calculate the cost for VF (because the user selected the VF)
    // then we calculate the cost of VF here.
-   if (LoopCost == 0)
+   if (LoopCost == 0) {
+     if (CostUsingVPlan) {
+       VPCostContext Ctx{*this, &TTI, Legal->getWidestInductionType()};
+       LoopCost = Plan->cost(VF, Ctx).first;
+     } else
        LoopCost = expectedCost(VF).first;
+   }

    assert(LoopCost && "Non-zero loop cost expected");

    // Interleave if we vectorized this loop and there is a reduction that could
    // benefit from interleaving.
    if (VF.isVector() && HasReductions) {
      LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");
      return IC;
@@ 431 lines hidden @@ while (!Worklist.empty()) {
      // of the instruction costs more, and scalarizing would be beneficial.
      Discount += VectorCost - ScalarCost;
      ScalarCosts[I] = ScalarCost;
    }

    return Discount;
  }

- LoopVectorizationCostModel::VectorizationCostTy
- LoopVectorizationCostModel::expectedCost(ElementCount VF) {
+ VectorizationCostTy LoopVectorizationCostModel::expectedCost(ElementCount VF) {
    assert(!VF.isScalable() && "scalable vectors not yet supported.");
    VectorizationCostTy Cost;

    // For each block.
    for (BasicBlock *BB : TheLoop->blocks()) {
      VectorizationCostTy BlockCost;

      // For each instruction in the old loop.
@@ 228 lines hidden @@ if (VF.isScalar()) {

      return TTI.getAddressComputationCost(ValTy) +
             TTI.getMemoryOpCost(I->getOpcode(), ValTy, Alignment, AS,
                                 TTI::TCK_RecipThroughput, I);
    }
    return getWideningCost(I, VF);
  }

- LoopVectorizationCostModel::VectorizationCostTy
+ VectorizationCostTy
  LoopVectorizationCostModel::getInstructionCost(Instruction *I,
                                                 ElementCount VF) {
    assert(!VF.isScalable() &&
           "the cost model is not yet implemented for scalable vectorization");
    // If we know that this instruction will remain uniform, check the cost of
    // the scalar version.
    if (isUniformAfterVectorization(I, VF))
      VF = ElementCount::getFixed(1);
[...]
// more than one is generated.		// more than one is generated.
static unsigned determineVPlanVF(const unsigned WidestVectorRegBits,		static unsigned determineVPlanVF(const unsigned WidestVectorRegBits,
LoopVectorizationCostModel &CM) {		LoopVectorizationCostModel &CM) {
unsigned WidestType;		unsigned WidestType;
std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();		std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();
return WidestVectorRegBits / WidestType;		return WidestVectorRegBits / WidestType;
}		}
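
As a rough illustration of the arithmetic in determineVPlanVF, the sketch below divides the widest vector register width by the widest element width found in the loop. The function name `vplanVF` and the sample widths are assumptions made for illustration; they are not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for determineVPlanVF: both widths are given in bits.
unsigned vplanVF(unsigned WidestVectorRegBits, unsigned WidestTypeBits) {
  assert(WidestTypeBits != 0 && "element width must be non-zero");
  // Integer division: how many of the widest elements fit in one register.
  return WidestVectorRegBits / WidestTypeBits;
}
```

With a 256-bit register and 32-bit elements this yields a VF of 8; a 128-bit register with 64-bit elements yields 2.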

VectorizationFactor		VPlanVFPair
LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {		LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {
assert(!UserVF.isScalable() && "scalable vectors not yet supported");		assert(!UserVF.isScalable() && "scalable vectors not yet supported");
ElementCount VF = UserVF;		ElementCount VF = UserVF;
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
if (!OrigLoop->isInnermost()) {		if (!OrigLoop->isInnermost()) {
[...]	if (!OrigLoop->isInnermost()) {
assert(isPowerOf2_32(VF.getKnownMinValue()) &&		assert(isPowerOf2_32(VF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")		LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
<< "VF " << VF << " to build VPlans.\n");		<< "VF " << VF << " to build VPlans.\n");
buildVPlans(VF.getKnownMinValue(), VF.getKnownMinValue());		buildVPlans(VF.getKnownMinValue(), VF.getKnownMinValue());

// For VPlan build stress testing, we bail out after VPlan construction.		// For VPlan build stress testing, we bail out after VPlan construction.
if (VPlanBuildStressTest)		if (VPlanBuildStressTest)
return VectorizationFactor::Disabled();		return {nullptr, VectorizationFactor::Disabled()};

return {VF, 0 /*Cost*/};		assert(VPlans.size() == 1 && "Expected a single vplan!");
		return {&*VPlans.front(), {VF, 0 /*Cost*/}};
}		}

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "		dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");		"VPlan-native path.\n");
return VectorizationFactor::Disabled();		return {nullptr, VectorizationFactor::Disabled()};
}		}

Optional<VectorizationFactor>		Optional<VPlanVFPair> LoopVectorizationPlanner::plan(ElementCount UserVF,
LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {		unsigned UserIC) {
assert(!UserVF.isScalable() && "scalable vectorization not yet handled");		assert(!UserVF.isScalable() && "scalable vectorization not yet handled");
assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");
Optional<unsigned> MaybeMaxVF =		Optional<unsigned> MaybeMaxVF =
CM.computeMaxVF(UserVF.getKnownMinValue(), UserIC);		CM.computeMaxVF(UserVF.getKnownMinValue(), UserIC);
if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.		if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.
return None;		return None;

// Invalidate interleave groups if all blocks of loop will be predicated.		// Invalidate interleave groups if all blocks of loop will be predicated.
[...]	assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),		buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),
UserVF.getKnownMinValue());		UserVF.getKnownMinValue());
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {{UserVF, 0}};		assert(VPlans.size() == 1 && VPlans.front()->hasVF(UserVF) &&
		"Expected a correct width vplan!");
		return VPlanVFPair{&*VPlans.front(), {UserVF, 0}};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();		unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");		assert(MaxVF != 0 && "MaxVF is zero.");

for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
// Collect Uniform and Scalar instructions after vectorization with VF.		// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(ElementCount::getFixed(VF));		CM.collectUniformsAndScalars(ElementCount::getFixed(VF));

// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
if (VF > 1)		if (VF > 1)
CM.collectInstsToScalarize(ElementCount::getFixed(VF));		CM.collectInstsToScalarize(ElementCount::getFixed(VF));
}		}

CM.collectInLoopReductions();		CM.collectInLoopReductions();

buildVPlansWithVPRecipes(1, MaxVF);		buildVPlansWithVPRecipes(1, MaxVF);
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
if (MaxVF == 1)		if (MaxVF == 1) {
return VectorizationFactor::Disabled();		assert(VPlans.size() == 1 &&
		VPlans.front()->hasVF(ElementCount::getFixed(MaxVF)));
		return VPlanVFPair{&*VPlans.front(), VectorizationFactor::Disabled()};
		}

// Select the optimal vectorization factor.		// Select the optimal vectorization factor.
return CM.selectVectorizationFactor(MaxVF);		if (CostUsingVPlan)
		return CM.selectVectorizationFactorFromVPlans(VPlans, MaxVF);
		else {
		VectorizationFactor VF = CM.selectVectorizationFactor(MaxVF);
		for (VPlanPtr &Plan : VPlans)
		if (Plan->hasVF(VF.Width))
		rengolin (inline comment): So, whatever is the first VPlan with a VF we return? Is it guaranteed to be 1?
		return VPlanVFPair{&*Plan, VF};
		llvm_unreachable("Expected to find a vplan with width VF!");
		}
}		}

void LoopVectorizationPlanner::setBestPlan(ElementCount VF, unsigned UF) {		void LoopVectorizationPlanner::setBestPlan(VPlan *Plan, ElementCount VF,
		unsigned UF) {
LLVM_DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF		LLVM_DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF
<< '\n');		<< '\n');
BestVF = VF;		BestVF = VF;
BestUF = UF;		BestUF = UF;

erase_if(VPlans, [VF](const VPlanPtr &Plan) {		if (!Plan) {
return !Plan->hasVF(VF);		// No best.
});		VPlans.clear();
		return;
		}

		erase_if(VPlans, [Plan](const VPlanPtr &P) { return &*P != Plan; });
assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");		assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");
}		}
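
The pointer-based pruning in the new setBestPlan can be sketched in isolation as follows. This is a minimal sketch, assuming a hypothetical `Plan` struct and a `keepOnly` helper that are not part of the patch; it uses the standard erase/remove idiom in place of LLVM's `erase_if`.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for VPlan; only the VF is modelled here.
struct Plan {
  unsigned VF;
};
using PlanPtr = std::unique_ptr<Plan>;

// Keep only the plan identified by pointer; a null pointer means "no best
// plan" and drops every candidate.
void keepOnly(std::vector<PlanPtr> &Plans, const Plan *Best) {
  if (!Best) {
    Plans.clear();
    return;
  }
  Plans.erase(std::remove_if(Plans.begin(), Plans.end(),
                             [Best](const PlanPtr &P) {
                               return P.get() != Best;
                             }),
              Plans.end());
  assert(Plans.size() == 1 && "best plan must be owned by the list");
}
```

Selecting by pointer rather than by VF means exactly one plan survives even if several candidate plans cover the same VF.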

void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,		void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,
DominatorTree *DT) {		DominatorTree *DT) {
// Perform the actual loop transformation.		// Perform the actual loop transformation.

// 1. Create a new empty loop. Unlink the old loop and connect the new one.		// 1. Create a new empty loop. Unlink the old loop and connect the new one.
[...]	VPBasicBlock *VPRecipeBuilder::handleReplication(
LLVM_DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");		LLVM_DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");
assert(VPBB->getSuccessors().empty() &&		assert(VPBB->getSuccessors().empty() &&
"VPBB has successors when handling predicated replication.");		"VPBB has successors when handling predicated replication.");
// Record predicated instructions for above packing optimizations.		// Record predicated instructions for above packing optimizations.
PredInst2Recipe[I] = Recipe;		PredInst2Recipe[I] = Recipe;
VPBlockBase *Region = createReplicateRegion(I, Recipe, Plan);		VPBlockBase *Region = createReplicateRegion(I, Recipe, Plan);
VPBlockUtils::insertBlockAfter(Region, VPBB);		VPBlockUtils::insertBlockAfter(Region, VPBB);
auto *RegSucc = new VPBasicBlock();		auto *RegSucc = new VPBasicBlock();
		RegSucc->setReciprocalPredBlockProb(getReciprocalPredBlockProb());
VPBlockUtils::insertBlockAfter(RegSucc, Region);		VPBlockUtils::insertBlockAfter(RegSucc, Region);
return RegSucc;		return RegSucc;
}		}

VPRegionBlock *VPRecipeBuilder::createReplicateRegion(Instruction *Instr,		VPRegionBlock *VPRecipeBuilder::createReplicateRegion(Instruction *Instr,
VPRecipeBase *PredRecipe,		VPRecipeBase *PredRecipe,
VPlanPtr &Plan) {		VPlanPtr &Plan) {
// Instructions marked for predication are replicated and placed under an		// Instructions marked for predication are replicated and placed under an
// if-then construct to prevent side-effects.		// if-then construct to prevent side-effects.

// Generate recipes to compute the block mask for this region.		// Generate recipes to compute the block mask for this region.
VPValue *BlockInMask = createBlockInMask(Instr->getParent(), Plan);		VPValue *BlockInMask = createBlockInMask(Instr->getParent(), Plan);

// Build the triangular if-then region.		// Build the triangular if-then region.
std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();		std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();
assert(Instr->getParent() && "Predicated instruction not in any basic block");		assert(Instr->getParent() && "Predicated instruction not in any basic block");
auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask);		auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask);
auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);		auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);
		Entry->setReciprocalPredBlockProb(Builder.getInsertBlock()->getReciprocalPredBlockProb());
auto *PHIRecipe =		auto *PHIRecipe =
Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);		Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);
auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);		auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);
		Exit->setReciprocalPredBlockProb(getReciprocalPredBlockProb());
auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);		auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);
		Pred->setReciprocalPredBlockProb(getReciprocalPredBlockProb());
VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);		VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);

// Note: first set Entry as region entry and then connect successors starting		// Note: first set Entry as region entry and then connect successors starting
// from it in order, to propagate the "parent" of each VPBasicBlock.		// from it in order, to propagate the "parent" of each VPBasicBlock.
VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, BlockInMask, Entry);		VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, BlockInMask, Entry);
VPBlockUtils::connectBlocks(Pred, Exit);		VPBlockUtils::connectBlocks(Pred, Exit);

return Region;		return Region;
[...]	for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
// Relevant instructions from basic block BB will be grouped into VPRecipe		// Relevant instructions from basic block BB will be grouped into VPRecipe
// ingredients and fill a new VPBasicBlock.		// ingredients and fill a new VPBasicBlock.
unsigned VPBBsForBB = 0;		unsigned VPBBsForBB = 0;
auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());		auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);		VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);
VPBB = FirstVPBBForBB;		VPBB = FirstVPBBForBB;
Builder.setInsertPoint(VPBB);		Builder.setInsertPoint(VPBB);

		// Update the ReciprocalPredBlockProb of the block, used in costing.
		// FIXME: This is not very accurate, and could be improved / replaced.
		if (CM.blockNeedsPredication(BB))
		VPBB->setReciprocalPredBlockProb(getReciprocalPredBlockProb());

// Introduce each ingredient into VPlan.		// Introduce each ingredient into VPlan.
// TODO: Model and preserve debug instrinsics in VPlan.		// TODO: Model and preserve debug instrinsics in VPlan.
for (Instruction &I : BB->instructionsWithoutDebug()) {		for (Instruction &I : BB->instructionsWithoutDebug()) {
Instruction *Instr = &I;		Instruction *Instr = &I;

// First filter out irrelevant instructions, to ensure no recipes are		// First filter out irrelevant instructions, to ensure no recipes are
// built for them.		// built for them.
if (isa<BranchInst>(Instr) || DeadInstructions.count(Instr))		if (isa<BranchInst>(Instr) || DeadInstructions.count(Instr))
[...]	getOrCreateVectorValues(Value *V, unsigned Part) {
return ILV.getOrCreateVectorValue(V, Part);		return ILV.getOrCreateVectorValue(V, Part);
}		}

Value *LoopVectorizationPlanner::VPCallbackILV::getOrCreateScalarValue(		Value *LoopVectorizationPlanner::VPCallbackILV::getOrCreateScalarValue(
Value *V, const VPIteration &Instance) {		Value *V, const VPIteration &Instance) {
return ILV.getOrCreateScalarValue(V, Instance);		return ILV.getOrCreateScalarValue(V, Instance);
}		}

		VectorizationCostTy VPlan::cost(ElementCount VF, VPCostContext &Ctx) {
		VectorizationCostTy Cost;

		for (VPBlockBase *Block : depth_first(Entry)) {
		VectorizationCostTy C = Block->cost(VF, Ctx);

		Cost.first += C.first;
		Cost.second |= C.second;
		}

		// The vplan does not contain the add+icmp for the loop iteration check. Add
		// those costs here.
		unsigned ExtraCost =
		Ctx.TTI->getArithmeticInstrCost(Instruction::Add,
		Ctx.WidestInductionType) +
		Ctx.TTI->getCmpSelInstrCost(Instruction::ICmp,
		Ctx.WidestInductionType);
		Cost.first += ExtraCost;
		LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << ExtraCost
		<< " for VF " << VF
		<< " For loop induction check (add + icmp)\n");
		// And then add the cost of the backedge, which is often but not always 0.
		ExtraCost =
		Ctx.TTI->getCFInstrCost(Instruction::Br, TTI::TCK_RecipThroughput);
		Cost.first += ExtraCost;
		LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << ExtraCost
		<< " for VF " << VF
		<< " For loop backedge cost (br)\n");

		return Cost;
		}
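
The plan-level accumulation above can be sketched on its own: block costs are summed, the boolean flag is OR-ed, and the loop-control overhead (add, icmp, and the backedge branch) is added on top. This is a sketch under stated assumptions: `CostTy`, `LoopOverhead`, and `planCost` are hypothetical names, and the overhead values stand in for what the TTI hooks would report.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for VectorizationCostTy: (cost, flag).
using CostTy = std::pair<unsigned, bool>;

// Hypothetical per-target costs for the loop control instructions.
struct LoopOverhead {
  unsigned Add;
  unsigned ICmp;
  unsigned Br;
};

CostTy planCost(const std::vector<CostTy> &BlockCosts, const LoopOverhead &O) {
  assert(!BlockCosts.empty() && "a plan has at least an entry block");
  CostTy Total{0, false};
  for (const CostTy &C : BlockCosts) {
    Total.first += C.first;   // costs add up
    Total.second |= C.second; // the flag is set if any block sets it
  }
  // The induction increment and exit compare are not modelled in the plan.
  Total.first += O.Add + O.ICmp;
  // Backedge branch, often but not always free.
  Total.first += O.Br;
  return Total;
}
```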

		VectorizationCostTy VPRegionBlock::cost(ElementCount VF, VPCostContext &Ctx) {
		ReversePostOrderTraversal<VPBlockBase *> RPOT(Entry);
		VectorizationCostTy Cost;

		for (VPBlockBase *Block : RPOT) {
		VectorizationCostTy C = Block->cost(VF, Ctx);

		Cost.first += C.first;
		Cost.second |= C.second;
		}

		return Cost;
		}

		VectorizationCostTy VPBasicBlock::cost(ElementCount VF, VPCostContext &Ctx) {
		VectorizationCostTy BlockCost;
		VPSlotTracker Tracker(getPlan());

		for (VPRecipeBase &Recipe : Recipes) {
		// Skip ignored values.
		// FIXME: This should go via VPValues getUnderlyingValue.
		VPValue *Val = Recipe.toVPValue();
		if (Val && (Ctx.CM.ValuesToIgnore.count(Val->getUnderlyingValue()) ||
		(VF.isVector() &&
		Ctx.CM.VecValuesToIgnore.count(Val->getUnderlyingValue()))))
		continue;

		VectorizationCostTy C = Recipe.cost(VF, Ctx);

		// Check if we should override the cost.
		if (ForceTargetInstructionCost.getNumOccurrences() > 0)
		C.first = ForceTargetInstructionCost;

		BlockCost.first += C.first;
		BlockCost.second |= C.second;
		LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C.first
		<< " for VF " << VF << " For recipe: ";
		Recipe.print(dbgs(), "", Tracker); dbgs() << '\n');
		}

		// If we are vectorizing a predicated block, it will have been
		// if-converted. This means that the block's instructions (aside from
		// stores and instructions that may divide by zero) will now be
		// unconditionally executed. For the scalar case, we may not always execute
		// the predicated block. Thus, scale the block's cost by the probability of
		// executing it.
		if (VF.isScalar())
		BlockCost.first /= getReciprocalPredBlockProb();

		return BlockCost;
		}
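
To make the predication scaling concrete, the sketch below sums recipe costs for a block and, for the scalar case only, divides the total by the reciprocal of the block's execution probability. `CostTy` and `blockCost` are hypothetical names, and the reciprocal probability of 2 used in the note afterwards is an assumed value (a 50% chance of executing the block).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for VectorizationCostTy: (cost, flag).
using CostTy = std::pair<unsigned, bool>;

CostTy blockCost(const std::vector<CostTy> &RecipeCosts, unsigned VF,
                 unsigned ReciprocalPredBlockProb) {
  assert(VF >= 1 && ReciprocalPredBlockProb >= 1);
  CostTy Block{0, false};
  for (const CostTy &C : RecipeCosts) {
    Block.first += C.first;
    Block.second |= C.second;
  }
  // Only the scalar loop may skip a predicated block, so only its cost is
  // scaled. Integer division truncates, which is one source of inaccuracy.
  if (VF == 1)
    Block.first /= ReciprocalPredBlockProb;
  return Block;
}
```

With recipe costs of 4 and 6 and a reciprocal probability of 2, the scalar block cost is 5 while the VF=4 cost stays at 10. An unpredicated block would carry a reciprocal probability of 1 and be left unscaled.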

void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent,		void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {		VPSlotTracker &SlotTracker) const {
O << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at ";		O << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at ";
IG->getInsertPos()->printAsOperand(O, false);		IG->getInsertPos()->printAsOperand(O, false);
O << ", ";		O << ", ";
getAddr()->printAsOperand(O, SlotTracker);		getAddr()->printAsOperand(O, SlotTracker);
VPValue *Mask = getMask();		VPValue *Mask = getMask();
if (Mask) {		if (Mask) {
O << ", ";		O << ", ";
Mask->printAsOperand(O, SlotTracker);		Mask->printAsOperand(O, SlotTracker);
}		}
for (unsigned i = 0; i < IG->getFactor(); ++i)		for (unsigned i = 0; i < IG->getFactor(); ++i)
if (Instruction *I = IG->getMember(i))		if (Instruction *I = IG->getMember(i))
O << "\\l\" +\n" << Indent << "\" " << VPlanIngredient(I) << " " << i;		O << "\\l\" +\n" << Indent << "\" " << VPlanIngredient(I) << " " << i;
}		}

void VPWidenCallRecipe::execute(VPTransformState &State) {		void VPWidenCallRecipe::execute(VPTransformState &State) {
State.ILV->widenCallInstruction(Ingredient, *this, State);		State.ILV->widenCallInstruction(Ingredient, *this, State);
}		}

		VectorizationCostTy VPWidenCallRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(&Ingredient, VF);
		}

void VPWidenSelectRecipe::execute(VPTransformState &State) {		void VPWidenSelectRecipe::execute(VPTransformState &State) {
State.ILV->widenSelectInstruction(Ingredient, *this, InvariantCond, State);		State.ILV->widenSelectInstruction(Ingredient, *this, InvariantCond, State);
}		}

		VectorizationCostTy VPWidenSelectRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(&Ingredient, VF);
		}

void VPWidenRecipe::execute(VPTransformState &State) {		void VPWidenRecipe::execute(VPTransformState &State) {
State.ILV->widenInstruction(*getUnderlyingInstr(), *this, State);		State.ILV->widenInstruction(*getUnderlyingInstr(), *this, State);
}		}

		VectorizationCostTy VPWidenRecipe::cost(ElementCount VF, VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(getUnderlyingInstr(), VF);
		}

void VPWidenGEPRecipe::execute(VPTransformState &State) {		void VPWidenGEPRecipe::execute(VPTransformState &State) {
State.ILV->widenGEP(GEP, *this, State.UF, State.VF, IsPtrLoopInvariant,		State.ILV->widenGEP(GEP, *this, State.UF, State.VF, IsPtrLoopInvariant,
IsIndexLoopInvariant, State);		IsIndexLoopInvariant, State);
}		}

		VectorizationCostTy VPWidenGEPRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(GEP, VF);
		}

void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {		void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Int or FP induction being replicated.");		assert(!State.Instance && "Int or FP induction being replicated.");
State.ILV->widenIntOrFpInduction(IV, Trunc);		State.ILV->widenIntOrFpInduction(IV, Trunc);
}		}

		VectorizationCostTy VPWidenIntOrFpInductionRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(IV, VF);
		}

void VPWidenPHIRecipe::execute(VPTransformState &State) {		void VPWidenPHIRecipe::execute(VPTransformState &State) {
State.ILV->widenPHIInstruction(Phi, State.UF, State.VF);		State.ILV->widenPHIInstruction(Phi, State.UF, State.VF);
}		}

		VectorizationCostTy VPWidenPHIRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(Phi, VF);
		}

void VPBlendRecipe::execute(VPTransformState &State) {		void VPBlendRecipe::execute(VPTransformState &State) {
State.ILV->setDebugLocFromInst(State.Builder, Phi);		State.ILV->setDebugLocFromInst(State.Builder, Phi);
// We know that all PHIs in non-header blocks are converted into		// We know that all PHIs in non-header blocks are converted into
// selects, so we don't have to worry about the insertion order and we		// selects, so we don't have to worry about the insertion order and we
// can just use the builder.		// can just use the builder.
// At this point we generate the predication tree. There may be		// At this point we generate the predication tree. There may be
// duplications since this is a simple recursive scan, but future		// duplications since this is a simple recursive scan, but future
// optimizations will clean it up.		// optimizations will clean it up.
[...]	for (unsigned Part = 0; Part < State.UF; ++Part) {
State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi");		State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi");
}		}
}		}
}		}
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);		State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);
}		}

		VectorizationCostTy VPBlendRecipe::cost(ElementCount VF, VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(Phi, VF);
		}

void VPInterleaveRecipe::execute(VPTransformState &State) {		void VPInterleaveRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Interleave group being replicated.");		assert(!State.Instance && "Interleave group being replicated.");
State.ILV->vectorizeInterleaveGroup(IG, State, getAddr(), getMask());		State.ILV->vectorizeInterleaveGroup(IG, State, getAddr(), getMask());
}		}

		VectorizationCostTy VPInterleaveRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		VectorizationCostTy Cost = {0, false};
		for (unsigned i = 0; i < IG->getNumMembers(); i++) {
		if (!IG->getMember(i))
		continue;
		VectorizationCostTy MC = Ctx.CM.getInstructionCost(IG->getMember(i), VF);
		Cost.first += MC.first;
		Cost.second |= MC.second;
		}
		return Cost;
		}

void VPReductionRecipe::execute(VPTransformState &State) {		void VPReductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Reduction being replicated.");		assert(!State.Instance && "Reduction being replicated.");
for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
RecurrenceDescriptor::RecurrenceKind Kind = RdxDesc->getRecurrenceKind();		RecurrenceDescriptor::RecurrenceKind Kind = RdxDesc->getRecurrenceKind();
Value *NewVecOp = State.get(getVecOp(), Part);		Value *NewVecOp = State.get(getVecOp(), Part);
if (getCondOp()) {		if (getCondOp()) {
Value *NewCond = State.get(getCondOp(), Part);		Value *NewCond = State.get(getCondOp(), Part);
VectorType *VecTy = cast<VectorType>(NewVecOp->getType());		VectorType *VecTy = cast<VectorType>(NewVecOp->getType());
[...]	if (Kind == RecurrenceDescriptor::RK_IntegerMinMax ||
NextInChain = State.Builder.CreateBinOp(		NextInChain = State.Builder.CreateBinOp(
(Instruction::BinaryOps)getUnderlyingInstr()->getOpcode(), NewRed,		(Instruction::BinaryOps)getUnderlyingInstr()->getOpcode(), NewRed,
PrevInChain);		PrevInChain);
}		}
State.ValueMap.setVectorValue(getUnderlyingInstr(), Part, NextInChain);		State.ValueMap.setVectorValue(getUnderlyingInstr(), Part, NextInChain);
}		}
}		}

		VectorizationCostTy VPReductionRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		unsigned Cost = Ctx.TTI->getArithmeticReductionCost(
		RdxDesc->getRecurrenceBinOp(),
		VectorType::get(RdxDesc->getRecurrenceType(), VF), false,
		TTI::TCK_RecipThroughput);
		return {Cost, false};
		}

void VPReplicateRecipe::execute(VPTransformState &State) {		void VPReplicateRecipe::execute(VPTransformState &State) {
if (State.Instance) { // Generate a single instance.		if (State.Instance) { // Generate a single instance.
State.ILV->scalarizeInstruction(Ingredient, *this, *State.Instance,		State.ILV->scalarizeInstruction(Ingredient, *this, *State.Instance,
IsPredicated, State);		IsPredicated, State);
// Insert scalar instance packing it into a vector.		// Insert scalar instance packing it into a vector.
if (AlsoPack && State.VF.isVector()) {		if (AlsoPack && State.VF.isVector()) {
// If we're constructing lane 0, initialize to start from undef.		// If we're constructing lane 0, initialize to start from undef.
if (State.Instance->Lane == 0) {		if (State.Instance->Lane == 0) {
[...]	void VPReplicateRecipe::execute(VPTransformState &State) {
// of the UF parts.		// of the UF parts.
unsigned EndLane = IsUniform ? 1 : State.VF.getKnownMinValue();		unsigned EndLane = IsUniform ? 1 : State.VF.getKnownMinValue();
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
for (unsigned Lane = 0; Lane < EndLane; ++Lane)		for (unsigned Lane = 0; Lane < EndLane; ++Lane)
State.ILV->scalarizeInstruction(Ingredient, *this, {Part, Lane},		State.ILV->scalarizeInstruction(Ingredient, *this, {Part, Lane},
IsPredicated, State);		IsPredicated, State);
}		}

		VectorizationCostTy VPReplicateRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(Ingredient, VF);
		}

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");		assert(State.Instance && "Branch on Mask works only on single instance.");

unsigned Part = State.Instance->Part;		unsigned Part = State.Instance->Part;
unsigned Lane = State.Instance->Lane;		unsigned Lane = State.Instance->Lane;

Value *ConditionBit = nullptr;		Value *ConditionBit = nullptr;
VPValue *BlockInMask = getMask();		VPValue *BlockInMask = getMask();
[...]	void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
auto *CurrentTerminator = State.CFG.PrevBB->getTerminator();		auto *CurrentTerminator = State.CFG.PrevBB->getTerminator();
assert(isa<UnreachableInst>(CurrentTerminator) &&		assert(isa<UnreachableInst>(CurrentTerminator) &&
"Expected to replace unreachable terminator with conditional branch.");		"Expected to replace unreachable terminator with conditional branch.");
auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);		auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);
CondBr->setSuccessor(0, nullptr);		CondBr->setSuccessor(0, nullptr);
ReplaceInstWithInst(CurrentTerminator, CondBr);		ReplaceInstWithInst(CurrentTerminator, CondBr);
}		}

		VectorizationCostTy VPBranchOnMaskRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		// In cases of scalarized and predicated instructions, there will be VF
		// predicated blocks in the vectorized loop. Each branch around these
		// blocks requires also an extract of its vector compare i1 element.
		if (VF.isVector()) {
		// Return cost for branches around scalarized and predicated blocks.
		assert(!VF.isScalable() && "scalable vectors not yet supported.");
		LLVMContext &C = Ctx.CM.TheLoop->getHeader()->getContext();
		auto *Vec_i1Ty = VectorType::get(IntegerType::getInt1Ty(C), VF);
		unsigned Cost =
		Ctx.TTI->getScalarizationOverhead(
		Vec_i1Ty, APInt::getAllOnesValue(VF.getKnownMinValue()), false,
		true) +
		(Ctx.TTI->getCFInstrCost(Instruction::Br,
		TargetTransformInfo::TCK_RecipThroughput) *
		VF.getKnownMinValue());
		return {Cost, false};
		}
		return {0, false};
		}
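
The branch-on-mask cost above has two parts: extracting each i1 lane from the vector compare, and one branch per lane. The sketch below models that shape under a simplifying assumption that the extraction overhead is linear in VF; the real getScalarizationOverhead result is target-specific. `branchOnMaskCost` and its parameters are hypothetical names.

```cpp
#include <cassert>

// Cost of branching around VF scalarized, predicated blocks.
// ExtractCostPerLane is an assumed flat per-lane extract cost.
unsigned branchOnMaskCost(unsigned VF, unsigned ExtractCostPerLane,
                          unsigned BranchCost) {
  assert(VF >= 1 && "VF must be at least 1");
  if (VF == 1)
    return 0; // scalar loop: no mask extraction or extra branches modelled
  unsigned ExtractOverhead = VF * ExtractCostPerLane;
  unsigned Branches = VF * BranchCost;
  return ExtractOverhead + Branches;
}
```

Because both terms grow with VF, this cost can make wide vectorization of heavily predicated loops look unattractive.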

void VPPredInstPHIRecipe::execute(VPTransformState &State) {		void VPPredInstPHIRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Predicated instruction PHI works per instance.");		assert(State.Instance && "Predicated instruction PHI works per instance.");
Instruction *ScalarPredInst = cast<Instruction>(		Instruction *ScalarPredInst = cast<Instruction>(
State.ValueMap.getScalarValue(PredInst, *State.Instance));		State.ValueMap.getScalarValue(PredInst, *State.Instance));
BasicBlock *PredicatedBB = ScalarPredInst->getParent();		BasicBlock *PredicatedBB = ScalarPredInst->getParent();
BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor();		BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor();
assert(PredicatingBB && "Predicated block has no single predecessor.");		assert(PredicatingBB && "Predicated block has no single predecessor.");

[...]	if (State.ValueMap.hasVectorValue(PredInst, Part)) {
Type *PredInstType = PredInst->getType();		Type *PredInstType = PredInst->getType();
PHINode *Phi = State.Builder.CreatePHI(PredInstType, 2);		PHINode *Phi = State.Builder.CreatePHI(PredInstType, 2);
Phi->addIncoming(UndefValue::get(ScalarPredInst->getType()), PredicatingBB);		Phi->addIncoming(UndefValue::get(ScalarPredInst->getType()), PredicatingBB);
Phi->addIncoming(ScalarPredInst, PredicatedBB);		Phi->addIncoming(ScalarPredInst, PredicatedBB);
State.ValueMap.resetScalarValue(PredInst, *State.Instance, Phi);		State.ValueMap.resetScalarValue(PredInst, *State.Instance, Phi);
}		}
}		}

		VectorizationCostTy VPPredInstPHIRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return { 0, false };
		}

void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {		void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
Instruction *Instr = getUnderlyingInstr();		Instruction *Instr = getUnderlyingInstr();
VPValue *StoredValue = isa<StoreInst>(Instr) ? getStoredValue() : nullptr;		VPValue *StoredValue = isa<StoreInst>(Instr) ? getStoredValue() : nullptr;
State.ILV->vectorizeMemoryInstruction(Instr, State,		State.ILV->vectorizeMemoryInstruction(Instr, State,
StoredValue ? nullptr : this, getAddr(),		StoredValue ? nullptr : this, getAddr(),
StoredValue, getMask());		StoredValue, getMask());
}		}

		VectorizationCostTy VPWidenMemoryInstructionRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return Ctx.CM.getInstructionCost(getUnderlyingInstr(), VF);
		}

		VectorizationCostTy VPWidenCanonicalIVRecipe::cost(ElementCount VF,
		VPCostContext &Ctx) {
		return {Ctx.TTI->getCFInstrCost(Instruction::PHI,
		TargetTransformInfo::TCK_RecipThroughput),
		false};
		}

		VectorizationCostTy VPInstruction::cost(ElementCount VF, VPCostContext &Ctx) {
		// FIXME: Cost everything that a VPInstruction can be, which likely needs type
		// information.
		return {0, false};
		}

// Determine how to lower the scalar epilogue, which depends on 1) optimising		// Determine how to lower the scalar epilogue, which depends on 1) optimising
// for minimum code-size, 2) predicate compiler options, 3) loop hints forcing		// for minimum code-size, 2) predicate compiler options, 3) loop hints forcing
// predication, and 4) a TTI hook that analyses whether the loop is suitable		// predication, and 4) a TTI hook that analyses whether the loop is suitable
// for predication.		// for predication.
static ScalarEpilogueLowering getScalarEpilogueLowering(		static ScalarEpilogueLowering getScalarEpilogueLowering(
Function *F, Loop *L, LoopVectorizeHints &Hints, ProfileSummaryInfo *PSI,		Function *F, Loop *L, LoopVectorizeHints &Hints, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI, TargetTransformInfo *TTI, TargetLibraryInfo *TLI,		BlockFrequencyInfo *BFI, TargetTransformInfo *TTI, TargetLibraryInfo *TLI,
AssumptionCache *AC, LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,		AssumptionCache *AC, LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
[66 lines not shown]	static bool processLoopInVPlanNativePath(
// TODO: CM is not used at this point inside the planner. Turn CM into an		// TODO: CM is not used at this point inside the planner. Turn CM into an
// optional argument if we don't need it in the future.		// optional argument if we don't need it in the future.
LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM, IAI, PSE);		LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM, IAI, PSE);

// Get user vectorization factor.		// Get user vectorization factor.
const unsigned UserVF = Hints.getWidth();		const unsigned UserVF = Hints.getWidth();

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
const VectorizationFactor VF =		auto PlanVF = LVP.planInVPlanNativePath(ElementCount::getFixed(UserVF));
LVP.planInVPlanNativePath(ElementCount::getFixed(UserVF));

// If we are stress testing VPlan builds, do not attempt to generate vector		// If we are stress testing VPlan builds, do not attempt to generate vector
// code. Masked vector code generation support will follow soon.		// code. Masked vector code generation support will follow soon.
// Also, do not attempt to vectorize if no vector code will be produced.		// Also, do not attempt to vectorize if no vector code will be produced.
if (VPlanBuildStressTest || EnableVPlanPredication ||		if (VPlanBuildStressTest || EnableVPlanPredication ||
VectorizationFactor::Disabled() == VF)		VectorizationFactor::Disabled() == PlanVF.VF)
return false;		return false;

LVP.setBestPlan(VF.Width, 1);		LVP.setBestPlan(PlanVF.Plan, PlanVF.VF.Width, 1);

InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, 1, LVL,		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, PlanVF.VF.Width, 1,
&CM, BFI, PSI);		LVL, &CM, BFI, PSI);
LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""		LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
<< L->getHeader()->getParent()->getName() << "\"\n");		<< L->getHeader()->getParent()->getName() << "\"\n");
LVP.executePlan(LB, DT);		LVP.executePlan(LB, DT);

// Mark the loop as already vectorized to avoid vectorizing again.		// Mark the loop as already vectorized to avoid vectorizing again.
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();

assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));		assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
[137 lines not shown]	#endif /* NDEBUG */
// Use the planner for vectorization.		// Use the planner for vectorization.
LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, CM, IAI, PSE);		LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, CM, IAI, PSE);

// Get user vectorization factor and interleave count.		// Get user vectorization factor and interleave count.
unsigned UserVF = Hints.getWidth();		unsigned UserVF = Hints.getWidth();
unsigned UserIC = Hints.getInterleave();		unsigned UserIC = Hints.getInterleave();

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
Optional<VectorizationFactor> MaybeVF =		Optional<VPlanVFPair> MaybeVF =
LVP.plan(ElementCount::getFixed(UserVF), UserIC);		LVP.plan(ElementCount::getFixed(UserVF), UserIC);

		VPlan *BestPlan = nullptr;
VectorizationFactor VF = VectorizationFactor::Disabled();		VectorizationFactor VF = VectorizationFactor::Disabled();
unsigned IC = 1;		unsigned IC = 1;

if (MaybeVF) {		if (MaybeVF) {
VF = *MaybeVF;		BestPlan = (*MaybeVF).Plan;
		VF = (*MaybeVF).VF;
// Select the interleave count.		// Select the interleave count.
IC = CM.selectInterleaveCount(VF.Width, VF.Cost);		IC = CM.selectInterleaveCount(BestPlan, VF.Width, VF.Cost);
}		}

// Identify the diagnostic messages that should be produced.		// Identify the diagnostic messages that should be produced.
std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;		std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;
bool VectorizeLoop = true, InterleaveLoop = true;		bool VectorizeLoop = true, InterleaveLoop = true;
if (Requirements.doesNotMeet(F, L, Hints)) {		if (Requirements.doesNotMeet(F, L, Hints)) {
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "
"requirements.\n");		"requirements.\n");
[75 lines not shown]	ORE->emit([&]() {
<< IntDiagMsg.second;		<< IntDiagMsg.second;
});		});
} else if (VectorizeLoop && InterleaveLoop) {		} else if (VectorizeLoop && InterleaveLoop) {
LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width		LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width
<< ") in " << DebugLocStr << '\n');		<< ") in " << DebugLocStr << '\n');
LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');		LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}		}

LVP.setBestPlan(VF.Width, IC);		LVP.setBestPlan(BestPlan, VF.Width, IC);

using namespace ore;		using namespace ore;
bool DisableRuntimeUnroll = false;		bool DisableRuntimeUnroll = false;
MDNode *OrigLoopID = L->getLoopID();		MDNode *OrigLoopID = L->getLoopID();

if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
[163 lines not shown]

llvm/lib/Transforms/Vectorize/VPlan.h

[53 lines not shown]
class LoopInfo;		class LoopInfo;
class raw_ostream;		class raw_ostream;
class RecurrenceDescriptor;		class RecurrenceDescriptor;
class Value;		class Value;
class VPBasicBlock;		class VPBasicBlock;
class VPRegionBlock;		class VPRegionBlock;
class VPlan;		class VPlan;
class VPlanSlp;		class VPlanSlp;
		class LoopVectorizationCostModel;
		class LoopVectorizationLegality;

/// A range of powers-of-2 vectorization factors with fixed start and		/// A range of powers-of-2 vectorization factors with fixed start and
/// adjustable end. The range includes start and excludes end, e.g.,:		/// adjustable end. The range includes start and excludes end, e.g.,:
/// [1, 9) = {1, 2, 4, 8}		/// [1, 9) = {1, 2, 4, 8}
struct VFRange {		struct VFRange {
// A power of 2.		// A power of 2.
const unsigned Start;		const unsigned Start;

[12 lines not shown]
struct VPIteration {		struct VPIteration {
/// in [0..UF)		/// in [0..UF)
unsigned Part;		unsigned Part;

/// in [0..VF)		/// in [0..VF)
unsigned Lane;		unsigned Lane;
};		};
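
As a small illustration of how a `VPIteration` addresses scalar instances, the sketch below enumerates every (Part, Lane) pair for a given UF and VF. The struct is redeclared locally so the example stands alone. `allIterations` and `scalarIndex` are hypothetical helpers, and the `Part * VF + Lane` mapping to an offset within one vector-loop iteration is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Local copy of the struct from the diff so the sketch stands alone.
struct VPIteration {
  unsigned Part; // in [0..UF)
  unsigned Lane; // in [0..VF)
};

// Hypothetical helper: list all UF x VF scalar instances, part-major.
std::vector<VPIteration> allIterations(unsigned UF, unsigned VF) {
  std::vector<VPIteration> Result;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Result.push_back({Part, Lane});
  return Result;
}

// Hypothetical helper: offset of the original scalar iteration that an
// instance covers within one iteration of the vector loop.
unsigned scalarIndex(const VPIteration &It, unsigned VF) {
  return It.Part * VF + It.Lane;
}
```

With UF = 2 and VF = 4 this yields eight instances, which is the "UF x VF scalar values" count that the `VPTransformState` comment refers to.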

		/// The vectorization cost is a combination of the cost itself and a boolean
		/// indicating whether any of the contributing operations will actually
		/// operate on vector values after type legalization in the backend. If this
		/// latter value is false, then all operations will be scalarized (i.e. no
		/// vectorization has actually taken place).
		using VectorizationCostTy = std::pair<unsigned, bool>;
		rengolin (Not Done): Funny enough, here the use of `std::pair` is worse. In the code that uses it, there's a lot of `Cost.first += C.first; Cost.second |= C.second;` and the typedef offers no help. Here an actual struct would be more suitable: `struct VectorizationCostTy { unsigned cost; bool active; }`
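
A minimal sketch of the struct rengolin proposes might look as follows. This is not code from the patch: the name `VectorizationCost`, the field names, and the `operator+=` are assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical replacement for std::pair<unsigned, bool>.
struct VectorizationCost {
  unsigned Cost = 0;       // accumulated cost
  bool Vectorized = false; // true if any contributing op is a real vector op

  VectorizationCost() = default;
  VectorizationCost(unsigned C, bool V) : Cost(C), Vectorized(V) {}

  // Combine two costs: add the costs, OR the flags.
  VectorizationCost &operator+=(const VectorizationCost &Other) {
    Cost += Other.Cost;
    Vectorized |= Other.Vectorized;
    return *this;
  }
};
```

With such a type, the repeated `first +=` / `second |=` pairs at each call site collapse into a single `Total += C;`.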

/// This is a helper struct for maintaining vectorization state. It's used for		/// This is a helper struct for maintaining vectorization state. It's used for
/// mapping values from the original loop to their corresponding values in		/// mapping values from the original loop to their corresponding values in
/// the new loop. Two mappings are maintained: one for vectorized values and		/// the new loop. Two mappings are maintained: one for vectorized values and
/// one for scalarized values. Vectorized values are represented with UF		/// one for scalarized values. Vectorized values are represented with UF
/// vector values in the new loop, and scalarized values are represented with		/// vector values in the new loop, and scalarized values are represented with
/// UF x VF scalar values in the new loop. UF and VF are the unroll and		/// UF x VF scalar values in the new loop. UF and VF are the unroll and
/// vectorization factors, respectively.		/// vectorization factors, respectively.
///		///
[255 lines not shown]	struct VPTransformState {
Value *TripCount = nullptr;		Value *TripCount = nullptr;

/// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.		/// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.
InnerLoopVectorizer *ILV;		InnerLoopVectorizer *ILV;

VPCallback &Callback;		VPCallback &Callback;
};		};

		/// A struct to hold the context used during cost calculations.
		struct VPCostContext {
		/// The original CostModel, which is currently used for getting instruction
		bmahjour (Not Done): The cost-model is conceptually the structure that knows all about calculating costs. Instead of packaging the cost-model and legality and sending them to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model, as it already has access to legality?
		dmgreen (author, Done): The idea, as far as I understand, is that just as the execution of a vplan is separated into the recipes that make it up, so should the cost model. All the CM.getInstructionCost(..) methods will be replaced by the code that makes them up: VPInterleaveRecipe will know how to cost interleaving groups, VPReductions will know how to cost reductions, VPWidenRecipes will be able to handle widened instructions, etc.
		The LoopVectorizationCostModel is the old monolithic way of doing things, which vplan is trying to move away from. This is only the first step of trying to clean that up.
		As for this struct specifically, yeah, it doesn't feel like the best. I was trying to follow VPTransformState, but it doesn't contain much at the moment. Plus I apparently misspelt analysis.
		bmahjour (Not Done): Ok, if the idea is to eventually remove the `LoopVectorizationCostModel`, this change makes sense as a transitional step. By the way, I find it strange that we rely on `LoopVectorizationLegality` to get information about the cost! Looks like we only need `getWidestInductionType` from it, so can we put that in the context instead of the whole class?
		dmgreen (author, Done): Ah, good point. Some of this code was added and removed along the way. You are right that the Legality isn't really needed much at the moment. We may need it for some things in the future, but if we do, we can address those as they come up.
		a.elovikov (Not Done):
		> Instead of packaging the cost-model and legality and send it to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model...
		My +1 for putting CM code outside the VPlan itself. I prefer to see VPlan as a kind of IR in itself, and hence it would make sense to keep it similar to LLVM IR cost modeling via a separate class. In fact, I'd rather move the `::execute` methods out of VPlan as well.
		Moving the cost modeling code out of the VPlan classes might also make it easier to prototype/implement some advanced heuristics that would work on the basic-block/whole-VPlan level (i.e. some kinds of register pressure/spills/fills estimates) by just putting all the code locally, grouped by the different levels of detail we want to account for. We might also consider cost modeling for either throughput or latency depending on the trip counts. IMO, that would also be easier to code if we had a separate class.
		/// cost.
		LoopVectorizationCostModel &CM;

		/// The TTI to query target costs
		const TargetTransformInfo *TTI;

		/// The widest induction type, as in Legal->getWidestInductionType()
		Type *WidestInductionType;
		};
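
The review thread above weighs two designs: `cost()` methods on the recipes themselves (what this patch does, with `VPCostContext` passed in) versus a separate class that walks the plan. For comparison only, here is a minimal sketch of the second design. Every name in it (`RecipeKind`, `SketchRecipe`, `SketchCostModel`) is hypothetical, and the unit costs are made up.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recipe kinds; the real VPlan has many more.
enum class RecipeKind { Widen, WidenMemory, Replicate };

// Hypothetical recipe that carries no cost logic of its own.
struct SketchRecipe {
  RecipeKind Kind;
};

// Hypothetical external cost model: all costing lives in one class, so
// plan-wide heuristics could be added here without touching the recipes.
class SketchCostModel {
public:
  explicit SketchCostModel(unsigned VF) : VF(VF) {}

  unsigned cost(const SketchRecipe &R) const {
    switch (R.Kind) {
    case RecipeKind::Widen:
      return 1; // one vector operation (made-up cost)
    case RecipeKind::WidenMemory:
      return 2; // one vector load/store (made-up cost)
    case RecipeKind::Replicate:
      return VF; // one scalar operation per lane
    }
    return 0;
  }

  unsigned cost(const std::vector<SketchRecipe> &Recipes) const {
    unsigned Total = 0;
    for (const SketchRecipe &R : Recipes)
      Total += cost(R);
    return Total;
  }

private:
  unsigned VF;
};
```

The trade-off is the usual one between virtual methods and an external visitor: the patch's approach keeps each recipe's costing next to its `execute()`, while the sketch keeps all costing in one place at the price of a central switch over recipe kinds.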

/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.		/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.		/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.
class VPBlockBase {		class VPBlockBase {
friend class VPBlockUtils;		friend class VPBlockUtils;

const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).		const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

/// An optional name for the block.		/// An optional name for the block.
[209 lines not shown]	void clearSuccessors() {
Successors.clear();		Successors.clear();
CondBit = nullptr;		CondBit = nullptr;
}		}

/// The method which generates the output IR that correspond to this		/// The method which generates the output IR that correspond to this
/// VPBlockBase, thereby "executing" the VPlan.		/// VPBlockBase, thereby "executing" the VPlan.
virtual void execute(struct VPTransformState *State) = 0;		virtual void execute(struct VPTransformState *State) = 0;

		virtual VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) = 0;

/// Delete all blocks reachable from a given VPBlockBase, inclusive.		/// Delete all blocks reachable from a given VPBlockBase, inclusive.
static void deleteCFG(VPBlockBase *Entry);		static void deleteCFG(VPBlockBase *Entry);

void printAsOperand(raw_ostream &OS, bool PrintType) const {		void printAsOperand(raw_ostream &OS, bool PrintType) const {
OS << getName();		OS << getName();
}		}

void print(raw_ostream &OS) const {		void print(raw_ostream &OS) const {
[55 lines not shown]	public:
/// \return the VPBasicBlock which this VPRecipe belongs to.		/// \return the VPBasicBlock which this VPRecipe belongs to.
VPBasicBlock *getParent() { return Parent; }		VPBasicBlock *getParent() { return Parent; }
const VPBasicBlock *getParent() const { return Parent; }		const VPBasicBlock *getParent() const { return Parent; }

/// The method which generates the output IR instructions that correspond to		/// The method which generates the output IR instructions that correspond to
/// this VPRecipe, thereby "executing" the VPlan.		/// this VPRecipe, thereby "executing" the VPlan.
virtual void execute(struct VPTransformState &State) = 0;		virtual void execute(struct VPTransformState &State) = 0;

		virtual VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) = 0;

/// Each recipe prints itself.		/// Each recipe prints itself.
virtual void print(raw_ostream &O, const Twine &Indent,		virtual void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const = 0;		VPSlotTracker &SlotTracker) const = 0;

/// Dump the recipe to stderr (for debugging).		/// Dump the recipe to stderr (for debugging).
void dump() const;		void dump() const;

/// Insert an unlinked recipe into a basic block immediately before		/// Insert an unlinked recipe into a basic block immediately before
[106 lines not shown]	public:

unsigned getOpcode() const { return Opcode; }		unsigned getOpcode() const { return Opcode; }

/// Generate the instruction.		/// Generate the instruction.
/// TODO: We currently execute only per-part unless a specific instance is		/// TODO: We currently execute only per-part unless a specific instance is
/// provided.		/// provided.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the Recipe.		/// Print the Recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;

/// Print the VPInstruction.		/// Print the VPInstruction.
void print(raw_ostream &O) const;		void print(raw_ostream &O) const;
void print(raw_ostream &O, VPSlotTracker &SlotTracker) const;		void print(raw_ostream &O, VPSlotTracker &SlotTracker) const;

[44 lines not shown]	public:
}		}
static inline bool classof(const VPValue *V) {		static inline bool classof(const VPValue *V) {
return V->getVPValueID() == VPValue::VPWidenSC;		return V->getVPValueID() == VPValue::VPWidenSC;
}		}

/// Produce widened copies of all Ingredients.		/// Produce widened copies of all Ingredients.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for widening Call instructions.		/// A recipe for widening Call instructions.
class VPWidenCallRecipe : public VPRecipeBase, public VPUser {		class VPWidenCallRecipe : public VPRecipeBase, public VPUser {
/// Hold the call to be widened.		/// Hold the call to be widened.
[9 lines not shown]	public:
/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenCallSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenCallSC;
}		}

/// Produce a widened version of the call instruction.		/// Produce a widened version of the call instruction.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for widening select instructions.		/// A recipe for widening select instructions.
class VPWidenSelectRecipe : public VPRecipeBase, public VPUser {		class VPWidenSelectRecipe : public VPRecipeBase, public VPUser {
private:		private:
[15 lines not shown]	public:
/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenSelectSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenSelectSC;
}		}

/// Produce a widened version of the select instruction.		/// Produce a widened version of the select instruction.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for handling GEP instructions.		/// A recipe for handling GEP instructions.
class VPWidenGEPRecipe : public VPRecipeBase, public VPUser {		class VPWidenGEPRecipe : public VPRecipeBase, public VPUser {
GetElementPtrInst *GEP;		GetElementPtrInst *GEP;
[22 lines not shown]	public:
/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenGEPSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenGEPSC;
}		}

/// Generate the gep nodes.		/// Generate the gep nodes.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for handling phi nodes of integer and floating-point inductions,		/// A recipe for handling phi nodes of integer and floating-point inductions,
/// producing their vector and scalar values.		/// producing their vector and scalar values.
class VPWidenIntOrFpInductionRecipe : public VPRecipeBase {		class VPWidenIntOrFpInductionRecipe : public VPRecipeBase {
[9 lines not shown]	public:
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenIntOrFpInductionSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenIntOrFpInductionSC;
}		}

/// Generate the vectorized and scalarized versions of the phi node as		/// Generate the vectorized and scalarized versions of the phi node as
/// needed by their users.		/// needed by their users.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for handling all phi nodes except for integer and FP inductions.		/// A recipe for handling all phi nodes except for integer and FP inductions.
class VPWidenPHIRecipe : public VPRecipeBase {		class VPWidenPHIRecipe : public VPRecipeBase {
PHINode *Phi;		PHINode *Phi;

public:		public:
VPWidenPHIRecipe(PHINode *Phi) : VPRecipeBase(VPWidenPHISC), Phi(Phi) {}		VPWidenPHIRecipe(PHINode *Phi) : VPRecipeBase(VPWidenPHISC), Phi(Phi) {}
~VPWidenPHIRecipe() override = default;		~VPWidenPHIRecipe() override = default;

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenPHISC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenPHISC;
}		}

/// Generate the phi/select nodes.		/// Generate the phi/select nodes.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for vectorizing a phi-node as a sequence of mask-based select		/// A recipe for vectorizing a phi-node as a sequence of mask-based select
/// instructions.		/// instructions.
class VPBlendRecipe : public VPRecipeBase, public VPUser {		class VPBlendRecipe : public VPRecipeBase, public VPUser {
[24 lines not shown]	public:
VPValue *getIncomingValue(unsigned Idx) const { return getOperand(Idx * 2); }		VPValue *getIncomingValue(unsigned Idx) const { return getOperand(Idx * 2); }

/// Return mask number \p Idx.		/// Return mask number \p Idx.
VPValue *getMask(unsigned Idx) const { return getOperand(Idx * 2 + 1); }		VPValue *getMask(unsigned Idx) const { return getOperand(Idx * 2 + 1); }

/// Generate the phi/select nodes.		/// Generate the phi/select nodes.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};
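
The two accessors of `VPBlendRecipe` index into an operand list in which incoming values and their masks alternate: value 0, mask 0, value 1, mask 1, and so on. The standalone sketch below illustrates that indexing with a hypothetical `SketchBlend` that stores strings in place of `VPValue` pointers; it is not the class from the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for VPBlendRecipe. Operands are ordered
// [I0, M0, I1, M1, ...]: each incoming value is followed by its mask.
struct SketchBlend {
  std::vector<std::string> Operands;

  // Number of incoming values; rounds up so that a single incoming value
  // without a mask still counts as one.
  unsigned getNumIncomingValues() const {
    return static_cast<unsigned>((Operands.size() + 1) / 2);
  }

  // Incoming value number Idx sits at the even position Idx * 2.
  const std::string &getIncomingValue(unsigned Idx) const {
    return Operands[Idx * 2];
  }

  // Its mask sits immediately after it, at Idx * 2 + 1.
  const std::string &getMask(unsigned Idx) const {
    return Operands[Idx * 2 + 1];
  }
};
```

Keeping each mask next to its value means a blend with N incoming values needs only one flat operand list, at the cost of the `* 2` arithmetic in every accessor.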

/// VPInterleaveRecipe is a recipe for transforming an interleave group of load		/// VPInterleaveRecipe is a recipe for transforming an interleave group of load
/// or stores into one wide load/store and shuffles.		/// or stores into one wide load/store and shuffles.
class VPInterleaveRecipe : public VPRecipeBase, public VPUser {		class VPInterleaveRecipe : public VPRecipeBase, public VPUser {
[23 lines not shown]	public:
VPValue *getMask() const {		VPValue *getMask() const {
// Mask is optional and therefore the last, currently 2nd operand.		// Mask is optional and therefore the last, currently 2nd operand.
return getNumOperands() == 2 ? getOperand(1) : nullptr;		return getNumOperands() == 2 ? getOperand(1) : nullptr;
}		}

/// Generate the wide load or store, and shuffles.		/// Generate the wide load or store, and shuffles.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;

const InterleaveGroup<Instruction> *getInterleaveGroup() { return IG; }		const InterleaveGroup<Instruction> *getInterleaveGroup() { return IG; }
};		};

/// A recipe to represent inloop reduction operations, performing a reduction on		/// A recipe to represent inloop reduction operations, performing a reduction on
[26 lines not shown]	public:
}		}
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPReductionSC;		return V->getVPRecipeID() == VPRecipeBase::VPReductionSC;
}		}

/// Generate the reduction in the loop		/// Generate the reduction in the loop
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;

/// The VPValue of the scalar Chain being accumulated.		/// The VPValue of the scalar Chain being accumulated.
VPValue *getChainOp() const { return getOperand(0); }		VPValue *getChainOp() const { return getOperand(0); }
/// The VPValue of the vector value to be reduced.		/// The VPValue of the vector value to be reduced.
VPValue *getVecOp() const { return getOperand(1); }		VPValue *getVecOp() const { return getOperand(1); }
[41 lines not shown]	static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPReplicateSC;		return V->getVPRecipeID() == VPRecipeBase::VPReplicateSC;
}		}

/// Generate replicas of the desired Ingredient. Replicas will be generated		/// Generate replicas of the desired Ingredient. Replicas will be generated
/// for all parts and lanes unless a specific part and lane are specified in		/// for all parts and lanes unless a specific part and lane are specified in
/// the \p State.		/// the \p State.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

void setAlsoPack(bool Pack) { AlsoPack = Pack; }		void setAlsoPack(bool Pack) { AlsoPack = Pack; }

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A recipe for generating conditional branches on the bits of a mask.		/// A recipe for generating conditional branches on the bits of a mask.
class VPBranchOnMaskRecipe : public VPRecipeBase, public VPUser {		class VPBranchOnMaskRecipe : public VPRecipeBase, public VPUser {
public:		public:
VPBranchOnMaskRecipe(VPValue *BlockInMask) : VPRecipeBase(VPBranchOnMaskSC) {		VPBranchOnMaskRecipe(VPValue *BlockInMask) : VPRecipeBase(VPBranchOnMaskSC) {
if (BlockInMask) // nullptr means all-one mask.		if (BlockInMask) // nullptr means all-one mask.
addOperand(BlockInMask);		addOperand(BlockInMask);
}		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPBranchOnMaskSC;		return V->getVPRecipeID() == VPRecipeBase::VPBranchOnMaskSC;
}		}

/// Generate the extraction of the appropriate bit from the block mask and the		/// Generate the extraction of the appropriate bit from the block mask and the
/// conditional branch.		/// conditional branch.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override {		VPSlotTracker &SlotTracker) const override {
O << " +\n" << Indent << "\"BRANCH-ON-MASK ";		O << Indent << "\"BRANCH-ON-MASK ";
if (VPValue *Mask = getMask())		if (VPValue *Mask = getMask())
Mask->print(O, SlotTracker);		Mask->print(O, SlotTracker);
else		else
O << " All-One";		O << " All-One";
O << "\\l\"";		O << "\\l\"";
}		}

/// Return the mask used by this recipe. Note that a full mask is represented		/// Return the mask used by this recipe. Note that a full mask is represented
[23 lines not shown]	public:
/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPPredInstPHISC;		return V->getVPRecipeID() == VPRecipeBase::VPPredInstPHISC;
}		}

/// Generates phi nodes for live-outs as needed to retain SSA form.		/// Generates phi nodes for live-outs as needed to retain SSA form.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A Recipe for widening load/store operations.		/// A Recipe for widening load/store operations.
/// The recipe uses the following VPValues:		/// The recipe uses the following VPValues:
/// - For load: Address, optional mask		/// - For load: Address, optional mask
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	VPValue *getStoredValue() const {
assert(isa<StoreInst>(getUnderlyingInstr()) &&		assert(isa<StoreInst>(getUnderlyingInstr()) &&
"Stored value only available for store instructions");		"Stored value only available for store instructions");
return getOperand(1); // Stored value is the 2nd, mandatory operand.		return getOperand(1); // Stored value is the 2nd, mandatory operand.
}		}

/// Generate the wide load/store.		/// Generate the wide load/store.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// A Recipe for widening the canonical induction variable of the vector loop.		/// A Recipe for widening the canonical induction variable of the vector loop.
class VPWidenCanonicalIVRecipe : public VPRecipeBase {		class VPWidenCanonicalIVRecipe : public VPRecipeBase {
/// A VPValue representing the canonical vector IV.		/// A VPValue representing the canonical vector IV.
Show All 13 Lines	static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenCanonicalIVSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenCanonicalIVSC;
}		}

/// Generate a canonical vector induction variable of the vector loop, with		/// Generate a canonical vector induction variable of the vector loop, with
/// start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF}, and		/// start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF}, and
/// step = <VF*UF, VF*UF, ..., VF*UF>.		/// step = <VF*UF, VF*UF, ..., VF*UF>.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It		/// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It
/// holds a sequence of zero or more VPRecipe's each representing a sequence of		/// holds a sequence of zero or more VPRecipe's each representing a sequence of
/// output IR instructions.		/// output IR instructions.
class VPBasicBlock : public VPBlockBase {		class VPBasicBlock : public VPBlockBase {
public:		public:
using RecipeListTy = iplist<VPRecipeBase>;		using RecipeListTy = iplist<VPRecipeBase>;

private:		private:
/// The VPRecipes held in the order of output instructions to generate.		/// The VPRecipes held in the order of output instructions to generate.
RecipeListTy Recipes;		RecipeListTy Recipes;

		unsigned ReciprocalPredBlockProb = 1;

public:		public:
VPBasicBlock(const Twine &Name = "", VPRecipeBase *Recipe = nullptr)		VPBasicBlock(const Twine &Name = "", VPRecipeBase *Recipe = nullptr)
: VPBlockBase(VPBasicBlockSC, Name.str()) {		: VPBlockBase(VPBasicBlockSC, Name.str()) {
if (Recipe)		if (Recipe)
appendRecipe(Recipe);		appendRecipe(Recipe);
}		}

~VPBasicBlock() override { Recipes.clear(); }		~VPBasicBlock() override { Recipes.clear(); }
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	public:

/// Replace all operands of VPUsers in the block with \p NewValue and also		/// Replace all operands of VPUsers in the block with \p NewValue and also
/// replaces all uses of VPValues defined in the block with NewValue.		/// replaces all uses of VPValues defined in the block with NewValue.
void dropAllReferences(VPValue *NewValue);		void dropAllReferences(VPValue *NewValue);

/// Return the position of the first non-phi node recipe in the block.		/// Return the position of the first non-phi node recipe in the block.
iterator getFirstNonPhi();		iterator getFirstNonPhi();

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;

		unsigned getReciprocalPredBlockProb() const {
		return ReciprocalPredBlockProb;
		}
		void setReciprocalPredBlockProb(unsigned V) { ReciprocalPredBlockProb = V; }

private:		private:
/// Create an IR BasicBlock to hold the output instructions generated by this		/// Create an IR BasicBlock to hold the output instructions generated by this
/// VPBasicBlock, and return it. Update the CFGState accordingly.		/// VPBasicBlock, and return it. Update the CFGState accordingly.
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);		BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
};		};

/// VPRegionBlock represents a collection of VPBasicBlocks and VPRegionBlocks		/// VPRegionBlock represents a collection of VPBasicBlocks and VPRegionBlocks
/// which form a Single-Entry-Single-Exit subgraph of the output IR CFG.		/// which form a Single-Entry-Single-Exit subgraph of the output IR CFG.
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	public:

/// An indicator whether this region is to generate multiple replicated		/// An indicator whether this region is to generate multiple replicated
/// instances of output IR corresponding to its VPBlockBases.		/// instances of output IR corresponding to its VPBlockBases.
bool isReplicator() const { return IsReplicator; }		bool isReplicator() const { return IsReplicator; }

/// The method which generates the output IR instructions that correspond to		/// The method which generates the output IR instructions that correspond to
/// this VPRegionBlock, thereby "executing" the VPlan.		/// this VPRegionBlock, thereby "executing" the VPlan.
void execute(struct VPTransformState *State) override;		void execute(struct VPTransformState *State) override;

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx) override;
};		};

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GraphTraits specializations for VPlan Hierarchical Control-Flow Graphs //		// GraphTraits specializations for VPlan Hierarchical Control-Flow Graphs //
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// The following set of template specializations implement GraphTraits to treat		// The following set of template specializations implement GraphTraits to treat
// any VPBlockBase as a node in a graph of VPBlockBases. It's important to note		// any VPBlockBase as a node in a graph of VPBlockBases. It's important to note
▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	for (VPValue *Def : VPExternalDefs)
delete Def;		delete Def;
for (VPValue *CBV : VPCBVs)		for (VPValue *CBV : VPCBVs)
delete CBV;		delete CBV;
}		}

/// Generate the IR code for this VPlan.		/// Generate the IR code for this VPlan.
void execute(struct VPTransformState *State);		void execute(struct VPTransformState *State);

		VectorizationCostTy cost(ElementCount VF, VPCostContext &Ctx);

VPBlockBase *getEntry() { return Entry; }		VPBlockBase *getEntry() { return Entry; }
const VPBlockBase *getEntry() const { return Entry; }		const VPBlockBase *getEntry() const { return Entry; }

VPBlockBase *setEntry(VPBlockBase *Block) {		VPBlockBase *setEntry(VPBlockBase *Block) {
Entry = Block;		Entry = Block;
Block->setPlan(this);		Block->setPlan(this);
return Entry;		return Entry;
}		}

/// The backedge taken count of the original loop.		/// The backedge taken count of the original loop.
VPValue *getOrCreateBackedgeTakenCount() {		VPValue *getOrCreateBackedgeTakenCount() {
if (!BackedgeTakenCount)		if (!BackedgeTakenCount)
BackedgeTakenCount = new VPValue();		BackedgeTakenCount = new VPValue();
return BackedgeTakenCount;		return BackedgeTakenCount;
}		}

		const SmallSetVector<ElementCount, 2> &getVFs() const { return VFs; }

void addVF(ElementCount VF) { VFs.insert(VF); }		void addVF(ElementCount VF) { VFs.insert(VF); }

bool hasVF(ElementCount VF) { return VFs.count(VF); }		bool hasVF(ElementCount VF) { return VFs.count(VF); }

const std::string &getName() const { return Name; }		const std::string &getName() const { return Name; }

void setName(const Twine &newName) { Name = newName.str(); }		void setName(const Twine &newName) { Name = newName.str(); }

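
The recursive costing scheme described in the summary (a plan sums its blocks, a block sums its recipes, and a predicated block scales its cost by `ReciprocalPredBlockProb`) can be sketched in isolation. This is a minimal illustration and not code from the patch: `ToyRecipe`, `ToyBlock`, `ToyPlan` and `makeExamplePlan` are hypothetical stand-ins, costs are plain `unsigned` values rather than `VectorizationCostTy`, and it assumes the scaling is an integer division applied only at VF 1, which is how the legacy model treats scalar predicated blocks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a recipe; not the real VPRecipeBase.
struct ToyRecipe {
  unsigned ScalarCost; // assumed cost at VF == 1
  unsigned VectorCost; // assumed cost at any VF > 1
  unsigned cost(unsigned VF) const { return VF == 1 ? ScalarCost : VectorCost; }
};

// Hypothetical stand-in for VPBasicBlock.
struct ToyBlock {
  std::vector<ToyRecipe> Recipes;
  unsigned ReciprocalPredBlockProb = 1; // 1 means the block always executes

  unsigned cost(unsigned VF) const {
    unsigned Sum = 0;
    for (const ToyRecipe &R : Recipes)
      Sum += R.cost(VF);
    // Assumption: only the scalar cost of a predicated block is scaled down.
    return VF == 1 ? Sum / ReciprocalPredBlockProb : Sum;
  }
};

// Hypothetical stand-in for VPlan: the plan cost is the sum of its blocks.
struct ToyPlan {
  std::vector<ToyBlock> Blocks;

  unsigned cost(unsigned VF) const {
    unsigned Sum = 0;
    for (const ToyBlock &B : Blocks)
      Sum += B.cost(VF);
    return Sum;
  }
};

// Builds one unconditional block and one block assumed to run half the time.
ToyPlan makeExamplePlan() {
  ToyBlock Always;
  Always.Recipes = {{1, 1}, {1, 2}};

  ToyBlock Predicated;
  Predicated.Recipes = {{4, 10}};
  Predicated.ReciprocalPredBlockProb = 2;

  ToyPlan Plan;
  Plan.Blocks = {Always, Predicated};
  return Plan;
}
```

With these made-up numbers, `makeExamplePlan().cost(1)` is 2 + 4/2 = 4 and `makeExamplePlan().cost(4)` is 3 + 10 = 13. The integer division shows why the summary calls this approach inaccurate: the result depends on rounding and on a fixed probability guess rather than on profile data.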

llvm/test/Analysis/CostModel/X86/interleave-load-i32.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -loop-vectorize -S -mattr=avx512f --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s		; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

@A = global [10240 x i32] zeroinitializer, align 16		@A = global [10240 x i32] zeroinitializer, align 16
@B = global [10240 x i32] zeroinitializer, align 16		@B = global [10240 x i32] zeroinitializer, align 16

; Function Attrs: nounwind uwtable		; Function Attrs: nounwind uwtable
define void @load_i32_interleave4() {		define void @load_i32_interleave4() {
;CHECK-LABEL: load_i32_interleave4		;CHECK-LABEL: load_i32_interleave4
;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %0 = load
;CHECK: Found an estimated cost of 5 for VF 2 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 5 for VF 2 For instruction: %0 = load
;CHECK: Found an estimated cost of 5 for VF 4 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 5 for VF 4 For instruction: %0 = load
;CHECK: Found an estimated cost of 8 for VF 8 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 8 for VF 8 For instruction: %0 = load
;CHECK: Found an estimated cost of 22 for VF 16 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 22 for VF 16 For instruction: %0 = load
		;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %0 = load
		;CHECK-VP: Found an estimated cost of 5 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 5 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 22 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
ret void		ret void

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
Show All 15 Lines	for.body: ; preds = %entry, %for.body
store i32 %add11, i32* %arrayidx13, align 16		store i32 %add11, i32* %arrayidx13, align 16
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4
%cmp = icmp slt i64 %indvars.iv.next, 1024		%cmp = icmp slt i64 %indvars.iv.next, 1024
br i1 %cmp, label %for.body, label %for.cond.cleanup		br i1 %cmp, label %for.body, label %for.cond.cleanup
}		}

define void @load_i32_interleave5() {		define void @load_i32_interleave5() {
;CHECK-LABEL: load_i32_interleave5		;CHECK-LABEL: load_i32_interleave5
;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %0 = load
;CHECK: Found an estimated cost of 6 for VF 2 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 6 for VF 2 For instruction: %0 = load
;CHECK: Found an estimated cost of 9 for VF 4 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 9 for VF 4 For instruction: %0 = load
;CHECK: Found an estimated cost of 18 for VF 8 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 18 for VF 8 For instruction: %0 = load
;CHECK: Found an estimated cost of 35 for VF 16 For instruction: %0 = load		;CHECK-CM: Found an estimated cost of 35 for VF 16 For instruction: %0 = load
		;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %0 = load
		;CHECK-VP: Found an estimated cost of 6 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 9 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 18 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 35 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 5
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
ret void		ret void

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

llvm/test/Analysis/CostModel/X86/interleave-store-i32.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -loop-vectorize -S -mattr=avx512f --debug-only=loop-vectorize < %s 2>&1| FileCheck %s		; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-VP

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

@A = global [10240 x i32] zeroinitializer, align 16		@A = global [10240 x i32] zeroinitializer, align 16
@B = global [10240 x i32] zeroinitializer, align 16		@B = global [10240 x i32] zeroinitializer, align 16

; Function Attrs: nounwind uwtable		; Function Attrs: nounwind uwtable
define void @store_i32_interleave4() {		define void @store_i32_interleave4() {
;CHECK-LABEL: store_i32_interleave4		;CHECK-LABEL: store_i32_interleave4
;CHECK: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add16		;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add16
;CHECK: Found an estimated cost of 5 for VF 2 For instruction: store i32 %add16		;CHECK-CM: Found an estimated cost of 5 for VF 2 For instruction: store i32 %add16
;CHECK: Found an estimated cost of 5 for VF 4 For instruction: store i32 %add16		;CHECK-CM: Found an estimated cost of 5 for VF 4 For instruction: store i32 %add16
;CHECK: Found an estimated cost of 11 for VF 8 For instruction: store i32 %add16		;CHECK-CM: Found an estimated cost of 11 for VF 8 For instruction: store i32 %add16
;CHECK: Found an estimated cost of 22 for VF 16 For instruction: store i32 %add16		;CHECK-CM: Found an estimated cost of 22 for VF 16 For instruction: store i32 %add16
		;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE store %add16
		;CHECK-VP: Found an estimated cost of 5 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 5 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 11 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: Found an estimated cost of 22 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
ret void		ret void

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
Show All 15 Lines	for.body: ; preds = %entry, %for.body
store i32 %add16, i32* %arrayidx19, align 4		store i32 %add16, i32* %arrayidx19, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 4
%cmp = icmp slt i64 %indvars.iv.next, 1024		%cmp = icmp slt i64 %indvars.iv.next, 1024
br i1 %cmp, label %for.body, label %for.cond.cleanup		br i1 %cmp, label %for.body, label %for.cond.cleanup
}		}

define void @store_i32_interleave5() {		define void @store_i32_interleave5() {
;CHECK-LABEL: store_i32_interleave5		;CHECK-LABEL: store_i32_interleave5
;CHECK: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add22		;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add22
;CHECK: Found an estimated cost of 7 for VF 2 For instruction: store i32 %add22		;CHECK-CM: Found an estimated cost of 7 for VF 2 For instruction: store i32 %add22
;CHECK: Found an estimated cost of 14 for VF 4 For instruction: store i32 %add22		;CHECK-CM: Found an estimated cost of 14 for VF 4 For instruction: store i32 %add22
;CHECK: Found an estimated cost of 21 for VF 8 For instruction: store i32 %add22		;CHECK-CM: Found an estimated cost of 21 for VF 8 For instruction: store i32 %add22
;CHECK: Found an estimated cost of 35 for VF 16 For instruction: store i32 %add22		;CHECK-CM: Found an estimated cost of 35 for VF 16 For instruction: store i32 %add22
		;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE store %add22
		;CHECK-VP: Found an estimated cost of 7 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 14 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 21 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 5
		;CHECK-VP: Found an estimated cost of 35 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 5
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
ret void		ret void

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

llvm/test/Analysis/CostModel/X86/interleaved-load-float.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake %s 2>&1 | FileCheck %s		; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake -cost-using-vplan=false %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake -cost-using-vplan=true %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"		target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"		target triple = "i386-unknown-linux-gnu"

@src = common local_unnamed_addr global [120 x float] zeroinitializer, align 4		@src = common local_unnamed_addr global [120 x float] zeroinitializer, align 4
@dst = common local_unnamed_addr global [120 x float] zeroinitializer, align 4		@dst = common local_unnamed_addr global [120 x float] zeroinitializer, align 4

; Function Attrs: norecurse nounwind		; Function Attrs: norecurse nounwind
define void @stride8(float %k, i32 %width_) {		define void @stride8(float %k, i32 %width_) {
entry:		entry:

; CHECK: Found an estimated cost of 48 for VF 8 For instruction: %0 = load float		; CHECK-CM: Found an estimated cost of 48 for VF 8 For instruction: %0 = load float
		; CHECK-VP: Found an estimated cost of 48 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 8

%cmp72 = icmp sgt i32 %width_, 0		%cmp72 = icmp sgt i32 %width_, 0
br i1 %cmp72, label %for.body.lr.ph, label %for.cond.cleanup		br i1 %cmp72, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph: ; preds = %entry		for.body.lr.ph: ; preds = %entry
br label %for.body		br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body		for.cond.cleanup.loopexit: ; preds = %for.body
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	for.body: ; preds = %for.body.lr.ph, %for.body
%cmp = icmp slt i32 %add46, %width_		%cmp = icmp slt i32 %add46, %width_
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit		br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}		}

; Function Attrs: norecurse nounwind		; Function Attrs: norecurse nounwind
define void @stride3(float %k, i32 %width_) {		define void @stride3(float %k, i32 %width_) {
entry:		entry:

; CHECK: Found an estimated cost of 20 for VF 8 For instruction: %0 = load float		; CHECK-CM: Found an estimated cost of 20 for VF 8 For instruction: %0 = load float
		; CHECK-VP: Found an estimated cost of 20 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3

%cmp27 = icmp sgt i32 %width_, 0		%cmp27 = icmp sgt i32 %width_, 0
br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup		br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph: ; preds = %entry		for.body.lr.ph: ; preds = %entry
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body, %entry		for.cond.cleanup: ; preds = %for.body, %entry

llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -S -mcpu=core-avx2 --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s			; RUN: opt -loop-vectorize -S -mcpu=core-avx2 -cost-using-vplan=false --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -loop-vectorize -S -mcpu=core-avx2 -cost-using-vplan=true --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind readonly uwtable			; Function Attrs: norecurse nounwind readonly uwtable
	define i32 @doit_stride3(i8* nocapture readonly %Ptr, i32 %Nels) {			define i32 @doit_stride3(i8* nocapture readonly %Ptr, i32 %Nels) {
	;CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 11 for VF 2 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 11 for VF 2 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 5 for VF 4 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 5 for VF 4 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 10 for VF 8 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 10 for VF 8 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 13 for VF 16 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 13 for VF 16 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 16 for VF 32 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 16 for VF 32 For instruction: %0 = load i8
				;CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %0 = load
				;CHECK-VP: LV: Found an estimated cost of 11 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3
				;CHECK-VP: LV: Found an estimated cost of 5 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3
				;CHECK-VP: LV: Found an estimated cost of 10 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3
				;CHECK-VP: LV: Found an estimated cost of 13 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3
				;CHECK-VP: LV: Found an estimated cost of 16 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 3
	entry:			entry:
	%cmp13 = icmp sgt i32 %Nels, 0			%cmp13 = icmp sgt i32 %Nels, 0
	br i1 %cmp13, label %for.body.preheader, label %for.end			br i1 %cmp13, label %for.body.preheader, label %for.end

	for.body.preheader:			for.body.preheader:
	br label %for.body			br label %for.body

	for.body:			for.body:
	Show All 22 Lines

	for.end:			for.end:
	%s.0.lcssa = phi i32 [ 0, %entry ], [ %add6.lcssa, %for.end.loopexit ]			%s.0.lcssa = phi i32 [ 0, %entry ], [ %add6.lcssa, %for.end.loopexit ]
	ret i32 %s.0.lcssa			ret i32 %s.0.lcssa
	}			}

	; Function Attrs: norecurse nounwind readonly uwtable			; Function Attrs: norecurse nounwind readonly uwtable
	define i32 @doit_stride4(i8* nocapture readonly %Ptr, i32 %Nels) local_unnamed_addr {			define i32 @doit_stride4(i8* nocapture readonly %Ptr, i32 %Nels) local_unnamed_addr {
	;CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 13 for VF 2 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 13 for VF 2 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 5 for VF 4 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 5 for VF 4 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 21 for VF 8 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 21 for VF 8 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 41 for VF 16 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 41 for VF 16 For instruction: %0 = load i8
	;CHECK: LV: Found an estimated cost of 84 for VF 32 For instruction: %0 = load i8			;CHECK-CM: LV: Found an estimated cost of 84 for VF 32 For instruction: %0 = load i8
				;CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %0 = load
				;CHECK-VP: LV: Found an estimated cost of 13 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4
				;CHECK-VP: LV: Found an estimated cost of 5 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4
				;CHECK-VP: LV: Found an estimated cost of 21 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4
				;CHECK-VP: LV: Found an estimated cost of 41 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4
				;CHECK-VP: LV: Found an estimated cost of 84 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 4
	entry:			entry:
	%cmp59 = icmp sgt i32 %Nels, 0			%cmp59 = icmp sgt i32 %Nels, 0
	br i1 %cmp59, label %for.body.preheader, label %for.end			br i1 %cmp59, label %for.body.preheader, label %for.end

	for.body.preheader:			for.body.preheader:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body.preheader, %for.body			for.body: ; preds = %for.body.preheader, %for.body

llvm/test/Analysis/CostModel/X86/interleaved-load-store-double.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake %s 2>&1 | FileCheck %s			; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake -cost-using-vplan=false %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=skylake -cost-using-vplan=true %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
	target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"			target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
	target triple = "i386-unknown-linux-gnu"			target triple = "i386-unknown-linux-gnu"

	@doublesrc = common local_unnamed_addr global [120 x double] zeroinitializer, align 4			@doublesrc = common local_unnamed_addr global [120 x double] zeroinitializer, align 4
	@doubledst = common local_unnamed_addr global [120 x double] zeroinitializer, align 4			@doubledst = common local_unnamed_addr global [120 x double] zeroinitializer, align 4

	; Function Attrs: norecurse nounwind			; Function Attrs: norecurse nounwind
	define void @stride2double(double %k, i32 %width_) {			define void @stride2double(double %k, i32 %width_) {
	entry:			entry:

	; CHECK: Found an estimated cost of 8 for VF 4 For instruction: %0 = load double			; CHECK-CM: Found an estimated cost of 8 for VF 4 For instruction: %0 = load double
	; CHECK: Found an estimated cost of 8 for VF 4 For instruction: store double			; CHECK-CM: Found an estimated cost of 8 for VF 4 For instruction: store double
				; CHECK-VP: Found an estimated cost of 8 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %0
				; CHECK-VP: Found an estimated cost of 8 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2

	%cmp27 = icmp sgt i32 %width_, 0			%cmp27 = icmp sgt i32 %width_, 0
	br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup			br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body, %entry			for.cond.cleanup: ; preds = %for.body, %entry

llvm/test/Analysis/CostModel/X86/interleaved-load-store-i64.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=core-avx2 %s 2>&1 | FileCheck %s			; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=core-avx2 -cost-using-vplan=false %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -S -loop-vectorize -debug-only=loop-vectorize -mcpu=core-avx2 -cost-using-vplan=true %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
	target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"			target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
	target triple = "i386-unknown-linux-gnu"			target triple = "i386-unknown-linux-gnu"

	@i64src = common local_unnamed_addr global [120 x i64] zeroinitializer, align 4			@i64src = common local_unnamed_addr global [120 x i64] zeroinitializer, align 4
	@i64dst = common local_unnamed_addr global [120 x i64] zeroinitializer, align 4			@i64dst = common local_unnamed_addr global [120 x i64] zeroinitializer, align 4

	; Function Attrs: norecurse nounwind			; Function Attrs: norecurse nounwind
	define void @stride2i64(i64 %k, i32 %width_) {			define void @stride2i64(i64 %k, i32 %width_) {
	entry:			entry:

	; CHECK: Found an estimated cost of 8 for VF 4 For instruction: %0 = load i64			; CHECK-CM: Found an estimated cost of 8 for VF 4 For instruction: %0 = load i64
	; CHECK: Found an estimated cost of 8 for VF 4 For instruction: store i64			; CHECK-CM: Found an estimated cost of 8 for VF 4 For instruction: store i64
				; CHECK-VP: Found an estimated cost of 8 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %0
				; CHECK-VP: Found an estimated cost of 8 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2

	%cmp27 = icmp sgt i32 %width_, 0			%cmp27 = icmp sgt i32 %width_, 0
	br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup			br i1 %cmp27, label %for.body.lr.ph, label %for.cond.cleanup

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body, %entry			for.cond.cleanup: ; preds = %for.body, %entry

llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -loop-vectorize -S -mcpu=core-avx2 --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s		; RUN: opt -loop-vectorize -S -mcpu=core-avx2 -cost-using-vplan=false --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -loop-vectorize -S -mcpu=core-avx2 -cost-using-vplan=true --debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @doit_stride3(i8* nocapture %Ptr, i32 %Nels) local_unnamed_addr {		define void @doit_stride3(i8* nocapture %Ptr, i32 %Nels) local_unnamed_addr {
;CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %conv4
;CHECK: LV: Found an estimated cost of 8 for VF 2 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 8 for VF 2 For instruction: store i8 %conv4
;CHECK: LV: Found an estimated cost of 9 for VF 4 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 9 for VF 4 For instruction: store i8 %conv4
;CHECK: LV: Found an estimated cost of 12 for VF 8 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 12 for VF 8 For instruction: store i8 %conv4
;CHECK: LV: Found an estimated cost of 13 for VF 16 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 13 for VF 16 For instruction: store i8 %conv4
;CHECK: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %conv4		;CHECK-CM: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %conv4
		;CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE store %conv
		;CHECK-VP: LV: Found an estimated cost of 8 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3
		;CHECK-VP: LV: Found an estimated cost of 9 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3
		;CHECK-VP: LV: Found an estimated cost of 12 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3
		;CHECK-VP: LV: Found an estimated cost of 13 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3
		;CHECK-VP: LV: Found an estimated cost of 16 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 3
entry:		entry:
%cmp14 = icmp sgt i32 %Nels, 0		%cmp14 = icmp sgt i32 %Nels, 0
br i1 %cmp14, label %for.body.lr.ph, label %for.end		br i1 %cmp14, label %for.body.lr.ph, label %for.end

for.body.lr.ph:		for.body.lr.ph:
%conv = trunc i32 %Nels to i8		%conv = trunc i32 %Nels to i8
%conv1 = shl i8 %conv, 1		%conv1 = shl i8 %conv, 1
%conv4 = shl i8 %conv, 2		%conv4 = shl i8 %conv, 2
for.end.loopexit:		for.end.loopexit:
br label %for.end		br label %for.end

for.end:		for.end:
ret void		ret void
}		}

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @doit_stride4(i8* nocapture %Ptr, i32 %Nels) local_unnamed_addr {		define void @doit_stride4(i8* nocapture %Ptr, i32 %Nels) local_unnamed_addr {
;CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %conv7
;CHECK: LV: Found an estimated cost of 13 for VF 2 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 13 for VF 2 For instruction: store i8 %conv7
;CHECK: LV: Found an estimated cost of 10 for VF 4 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 10 for VF 4 For instruction: store i8 %conv7
;CHECK: LV: Found an estimated cost of 11 for VF 8 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 11 for VF 8 For instruction: store i8 %conv7
;CHECK: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %conv7
;CHECK: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %conv7		;CHECK-CM: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %conv7
		;CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE store %conv
		;CHECK-VP: LV: Found an estimated cost of 13 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: LV: Found an estimated cost of 10 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: LV: Found an estimated cost of 11 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: LV: Found an estimated cost of 12 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4
		;CHECK-VP: LV: Found an estimated cost of 16 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 4
entry:		entry:
%cmp19 = icmp sgt i32 %Nels, 0		%cmp19 = icmp sgt i32 %Nels, 0
br i1 %cmp19, label %for.body.lr.ph, label %for.end		br i1 %cmp19, label %for.body.lr.ph, label %for.end

for.body.lr.ph:		for.body.lr.ph:
%conv = trunc i32 %Nels to i8		%conv = trunc i32 %Nels to i8
%conv1 = shl i8 %conv, 1		%conv1 = shl i8 %conv, 1
%conv4 = shl i8 %conv, 2		%conv4 = shl i8 %conv, 2

llvm/test/Analysis/CostModel/X86/strided-load-i16.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -S -mattr=avx512bw --debug-only=loop-vectorize < %s 2>&1| FileCheck %s			; RUN: opt -loop-vectorize -S -mattr=avx512bw -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -loop-vectorize -S -mattr=avx512bw -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [10240 x i16] zeroinitializer, align 16			@A = global [10240 x i16] zeroinitializer, align 16
	@B = global [10240 x i16] zeroinitializer, align 16			@B = global [10240 x i16] zeroinitializer, align 16

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @load_i16_stride2() {			define void @load_i16_stride2() {
	;CHECK-LABEL: load_i16_stride2			;CHECK-LABEL: load_i16_stride2
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 32 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 1			%0 = shl nsw i64 %indvars.iv, 1
	%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
	%1 = load i16, i16* %arrayidx, align 4			%1 = load i16, i16* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
	store i16 %1, i16* %arrayidx2, align 2			store i16 %1, i16* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i16_stride3() {			define void @load_i16_stride3() {
	;CHECK-LABEL: load_i16_stride3			;CHECK-LABEL: load_i16_stride3
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 5 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 5 for VF 32 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 5 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 3			%0 = mul nsw i64 %indvars.iv, 3
	%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
	%1 = load i16, i16* %arrayidx, align 4			%1 = load i16, i16* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
	store i16 %1, i16* %arrayidx2, align 2			store i16 %1, i16* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i16_stride4() {			define void @load_i16_stride4() {
	;CHECK-LABEL: load_i16_stride4			;CHECK-LABEL: load_i16_stride4
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 8 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 8 for VF 32 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 8 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 2			%0 = shl nsw i64 %indvars.iv, 2
	%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
	%1 = load i16, i16* %arrayidx, align 4			%1 = load i16, i16* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i16], [10240 x i16]* @B, i64 0, i64 %indvars.iv
	store i16 %1, i16* %arrayidx2, align 2			store i16 %1, i16* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i16_stride5() {			define void @load_i16_stride5() {
	;CHECK-LABEL: load_i16_stride5			;CHECK-LABEL: load_i16_stride5
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 5 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 5 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 10 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 10 for VF 32 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 5 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 10 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 5			%0 = mul nsw i64 %indvars.iv, 5
	%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i16], [10240 x i16]* @A, i64 0, i64 %0
	%1 = load i16, i16* %arrayidx, align 4			%1 = load i16, i16* %arrayidx, align 4

llvm/test/Analysis/CostModel/X86/strided-load-i32.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -S -mattr=avx512f --debug-only=loop-vectorize < %s 2>&1| FileCheck %s			; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [10240 x i32] zeroinitializer, align 16			@A = global [10240 x i32] zeroinitializer, align 16
	@B = global [10240 x i32] zeroinitializer, align 16			@B = global [10240 x i32] zeroinitializer, align 16

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @load_int_stride2() {			define void @load_int_stride2() {
	;CHECK-LABEL: load_int_stride2			;CHECK-LABEL: load_int_stride2
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 16 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 1			%0 = shl nsw i64 %indvars.iv, 1
	%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
	store i32 %1, i32* %arrayidx2, align 2			store i32 %1, i32* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_int_stride3() {			define void @load_int_stride3() {
	;CHECK-LABEL: load_int_stride3			;CHECK-LABEL: load_int_stride3
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 16 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 3			%0 = mul nsw i64 %indvars.iv, 3
	%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
	store i32 %1, i32* %arrayidx2, align 2			store i32 %1, i32* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_int_stride4() {			define void @load_int_stride4() {
	;CHECK-LABEL: load_int_stride4			;CHECK-LABEL: load_int_stride4
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 5 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 5 for VF 16 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 5 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 2			%0 = shl nsw i64 %indvars.iv, 2
	%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i32], [10240 x i32]* @B, i64 0, i64 %indvars.iv
	store i32 %1, i32* %arrayidx2, align 2			store i32 %1, i32* %arrayidx2, align 2
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_int_stride5() {			define void @load_int_stride5() {
	;CHECK-LABEL: load_int_stride5			;CHECK-LABEL: load_int_stride5
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 6 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 6 for VF 16 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 6 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 5			%0 = mul nsw i64 %indvars.iv, 5
	%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i32], [10240 x i32]* @A, i64 0, i64 %0
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4

llvm/test/Analysis/CostModel/X86/strided-load-i64.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -S -mattr=avx512f --debug-only=loop-vectorize < %s 2>&1| FileCheck %s			; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -loop-vectorize -S -mattr=avx512f -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [10240 x i64] zeroinitializer, align 16			@A = global [10240 x i64] zeroinitializer, align 16
	@B = global [10240 x i64] zeroinitializer, align 16			@B = global [10240 x i64] zeroinitializer, align 16

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @load_i64_stride2() {			define void @load_i64_stride2() {
	;CHECK-LABEL: load_i64_stride2			;CHECK-LABEL: load_i64_stride2
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 8 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 1			%0 = shl nsw i64 %indvars.iv, 1
	%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
	%1 = load i64, i64* %arrayidx, align 16			%1 = load i64, i64* %arrayidx, align 16
	%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv
	store i64 %1, i64* %arrayidx2, align 8			store i64 %1, i64* %arrayidx2, align 8
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i64_stride3() {			define void @load_i64_stride3() {
	;CHECK-LABEL: load_i64_stride3			;CHECK-LABEL: load_i64_stride3
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 3 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 3 for VF 8 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 3 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 3			%0 = mul nsw i64 %indvars.iv, 3
	%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
	%1 = load i64, i64* %arrayidx, align 16			%1 = load i64, i64* %arrayidx, align 16
	%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i64], [10240 x i64]* @B, i64 0, i64 %indvars.iv
	store i64 %1, i64* %arrayidx2, align 8			store i64 %1, i64* %arrayidx2, align 8
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i64_stride4() {			define void @load_i64_stride4() {
	;CHECK-LABEL: load_i64_stride4			;CHECK-LABEL: load_i64_stride4
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 2 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 2 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 5 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 5 for VF 8 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 5 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 4			%0 = mul nsw i64 %indvars.iv, 4
	%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i64], [10240 x i64]* @A, i64 0, i64 %0
	%1 = load i64, i64* %arrayidx, align 16			%1 = load i64, i64* %arrayidx, align 16

llvm/test/Analysis/CostModel/X86/strided-load-i8.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -S -mattr=avx512bw --debug-only=loop-vectorize < %s 2>&1| FileCheck %s			; RUN: opt -loop-vectorize -S -mattr=avx512bw -cost-using-vplan=false --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -loop-vectorize -S -mattr=avx512bw -cost-using-vplan=true --debug-only=loop-vectorize < %s 2>&1| FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [10240 x i8] zeroinitializer, align 16			@A = global [10240 x i8] zeroinitializer, align 16
	@B = global [10240 x i8] zeroinitializer, align 16			@B = global [10240 x i8] zeroinitializer, align 16

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @load_i8_stride2() {			define void @load_i8_stride2() {
	;CHECK-LABEL: load_i8_stride2			;CHECK-LABEL: load_i8_stride2
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 4 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 4 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 8 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 8 for VF 32 For instruction: %1 = load
	;CHECK: Found an estimated cost of 20 for VF 64 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 20 for VF 64 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 8 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
				;CHECK-VP: Found an estimated cost of 20 for VF 64 For recipe: "INTERLEAVE-GROUP with factor 2 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 1			%0 = shl nsw i64 %indvars.iv, 1
	%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
	%1 = load i8, i8* %arrayidx, align 2			%1 = load i8, i8* %arrayidx, align 2
	%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
	store i8 %1, i8* %arrayidx2, align 1			store i8 %1, i8* %arrayidx2, align 1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i8_stride3() {			define void @load_i8_stride3() {
	;CHECK-LABEL: load_i8_stride3			;CHECK-LABEL: load_i8_stride3
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 4 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 4 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 13 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 13 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 16 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 16 for VF 32 For instruction: %1 = load
	;CHECK: Found an estimated cost of 25 for VF 64 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 25 for VF 64 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 13 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 16 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
				;CHECK-VP: Found an estimated cost of 25 for VF 64 For recipe: "INTERLEAVE-GROUP with factor 3 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 3			%0 = mul nsw i64 %indvars.iv, 3
	%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
	%1 = load i8, i8* %arrayidx, align 2			%1 = load i8, i8* %arrayidx, align 2
	%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
	store i8 %1, i8* %arrayidx2, align 1			store i8 %1, i8* %arrayidx2, align 1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i8_stride4() {			define void @load_i8_stride4() {
	;CHECK-LABEL: load_i8_stride4			;CHECK-LABEL: load_i8_stride4
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 4 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 4 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 8 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 8 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 20 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 20 for VF 32 For instruction: %1 = load
	;CHECK: Found an estimated cost of 59 for VF 64 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 59 for VF 64 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 1 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 20 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
				;CHECK-VP: Found an estimated cost of 59 for VF 64 For recipe: "INTERLEAVE-GROUP with factor 4 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = shl nsw i64 %indvars.iv, 2			%0 = shl nsw i64 %indvars.iv, 2
	%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
	%1 = load i8, i8* %arrayidx, align 2			%1 = load i8, i8* %arrayidx, align 2
	%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds [10240 x i8], [10240 x i8]* @B, i64 0, i64 %indvars.iv
	store i8 %1, i8* %arrayidx2, align 1			store i8 %1, i8* %arrayidx2, align 1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1024			%exitcond = icmp eq i64 %indvars.iv.next, 1024
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	define void @load_i8_stride5() {			define void @load_i8_stride5() {
	;CHECK-LABEL: load_i8_stride5			;CHECK-LABEL: load_i8_stride5
	;CHECK: Found an estimated cost of 1 for VF 1 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 1 For instruction: %1 = load
	;CHECK: Found an estimated cost of 1 for VF 2 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %1 = load
	;CHECK: Found an estimated cost of 4 for VF 4 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 4 for VF 4 For instruction: %1 = load
	;CHECK: Found an estimated cost of 8 for VF 8 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 8 for VF 8 For instruction: %1 = load
	;CHECK: Found an estimated cost of 20 for VF 16 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 20 for VF 16 For instruction: %1 = load
	;CHECK: Found an estimated cost of 39 for VF 32 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 39 for VF 32 For instruction: %1 = load
	;CHECK: Found an estimated cost of 78 for VF 64 For instruction: %1 = load			;CHECK-CM: Found an estimated cost of 78 for VF 64 For instruction: %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %1 = load
				;CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 20 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 39 for VF 32 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
				;CHECK-VP: Found an estimated cost of 78 for VF 64 For recipe: "INTERLEAVE-GROUP with factor 5 at %1
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%0 = mul nsw i64 %indvars.iv, 5			%0 = mul nsw i64 %indvars.iv, 5
	%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0			%arrayidx = getelementptr inbounds [10240 x i8], [10240 x i8]* @A, i64 0, i64 %0
	%1 = load i8, i8* %arrayidx, align 2			%1 = load i8, i8* %arrayidx, align 2

llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -disable-output -debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefix=COST			; RUN: opt < %s -loop-vectorize -disable-output -debug-only=loop-vectorize -cost-using-vplan=false 2>&1 | FileCheck %s --check-prefix=COST
				; RUN: opt < %s -loop-vectorize -disable-output -debug-only=loop-vectorize -cost-using-vplan=true 2>&1 | FileCheck %s --check-prefix=COST-VPLAN
	; RUN: opt < %s -loop-vectorize -force-vector-width=2 -instcombine -simplifycfg -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=2 -instcombine -simplifycfg -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; This test checks that we correctly compute the scalarized operands for a			; This test checks that we correctly compute the scalarized operands for a
	; user-specified vectorization factor when interleaving is disabled. We use the			; user-specified vectorization factor when interleaving is disabled. We use the
	; "optsize" attribute to disable all interleaving calculations. A cost of 4			; "optsize" attribute to disable all interleaving calculations. A cost of 4
	; for %tmp4 indicates that we would scalarize its operand (%tmp3), giving			; for %tmp4 indicates that we would scalarize its operand (%tmp3), giving
	; %tmp4 a lower scalarization overhead.			; %tmp4 a lower scalarization overhead.
	;			;
	; COST-LABEL: predicated_udiv_scalarized_operand			; COST-LABEL: predicated_udiv_scalarized_operand
	; COST: LV: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i64 %tmp2, %tmp3			; COST: LV: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i64 %tmp2, %tmp3
				; COST-VPLAN-LABEL: predicated_udiv_scalarized_operand
				; COST-VPLAN: LV: Found an estimated cost of 4 for VF 2 For recipe: "REPLICATE %tmp4 = udiv %tmp2, %tmp3 (S->V)
	;			;
	; CHECK-LABEL: @predicated_udiv_scalarized_operand(			; CHECK-LABEL: @predicated_udiv_scalarized_operand(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %entry ], [ [[INDEX_NEXT:%.*]], %[[PRED_UDIV_CONTINUE2:.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %entry ], [ [[INDEX_NEXT:%.*]], %[[PRED_UDIV_CONTINUE2:.*]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i64> [ zeroinitializer, %entry ], [ [[TMP17:%.*]], %[[PRED_UDIV_CONTINUE2]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i64> [ zeroinitializer, %entry ], [ [[TMP17:%.*]], %[[PRED_UDIV_CONTINUE2]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, i64* %a, i64 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, i64* %a, i64 [[INDEX]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[TMP0]] to <2 x i64>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[TMP0]] to <2 x i64>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 4

llvm/test/Transforms/LoopVectorize/AArch64/costmodel.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt < %s -loop-vectorize -cost-using-vplan=false -S --debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -loop-vectorize -cost-using-vplan=true -S --debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--linux-gnu"

				; This is a series of test cases that show potential differences between the
				; old cost model and the VPlan version. The scores are not necessarily precise;
				; they are only meant to show differences not tested elsewhere.

				; CHECK-LABEL: predicated_store
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.inc ]
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds i32, i32* %CF_marker_x, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 1 For instruction: %0 = load i32, i32* %arrayidx, align 4
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %cmp1 = icmp eq i32 %0, %fpt
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %cmp1, label %if.then, label %for.inc
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx3 = getelementptr inbounds double, double* %y_data, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 1 For instruction: store double 0.000000e+00, double* %arrayidx3, align 8
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br label %for.inc
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
				; CHECK-CM: LV: Scalar loop costs: 6.
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.inc ]
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds i32, i32* %CF_marker_x, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %0 = load i32, i32* %arrayidx, align 4
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %cmp1 = icmp eq i32 %0, %fpt
				; CHECK-CM: LV: Found an estimated cost of 3 for VF 2 For instruction: br i1 %cmp1, label %if.then, label %for.inc
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx3 = getelementptr inbounds double, double* %y_data, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: store double 0.000000e+00, double* %arrayidx3, align 8
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: br label %for.inc
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
				; CHECK-CM: LV: Vector loop of width 2 costs: 5.
				; CHECK-CM: LV: Selecting VF: 2.
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "CLONE %arrayidx = getelementptr %CF_marker_x, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For recipe: "CLONE %0 = load %arrayidx
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %cmp1 = icmp %0, %fpt
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "CLONE %arrayidx3 = getelementptr %y_data, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "BRANCH-ON-MASK ir<%cmp1>\l"
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For recipe: "CLONE store 0.000000e+00, %arrayidx3
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For loop backedge cost (br)
				; CHECK-VP: LV: Vector loop of width 1 costs: 6.
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "CLONE %arrayidx = getelementptr %CF_marker_x, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN load ir<%arrayidx>
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN\l"" %cmp1 = icmp %0, %fpt
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %arrayidx3 = getelementptr %y_data, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 3 for VF 2 For recipe: "BRANCH-ON-MASK ir<%cmp1>\l"
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %arrayidx3
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For loop backedge cost (br)
				; CHECK-VP: LV: Vector loop of width 2 costs: 4.
				; CHECK-VP: LV: Selecting VF: 2.
				define i32 @predicated_store(i32* nocapture readonly %CF_marker_x, double* nocapture %y_data, i32 %num_rows, i32 %fpt) {
				entry:
				%cmp8 = icmp sgt i32 %num_rows, 0
				br i1 %cmp8, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %num_rows to i64
				br label %for.body

				for.cond.cleanup.loopexit: ; preds = %for.inc
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
				ret i32 undef

				for.body: ; preds = %for.body.preheader, %for.inc
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.inc ]
				%arrayidx = getelementptr inbounds i32, i32* %CF_marker_x, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp1 = icmp eq i32 %0, %fpt
				br i1 %cmp1, label %if.then, label %for.inc

				if.then: ; preds = %for.body
				%arrayidx3 = getelementptr inbounds double, double* %y_data, i64 %indvars.iv
				store double 0.000000e+00, double* %arrayidx3, align 8
				br label %for.inc

				for.inc: ; preds = %for.body, %if.then
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
				}
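
Reading note (not part of the patch): the loop totals checked above for @predicated_store appear to follow from the per-instruction and per-recipe costs by simple arithmetic. The Python sketch below reproduces three of them. It rests on two assumptions inferred from the numbers and from the patch summary's mention of ReciprocalPredBlockProb: the scalar cost of the predicated block (if.then) is divided by 2, and the printed vector-loop cost is the summed cost divided by VF, truncated to an integer. The function and variable names are hypothetical and do not exist in LLVM.

```python
# Costs copied from the CHECK-CM / CHECK-VP lines for @predicated_store.
# For the scalar loop each entry is (cost, is_in_predicated_block).
scalar_costs = [
    (0, False), (0, False), (2, False), (1, False), (0, False),  # for.body
    (0, True), (2, True), (0, True),                             # if.then
    (1, False), (1, False), (0, False),                          # for.inc
]
legacy_vf2_costs = [0, 0, 1, 1, 3, 0, 2, 0, 2, 1, 0]  # CHECK-CM, VF 2
vplan_vf2_costs = [0, 0, 1, 1, 0, 3, 2, 2, 0]         # CHECK-VP, VF 2

# Assumption: a predicated block is taken to execute half of the time.
RECIPROCAL_PRED_BLOCK_PROB = 2


def scalar_loop_cost(costs, reciprocal_prob=RECIPROCAL_PRED_BLOCK_PROB):
    """Sum scalar costs, scaling down the predicated block's share."""
    unconditional = sum(c for c, predicated in costs if not predicated)
    predicated = sum(c for c, predicated in costs if predicated)
    return unconditional + predicated // reciprocal_prob


def per_lane_cost(costs, vf):
    """Sum vector costs and divide by VF (assumed truncating, as printed)."""
    return sum(costs) // vf


print(scalar_loop_cost(scalar_costs))      # compare with "Scalar loop costs"
print(per_lane_cost(legacy_vf2_costs, 2))  # compare with CHECK-CM width 2
print(per_lane_cost(vplan_vf2_costs, 2))   # compare with CHECK-VP width 2
```

Under these assumptions the one-point difference between the two models at VF 2 comes from the summed cost dropping from 10 to 9: the VPlan listing reports the induction update and compare as a single "loop induction check (add + icmp)" entry of cost 2, where the legacy listing shows 2 + 1.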

				; CHECK-LABEL: vif
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.inc.us ]
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx.us = getelementptr inbounds float, float* %b, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 1 For instruction: %1 = load float, float* %arrayidx.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %cmp5.us = fcmp ogt float %1, 0.000000e+00
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %cmp5.us, label %if.then.us, label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx9.us = getelementptr inbounds float, float* %a, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 1 For instruction: store float %1, float* %arrayidx9.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
				; CHECK-CM: LV: Scalar loop costs: 6.
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.inc.us ]
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx.us = getelementptr inbounds float, float* %b, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %1 = load float, float* %arrayidx.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %cmp5.us = fcmp ogt float %1, 0.000000e+00
				; CHECK-CM: LV: Found an estimated cost of 3 for VF 2 For instruction: br i1 %cmp5.us, label %if.then.us, label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx9.us = getelementptr inbounds float, float* %a, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 3 for VF 2 For instruction: store float %1, float* %arrayidx9.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: br label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
				; CHECK-CM: LV: Vector loop of width 2 costs: 5.
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.inc.us ]
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx.us = getelementptr inbounds float, float* %b, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 4 For instruction: %1 = load float, float* %arrayidx.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 4 For instruction: %cmp5.us = fcmp ogt float %1, 0.000000e+00
				; CHECK-CM: LV: Found an estimated cost of 9 for VF 4 For instruction: br i1 %cmp5.us, label %if.then.us, label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx9.us = getelementptr inbounds float, float* %a, i64 %indvars.iv
				; CHECK-CM: LV: Found an estimated cost of 8 for VF 4 For instruction: store float %1, float* %arrayidx9.us, align 4
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: br label %for.inc.us
				; CHECK-CM: LV: Found an estimated cost of 4 for VF 4 For instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-CM: LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
				; CHECK-CM: LV: Vector loop of width 4 costs: 6.
				; CHECK-CM: LV: Selecting VF: 2.
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "CLONE %arrayidx.us = getelementptr %b, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For recipe: "CLONE %1 = load %arrayidx.us
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %cmp5.us = fcmp %1, 0.000000e+00
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "CLONE %arrayidx9.us = getelementptr %a, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For recipe: "BRANCH-ON-MASK ir<%cmp5.us>\l"
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For recipe: "CLONE store %1, %arrayidx9.us
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 1 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 1 For loop backedge cost (br)
				; CHECK-VP: LV: Vector loop of width 1 costs: 6.
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "CLONE %arrayidx.us = getelementptr %b, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN load ir<%arrayidx.us>
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN\l"" %cmp5.us = fcmp %1, 0.000000e+00
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %arrayidx9.us = getelementptr %a, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 3 for VF 2 For recipe: "BRANCH-ON-MASK ir<%cmp5.us>\l"
				; CHECK-VP: LV: Found an estimated cost of 3 for VF 2 For recipe: "REPLICATE store %1, %arrayidx9.us
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 2 For loop backedge cost (br)
				; CHECK-VP: LV: Vector loop of width 2 costs: 5.
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 4 For recipe: "CLONE %arrayidx.us = getelementptr %b, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 4 For recipe: "WIDEN load ir<%arrayidx.us>
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 4 For recipe: "WIDEN\l"" %cmp5.us = fcmp %1, 0.000000e+00
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %arrayidx9.us = getelementptr %a, %indvars.iv
				; CHECK-VP: LV: Found an estimated cost of 9 for VF 4 For recipe: "BRANCH-ON-MASK ir<%cmp5.us>\l"
				; CHECK-VP: LV: Found an estimated cost of 8 for VF 4 For recipe: "REPLICATE store %1, %arrayidx9.us
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 4 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of 0 for VF 4 For loop backedge cost (br)
				; CHECK-VP: LV: Vector loop of width 4 costs: 5.
				; CHECK-VP: LV: Selecting VF: 2.
				define i32 @vif(i32 %ntimes, i32 %LEN, float* %a, float* %b, float* %c, float* %d, float* %e, i32 %aa, i32 %bb, i32 %cc) {
				entry:
				%cmp27 = icmp sgt i32 %ntimes, 0
				br i1 %cmp27, label %for.cond1.preheader.lr.ph, label %for.cond.cleanup

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp225 = icmp sgt i32 %LEN, 0
				br i1 %cmp225, label %for.cond1.preheader.us.preheader, label %for.cond1.preheader.preheader

				for.cond1.preheader.preheader: ; preds = %for.cond1.preheader.lr.ph
				br label %for.cond1.preheader

				for.cond1.preheader.us.preheader: ; preds = %for.cond1.preheader.lr.ph
				%wide.trip.count = zext i32 %LEN to i64
				br label %for.cond1.preheader.us

				for.cond1.preheader.us: ; preds = %for.cond1.preheader.us.preheader, %for.cond1.for.cond.cleanup3_crit_edge.us
				%nl.028.us = phi i32 [ %inc12.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]
				br label %for.body4.us

				for.body4.us: ; preds = %for.cond1.preheader.us, %for.inc.us
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.inc.us ]
				%arrayidx.us = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx.us, align 4
				%cmp5.us = fcmp ogt float %0, 0.000000e+00
				br i1 %cmp5.us, label %if.then.us, label %for.inc.us

				if.then.us: ; preds = %for.body4.us
				%arrayidx9.us = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %0, float* %arrayidx9.us, align 4
				br label %for.inc.us

				for.inc.us: ; preds = %if.then.us, %for.body4.us
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us

				for.cond1.for.cond.cleanup3_crit_edge.us: ; preds = %for.inc.us
				%inc12.us = add nuw nsw i32 %nl.028.us, 1
				%exitcond30.not = icmp eq i32 %inc12.us, %ntimes
				br i1 %exitcond30.not, label %for.cond.cleanup.loopexit, label %for.cond1.preheader.us

				for.cond1.preheader: ; preds = %for.cond1.preheader.preheader, %for.cond1.preheader
				%nl.028 = phi i32 [ %inc12, %for.cond1.preheader ], [ 0, %for.cond1.preheader.preheader ]
				%inc12 = add nuw nsw i32 %nl.028, 1
				%exitcond31.not = icmp eq i32 %inc12, %ntimes
				br i1 %exitcond31.not, label %for.cond.cleanup.loopexit33, label %for.cond1.preheader

				for.cond.cleanup.loopexit: ; preds = %for.cond1.for.cond.cleanup3_crit_edge.us
				br label %for.cond.cleanup

				for.cond.cleanup.loopexit33: ; preds = %for.cond1.preheader
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit33, %for.cond.cleanup.loopexit, %entry
				ret i32 0
				}
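
A note on reading the CHECK-VP totals for @vif above: the sketch below shows one way the per-recipe costs could combine into the logged "Vector loop of width N costs" lines. It is not code from the patch. `LoggedCost`, `vifLoggedCosts` and `loopCost` are hypothetical names. The normalisation is an assumption chosen because it reproduces the expected totals: sum the recipe costs, halve the predicated block's share at VF 1 only (the ReciprocalPredBlockProb idea from the summary), then divide by VF with integer division.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one "Found an estimated cost of N" debug line.
// Not part of the patch; it exists only for this illustration.
struct LoggedCost {
  unsigned Cost;          // N from the debug line
  bool InPredicatedBlock; // assumed true for recipes of the if.then block
};

// CHECK-VP costs for @vif, copied from the expectations above. Treating the
// getelementptr of %a, the BRANCH-ON-MASK and the store as the predicated
// block is an assumption of this sketch.
std::vector<LoggedCost> vifLoggedCosts(unsigned VF) {
  switch (VF) {
  case 1:
    return {{0, false}, {0, false}, {2, false}, {1, false}, {0, true},
            {0, true},  {2, true},  {2, false}, {0, false}};
  case 2:
    return {{0, false}, {0, false}, {1, false}, {1, false}, {0, true},
            {3, true},  {3, true},  {2, false}, {0, false}};
  case 4:
    return {{0, false}, {0, false}, {1, false}, {1, false}, {0, true},
            {9, true},  {8, true},  {2, false}, {0, false}};
  default:
    return {};
  }
}

// Assumed normalisation: the predicated share is scaled down only for the
// scalar loop (VF 1); the total is then divided by VF, rounding down.
unsigned loopCost(const std::vector<LoggedCost> &Costs, unsigned VF,
                  unsigned ReciprocalPredBlockProb = 2) {
  assert(VF > 0 && ReciprocalPredBlockProb > 0);
  unsigned Plain = 0, Predicated = 0;
  for (const LoggedCost &C : Costs) {
    if (C.InPredicatedBlock)
      Predicated += C.Cost;
    else
      Plain += C.Cost;
  }
  if (VF == 1)
    Predicated /= ReciprocalPredBlockProb;
  return (Plain + Predicated) / VF;
}
```

Under these assumptions, `loopCost(vifLoggedCosts(VF), VF)` gives 6, 5 and 5 for VF 1, 2 and 4, matching the three "costs:" lines in the CHECK-VP block.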

llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll

	; REQUIRES: asserts			; REQUIRES: asserts

	; RUN: opt -loop-vectorize -mtriple=arm64-apple-ios %s -S -debug -disable-output 2>&1 | FileCheck --check-prefix=CM %s			; RUN: opt -loop-vectorize -mtriple=arm64-apple-ios -cost-using-vplan=false %s -S -debug -disable-output 2>&1 | FileCheck --check-prefix=CM-OLD %s
				; RUN: opt -loop-vectorize -mtriple=arm64-apple-ios -cost-using-vplan=true %s -S -debug -disable-output 2>&1 | FileCheck --check-prefix=CM-VPLAN %s
	; RUN: opt -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 %s -S | FileCheck --check-prefix=FORCED %s			; RUN: opt -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 %s -S | FileCheck --check-prefix=FORCED %s

	; Test case from PR41294.			; Test case from PR41294.

	; Check scalar cost for extractvalue. The constant and loop invariant operands are free,			; Check scalar cost for extractvalue. The constant and loop invariant operands are free,
	; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.			; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.

	; CM: LV: Scalar loop costs: 7.			; CM-OLD: LV: Scalar loop costs: 7.
	; CM: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { i64, i64 } %sv, 0			; CM-OLD: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { i64, i64 } %sv, 0
	; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { i64, i64 } %sv, 1			; CM-OLD-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { i64, i64 } %sv, 1
				; CM-VPLAN: LV: Vector loop of width 1 costs: 7.
				; CM-VPLAN: LV: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %a = extractvalue %sv
				; CM-VPLAN-NEXT: LV: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %b = extractvalue %sv

	; Check that the extractvalue operands are actually free in vector code.			; Check that the extractvalue operands are actually free in vector code.

	; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph			; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph
	; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; FORCED-NEXT: %0 = add i32 %index, 0			; FORCED-NEXT: %0 = add i32 %index, 0
	; FORCED-NEXT: %1 = extractvalue { i64, i64 } %sv, 0			; FORCED-NEXT: %1 = extractvalue { i64, i64 } %sv, 0
	; FORCED-NEXT: %2 = extractvalue { i64, i64 } %sv, 0			; FORCED-NEXT: %2 = extractvalue { i64, i64 } %sv, 0
	[... 30 lines not shown ...]
	exit:			exit:
	ret void			ret void
	}			}


	; Similar to the test case above, but checks getVectorCallCost as well.			; Similar to the test case above, but checks getVectorCallCost as well.
	declare float @pow(float, float) readnone nounwind			declare float @pow(float, float) readnone nounwind

	; CM: LV: Scalar loop costs: 16.			; CM-OLD: LV: Scalar loop costs: 16.
	; CM: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { float, float } %sv, 0			; CM-OLD: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { float, float } %sv, 0
	; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { float, float } %sv, 1			; CM-OLD-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { float, float } %sv, 1
				; CM-VPLAN: LV: Vector loop of width 1 costs: 16.
				; CM-VPLAN: LV: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %a = extractvalue %sv
				; CM-VPLAN-NEXT: LV: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %b = extractvalue %sv

	; FORCED-LABEL: define void @test_getVectorCallCost			; FORCED-LABEL: define void @test_getVectorCallCost

	; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph			; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph
	; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; FORCED-NEXT: %0 = add i32 %index, 0			; FORCED-NEXT: %0 = add i32 %index, 0
	; FORCED-NEXT: %1 = extractvalue { float, float } %sv, 0			; FORCED-NEXT: %1 = extractvalue { float, float } %sv, 0
	; FORCED-NEXT: %2 = extractvalue { float, float } %sv, 0			; FORCED-NEXT: %2 = extractvalue { float, float } %sv, 0
	[... 36 lines not shown ...]

llvm/test/Transforms/LoopVectorize/AArch64/interleaved-vs-scalar.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -S --debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -cost-using-vplan=false -S --debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -cost-using-vplan=true -S --debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

	; This test shows extremely high interleaving cost that, probably, should be fixed.			; This test shows extremely high interleaving cost that, probably, should be fixed.
	; Due to the high cost, interleaving is not beneficial and the cost model chooses to scalarize			; Due to the high cost, interleaving is not beneficial and the cost model chooses to scalarize
	; the load instructions.			; the load instructions.

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	%pair = type { i8, i8 }			%pair = type { i8, i8 }

	; CHECK-LABEL: test			; CHECK-LABEL: test
	; CHECK: Found an estimated cost of 20 for VF 2 For instruction: {{.*}} load i8			; CHECK-CM: Found an estimated cost of 20 for VF 2 For instruction: {{.*}} load i8
	; CHECK: Found an estimated cost of 0 for VF 2 For instruction: {{.*}} load i8			; CHECK-CM: Found an estimated cost of 0 for VF 2 For instruction: {{.*}} load i8
				; CHECK-VP: Found an estimated cost of 20 for VF 2 For recipe: {{.*}} load
				; CHECK-VP: Found an estimated cost of 0 for VF 2 For recipe: {{.*}} load
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: load i8			; CHECK: load i8
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body

	define void @test(%pair* %p, i64 %n) {			define void @test(%pair* %p, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	[... 14 lines not shown ...]

llvm/test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll

	; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2			; RUN: opt -loop-vectorize -force-vector-width=2 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2
	; RUN: opt -loop-vectorize -force-vector-width=4 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4			; RUN: opt -loop-vectorize -force-vector-width=4 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4
	; RUN: opt -loop-vectorize -force-vector-width=8 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8			; RUN: opt -loop-vectorize -force-vector-width=8 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8
	; RUN: opt -loop-vectorize -force-vector-width=16 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16			; RUN: opt -loop-vectorize -force-vector-width=16 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16
				; RUN: opt -loop-vectorize -force-vector-width=2 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_2
				; RUN: opt -loop-vectorize -force-vector-width=4 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_4
				; RUN: opt -loop-vectorize -force-vector-width=8 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_8
				; RUN: opt -loop-vectorize -force-vector-width=16 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_16
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnueabi"			target triple = "aarch64--linux-gnueabi"

	%i8.2 = type {i8, i8}			%i8.2 = type {i8, i8}
	define void @i8_factor_2(%i8.2* %data, i64 %n) {			define void @i8_factor_2(%i8.2* %data, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	; VF_8-LABEL: Checking a loop in "i8_factor_2"			; VF_8-LABEL: Checking a loop in "i8_factor_2"
	; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1			; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1
	; VF_16-LABEL: Checking a loop in "i8_factor_2"			; VF_16-LABEL: Checking a loop in "i8_factor_2"
	; VF_16: Found an estimated cost of 2 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1			; VF_16: Found an estimated cost of 2 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 2 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 2 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
				; VP_8-LABEL: Checking a loop in "i8_factor_2"
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i8_factor_2"
				; VP_16: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1
	%tmp2 = load i8, i8* %tmp0, align 1			%tmp2 = load i8, i8* %tmp0, align 1
	%tmp3 = load i8, i8* %tmp1, align 1			%tmp3 = load i8, i8* %tmp1, align 1
	store i8 0, i8* %tmp0, align 1			store i8 0, i8* %tmp0, align 1
	store i8 0, i8* %tmp1, align 1			store i8 0, i8* %tmp1, align 1
	[... 20 lines not shown ...]
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2
	; VF_16-LABEL: Checking a loop in "i16_factor_2"			; VF_16-LABEL: Checking a loop in "i16_factor_2"
	; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2			; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
				; VP_4-LABEL: Checking a loop in "i16_factor_2"
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i16_factor_2"
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i16_factor_2"
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1
	%tmp2 = load i16, i16* %tmp0, align 2			%tmp2 = load i16, i16* %tmp0, align 2
	%tmp3 = load i16, i16* %tmp1, align 2			%tmp3 = load i16, i16* %tmp1, align 2
	store i16 0, i16* %tmp0, align 2			store i16 0, i16* %tmp0, align 2
	store i16 0, i16* %tmp1, align 2			store i16 0, i16* %tmp1, align 2
	[... 25 lines not shown ...]
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4
	; VF_16-LABEL: Checking a loop in "i32_factor_2"			; VF_16-LABEL: Checking a loop in "i32_factor_2"
	; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4			; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
				; VP_2-LABEL: Checking a loop in "i32_factor_2"
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_4-LABEL: Checking a loop in "i32_factor_2"
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i32_factor_2"
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i32_factor_2"
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1
	%tmp2 = load i32, i32* %tmp0, align 4			%tmp2 = load i32, i32* %tmp0, align 4
	%tmp3 = load i32, i32* %tmp1, align 4			%tmp3 = load i32, i32* %tmp1, align 4
	store i32 0, i32* %tmp0, align 4			store i32 0, i32* %tmp0, align 4
	store i32 0, i32* %tmp1, align 4			store i32 0, i32* %tmp1, align 4
	[... 25 lines not shown ...]
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i64 0, i64* %tmp0, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i64 0, i64* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i64 0, i64* %tmp1, align 8
	; VF_16-LABEL: Checking a loop in "i64_factor_2"			; VF_16-LABEL: Checking a loop in "i64_factor_2"
	; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load i64, i64* %tmp0, align 8			; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8
				; VP_2-LABEL: Checking a loop in "i64_factor_2"
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_4-LABEL: Checking a loop in "i64_factor_2"
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i64_factor_2"
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i64_factor_2"
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 1
	%tmp2 = load i64, i64* %tmp0, align 8			%tmp2 = load i64, i64* %tmp0, align 8
	%tmp3 = load i64, i64* %tmp1, align 8			%tmp3 = load i64, i64* %tmp1, align 8
	store i64 0, i64* %tmp0, align 8			store i64 0, i64* %tmp0, align 8
	store i64 0, i64* %tmp1, align 8			store i64 0, i64* %tmp1, align 8
	[... 16 lines not shown ...]
	; stores do not form a legal interleaved group because the group would contain			; stores do not form a legal interleaved group because the group would contain
	; gaps.			; gaps.
	;			;
	; VF_2-LABEL: Checking a loop in "i64_factor_8"			; VF_2-LABEL: Checking a loop in "i64_factor_8"
	; VF_2: Found an estimated cost of 6 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8			; VF_2: Found an estimated cost of 6 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
	; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp0, align 8			; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp1, align 8
				; VP_2-LABEL: Checking a loop in "i64_factor_8"
				; VP_2: Found an estimated cost of 6 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 8 at %tmp2, ir<%tmp0>
				; VP_2: Found an estimated cost of 7 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 7 for VF 2 For recipe: "REPLICATE store 0, %tmp1
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 2			%tmp0 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 2
	%tmp1 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 6			%tmp1 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 6
	%tmp2 = load i64, i64* %tmp0, align 8			%tmp2 = load i64, i64* %tmp0, align 8
	%tmp3 = load i64, i64* %tmp1, align 8			%tmp3 = load i64, i64* %tmp1, align 8
	store i64 0, i64* %tmp0, align 8			store i64 0, i64* %tmp0, align 8
	store i64 0, i64* %tmp1, align 8			store i64 0, i64* %tmp1, align 8
	%i.next = add nuw nsw i64 %i, 1			%i.next = add nuw nsw i64 %i, 1
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -S -debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -cost-using-vplan=false -S -debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -cost-using-vplan=true -S -debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; CHECK-LABEL: all_scalar			; CHECK-LABEL: all_scalar
	; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2			; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
	; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2			; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
				; CHECK-VP-NOT: LV: Found an estimated cost of {{.*}} for VF 2 For Recipe: {{.*}} zext
	; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions			; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
	;			;
	define void @all_scalar(i64* %a, i64 %n) {			define void @all_scalar(i64* %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr i64, i64* %a, i64 %i			%tmp0 = getelementptr i64, i64* %a, i64 %i
	store i64 0, i64* %tmp0, align 1			store i64 0, i64* %tmp0, align 1
	%i.next = add nuw nsw i64 %i, 2			%i.next = add nuw nsw i64 %i, 2
	%cond = icmp eq i64 %i.next, %n			%cond = icmp eq i64 %i.next, %n
	br i1 %cond, label %for.end, label %for.body			br i1 %cond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	; CHECK-LABEL: PR33193			; CHECK-LABEL: PR33193
	; CHECK: LV: Found scalar instruction: %i.next = zext i32 %j.next to i64			; CHECK: LV: Found scalar instruction: %i.next = zext i32 %j.next to i64
	; CHECK: LV: Found an estimated cost of 0 for VF 8 For instruction: %i.next = zext i32 %j.next to i64			; CHECK-CM: LV: Found an estimated cost of 0 for VF 8 For instruction: %i.next = zext i32 %j.next to i64
				; CHECK-VP-NOT: LV: Found an estimated cost of {{.*}} for VF 8 For Recipe: {{.*}} zext
	; CHECK: LV: Not considering vector loop of width 8 because it will not generate any vector instructions			; CHECK: LV: Not considering vector loop of width 8 because it will not generate any vector instructions
	%struct.a = type { i32, i8 }			%struct.a = type { i32, i8 }
	define void @PR33193(%struct.a* %a, i64 %n) {			define void @PR33193(%struct.a* %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	Show All 11 Lines

llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -force-vector-width=2 -loop-vectorize -debug-only=loop-vectorize -disable-output 2>&1 | FileCheck %s			; RUN: opt < %s -force-vector-width=2 -loop-vectorize -cost-using-vplan=false -debug-only=loop-vectorize -disable-output 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -force-vector-width=2 -loop-vectorize -cost-using-vplan=true -debug-only=loop-vectorize -disable-output 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; Check predication-related cost calculations, including scalarization overhead			; Check predication-related cost calculations, including scalarization overhead
	; and block probability scaling. Note that the functionality being tested is			; and block probability scaling. Note that the functionality being tested is
	; not specific to AArch64. We specify a target to get actual values for the			; not specific to AArch64. We specify a target to get actual values for the
	; instruction costs.			; instruction costs.

	; CHECK-LABEL: predicated_udiv			; CHECK-LABEL: predicated_udiv
	;			;
	; This test checks that we correctly compute the cost of the predicated udiv			; This test checks that we correctly compute the cost of the predicated udiv
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5
	;			;
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK-CM: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK-VP: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %tmp4 = udiv %tmp2, %tmp3 (S->V)
	;			;
	define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {			define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 23 Lines
	;			;
	; This test checks that we correctly compute the cost of the predicated store			; This test checks that we correctly compute the cost of the predicated store
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of store:			; Cost of store:
	; (store(4) + extractelement(3)) / 2 = 3			; (store(4) + extractelement(3)) / 2 = 3
	;			;
	; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4
	; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4			; CHECK-CM: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK-VP: Found an estimated cost of 3 for VF 2 For recipe: "REPLICATE store %tmp2, %tmp0
	;			;
	define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 23 Lines
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(3)) / 2 = 2
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4			; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4
	;			;
	; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x			; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x			; CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK-CM: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE %tmp3 = add %tmp2, %x
				; CHECK-VP: Found an estimated cost of 4 for VF 2 For recipe: "REPLICATE %tmp4 = udiv %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {			define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 27 Lines
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(3)) / 2 = 2
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x			; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x			; CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4			; CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE %tmp2 = add %tmp1, %x
				; CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE store %tmp2, %tmp0
	;			;
	define void @predicated_store_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 34 Lines
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x			; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2			; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x			; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp5, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp5, i32* %tmp0, align 4
	; CHECK: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x			; CHECK-CM: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2			; CHECK-CM: Found an estimated cost of 5 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2			; CHECK-CM: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x			; CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, i32* %tmp0, align 4			; CHECK-CM: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, i32* %tmp0, align 4
				; CHECK-VP: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN\l"" %tmp2 = add %tmp1, %x
				; CHECK-VP: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %tmp3 = sdiv %tmp1, %tmp2
				; CHECK-VP: Found an estimated cost of 5 for VF 2 For recipe: "REPLICATE %tmp4 = udiv %tmp3, %tmp2
				; CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE %tmp5 = sub %tmp4, %x
				; CHECK-VP: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE store %tmp5, %tmp0
	;			;
	define void @predication_multi_context(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predication_multi_context(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 19 Lines
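
The cost comments in predication_costs.ll all follow the same arithmetic: sum the scalar operation cost and any extractelement/insertelement overhead, then divide by the reciprocal of the assumed block probability (2 for the 50% assumption stated in the test). A minimal C++ sketch of that arithmetic follows. It is not code from the patch: `scaledPredicatedCost` and `checkDocumentedCosts` are hypothetical names, and the default divisor of 2 is an assumption taken from the test comments and from the ReciprocalPredBlockProb mentioned in the summary.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not an LLVM API): sums the component costs of a
// scalarized, predicated instruction and scales the total by the reciprocal
// of the assumed block probability. Integer division truncates, which is
// why (2 + 6 + 3) / 2 yields 5 rather than 5.5.
unsigned scaledPredicatedCost(const std::vector<unsigned> &Parts,
                              unsigned ReciprocalPredBlockProb = 2) {
  unsigned Sum = std::accumulate(Parts.begin(), Parts.end(), 0u);
  return Sum / ReciprocalPredBlockProb;
}

// Re-derives the figures quoted in the test comments.
void checkDocumentedCosts() {
  assert(scaledPredicatedCost({2, 6, 3}) == 5); // udiv + extract + insert
  assert(scaledPredicatedCost({4, 3}) == 3);    // store + extract
  assert(scaledPredicatedCost({2, 3}) == 2);    // add + extract
  assert(scaledPredicatedCost({2, 3, 3}) == 4); // udiv with scalarized operand
  assert(scaledPredicatedCost({4}) == 2);       // store with scalarized operand
}
```

Because the division truncates, odd totals round down. This matches the expected values in both the CHECK-CM and CHECK-VP lines of the test.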

llvm/test/Transforms/LoopVectorize/ARM/interleaved_cost.ll

	; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2			; RUN: opt -loop-vectorize -force-vector-width=2 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2
	; RUN: opt -loop-vectorize -force-vector-width=4 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4			; RUN: opt -loop-vectorize -force-vector-width=4 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4
	; RUN: opt -loop-vectorize -force-vector-width=8 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8			; RUN: opt -loop-vectorize -force-vector-width=8 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8
	; RUN: opt -loop-vectorize -force-vector-width=16 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16			; RUN: opt -loop-vectorize -force-vector-width=16 -cost-using-vplan=false -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16
				; RUN: opt -loop-vectorize -force-vector-width=2 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_2
				; RUN: opt -loop-vectorize -force-vector-width=4 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_4
				; RUN: opt -loop-vectorize -force-vector-width=8 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_8
				; RUN: opt -loop-vectorize -force-vector-width=16 -cost-using-vplan=true -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_16
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "armv8--linux-gnueabihf"			target triple = "armv8--linux-gnueabihf"

	%i8.2 = type {i8, i8}			%i8.2 = type {i8, i8}
	define void @i8_factor_2(%i8.2* %data, i64 %n) {			define void @i8_factor_2(%i8.2* %data, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	; VF_8-LABEL: Checking a loop in "i8_factor_2"			; VF_8-LABEL: Checking a loop in "i8_factor_2"
	; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1			; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1
	; VF_16-LABEL: Checking a loop in "i8_factor_2"			; VF_16-LABEL: Checking a loop in "i8_factor_2"
	; VF_16: Found an estimated cost of 2 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1			; VF_16: Found an estimated cost of 2 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 2 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 2 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
				; VP_8-LABEL: Checking a loop in "i8_factor_2"
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i8_factor_2"
				; VP_16: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 2 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1
	%tmp2 = load i8, i8* %tmp0, align 1			%tmp2 = load i8, i8* %tmp0, align 1
	%tmp3 = load i8, i8* %tmp1, align 1			%tmp3 = load i8, i8* %tmp1, align 1
	store i8 0, i8* %tmp0, align 1			store i8 0, i8* %tmp0, align 1
	store i8 0, i8* %tmp1, align 1			store i8 0, i8* %tmp1, align 1
	Show All 20 Lines
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2
	; VF_16-LABEL: Checking a loop in "i16_factor_2"			; VF_16-LABEL: Checking a loop in "i16_factor_2"
	; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2			; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
				; VP_4-LABEL: Checking a loop in "i16_factor_2"
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i16_factor_2"
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 2 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i16_factor_2"
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1
	%tmp2 = load i16, i16* %tmp0, align 2			%tmp2 = load i16, i16* %tmp0, align 2
	%tmp3 = load i16, i16* %tmp1, align 2			%tmp3 = load i16, i16* %tmp1, align 2
	store i16 0, i16* %tmp0, align 2			store i16 0, i16* %tmp0, align 2
	store i16 0, i16* %tmp1, align 2			store i16 0, i16* %tmp1, align 2
	Show All 25 Lines
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4
	; VF_16-LABEL: Checking a loop in "i32_factor_2"			; VF_16-LABEL: Checking a loop in "i32_factor_2"
	; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4			; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
				; VP_2-LABEL: Checking a loop in "i32_factor_2"
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_2: Found an estimated cost of 2 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_4-LABEL: Checking a loop in "i32_factor_2"
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 2 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i32_factor_2"
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i32_factor_2"
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1
	%tmp2 = load i32, i32* %tmp0, align 4			%tmp2 = load i32, i32* %tmp0, align 4
	%tmp3 = load i32, i32* %tmp1, align 4			%tmp3 = load i32, i32* %tmp1, align 4
	store i32 0, i32* %tmp0, align 4			store i32 0, i32* %tmp0, align 4
	store i32 0, i32* %tmp1, align 4			store i32 0, i32* %tmp1, align 4
	Show All 15 Lines
	; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_4-NEXT: Found an estimated cost of 32 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_4-NEXT: Found an estimated cost of 32 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
	; VF_8-LABEL: Checking a loop in "half_factor_2"			; VF_8-LABEL: Checking a loop in "half_factor_2"
	; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2			; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 64 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 64 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
				; VP_4-LABEL: Checking a loop in "half_factor_2"
				; VP_4: Found an estimated cost of 40 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 32 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_8-LABEL: Checking a loop in "half_factor_2"
				; VP_8: Found an estimated cost of 80 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 64 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp1
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1
	%tmp2 = load half, half* %tmp0, align 2			%tmp2 = load half, half* %tmp0, align 2
	%tmp3 = load half, half* %tmp1, align 2			%tmp3 = load half, half* %tmp1, align 2
	store half 0., half* %tmp0, align 2			store half 0., half* %tmp0, align 2
	store half 0., half* %tmp1, align 2			store half 0., half* %tmp1, align 2
	%i.next = add nuw nsw i64 %i, 1			%i.next = add nuw nsw i64 %i, 1
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

llvm/test/Transforms/LoopVectorize/ARM/mve-interleaved-cost.ll

	; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2			; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -cost-using-vplan=false -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_2
	; RUN: opt -loop-vectorize -force-vector-width=4 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4			; RUN: opt -loop-vectorize -force-vector-width=4 -debug-only=loop-vectorize -cost-using-vplan=false -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_4
	; RUN: opt -loop-vectorize -force-vector-width=8 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8			; RUN: opt -loop-vectorize -force-vector-width=8 -debug-only=loop-vectorize -cost-using-vplan=false -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_8
	; RUN: opt -loop-vectorize -force-vector-width=16 -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16			; RUN: opt -loop-vectorize -force-vector-width=16 -debug-only=loop-vectorize -cost-using-vplan=false -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VF_16
				; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -cost-using-vplan=true -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_2
				; RUN: opt -loop-vectorize -force-vector-width=4 -debug-only=loop-vectorize -cost-using-vplan=true -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_4
				; RUN: opt -loop-vectorize -force-vector-width=8 -debug-only=loop-vectorize -cost-using-vplan=true -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_8
				; RUN: opt -loop-vectorize -force-vector-width=16 -debug-only=loop-vectorize -cost-using-vplan=true -disable-output < %s 2>&1 | FileCheck %s --check-prefix=VP_16
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv8.1m.main-none-eabi"			target triple = "thumbv8.1m.main-none-eabi"

	; Factor 2			; Factor 2

	%i8.2 = type {i8, i8}			%i8.2 = type {i8, i8}
	Show All 16 Lines
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1			; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1
	; VF_16-LABEL: Checking a loop in "i8_factor_2"			; VF_16-LABEL: Checking a loop in "i8_factor_2"
	; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1			; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
				; VP_2-LABEL: Checking a loop in "i8_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-LABEL: Checking a loop in "i8_factor_2"
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i8_factor_2"
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i8_factor_2"
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 4 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1
	%tmp2 = load i8, i8* %tmp0, align 1			%tmp2 = load i8, i8* %tmp0, align 1
	%tmp3 = load i8, i8* %tmp1, align 1			%tmp3 = load i8, i8* %tmp1, align 1
	store i8 0, i8* %tmp0, align 1			store i8 0, i8* %tmp0, align 1
	store i8 0, i8* %tmp1, align 1			store i8 0, i8* %tmp1, align 1
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2
	; VF_16-LABEL: Checking a loop in "i16_factor_2"			; VF_16-LABEL: Checking a loop in "i16_factor_2"
	; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2			; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
				; VP_2-LABEL: Checking a loop in "i16_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-LABEL: Checking a loop in "i16_factor_2"
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i16_factor_2"
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i16_factor_2"
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1
	%tmp2 = load i16, i16* %tmp0, align 2			%tmp2 = load i16, i16* %tmp0, align 2
	%tmp3 = load i16, i16* %tmp1, align 2			%tmp3 = load i16, i16* %tmp1, align 2
	store i16 0, i16* %tmp0, align 2			store i16 0, i16* %tmp0, align 2
	store i16 0, i16* %tmp1, align 2			store i16 0, i16* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4
	; VF_16-LABEL: Checking a loop in "i32_factor_2"			; VF_16-LABEL: Checking a loop in "i32_factor_2"
	; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4			; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
				; VP_2-LABEL: Checking a loop in "i32_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-LABEL: Checking a loop in "i32_factor_2"
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "i32_factor_2"
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "i32_factor_2"
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1
	%tmp2 = load i32, i32* %tmp0, align 4			%tmp2 = load i32, i32* %tmp0, align 4
	%tmp3 = load i32, i32* %tmp1, align 4			%tmp3 = load i32, i32* %tmp1, align 4
	store i32 0, i32* %tmp0, align 4			store i32 0, i32* %tmp0, align 4
	store i32 0, i32* %tmp1, align 4			store i32 0, i32* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i64 0, i64* %tmp0, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_8-NEXT: Found an estimated cost of 160 for VF 8 For instruction: store i64 0, i64* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 160 for VF 8 For instruction: store i64 0, i64* %tmp1, align 8
	; VF_16-LABEL: Checking a loop in "i64_factor_2"			; VF_16-LABEL: Checking a loop in "i64_factor_2"
	; VF_16: Found an estimated cost of 1088 for VF 16 For instruction: %tmp2 = load i64, i64* %tmp0, align 8			; VF_16: Found an estimated cost of 1088 for VF 16 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 576 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 576 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8
				; VP_2-LABEL: Checking a loop in "i64_factor_2"
				; VP_2: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 16 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-LABEL: Checking a loop in "i64_factor_2"
				; VP_4: Found an estimated cost of 80 for VF 4 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 48 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-LABEL: Checking a loop in "i64_factor_2"
				; VP_8: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 160 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-LABEL: Checking a loop in "i64_factor_2"
				; VP_16: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 576 for VF 16 For recipe: "REPLICATE store 0, %tmp1
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i64.2, %i64.2* %data, i64 %i, i32 1
	%tmp2 = load i64, i64* %tmp0, align 8			%tmp2 = load i64, i64* %tmp0, align 8
	%tmp3 = load i64, i64* %tmp1, align 8			%tmp3 = load i64, i64* %tmp1, align 8
	store i64 0, i64* %tmp0, align 8			store i64 0, i64* %tmp0, align 8
	store i64 0, i64* %tmp1, align 8			store i64 0, i64* %tmp1, align 8
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
	; VF_16-LABEL: Checking a loop in "f16_factor_2"			; VF_16-LABEL: Checking a loop in "f16_factor_2"
	; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load half, half* %tmp0, align 2			; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load half, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2
				; VP_2-LABEL: Checking a loop in "f16_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_4-LABEL: Checking a loop in "f16_factor_2"
				; VP_4: Found an estimated cost of 72 for VF 4 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_4-NEXT: Found an estimated cost of 40 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_8-LABEL: Checking a loop in "f16_factor_2"
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 4 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "f16_factor_2"
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 8 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 1
	%tmp2 = load half, half* %tmp0, align 2			%tmp2 = load half, half* %tmp0, align 2
	%tmp3 = load half, half* %tmp1, align 2			%tmp3 = load half, half* %tmp1, align 2
	store half 0.0, half* %tmp0, align 2			store half 0.0, half* %tmp0, align 2
	store half 0.0, half* %tmp1, align 2			store half 0.0, half* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load float, float* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load float, float* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4
	; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4			; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4
	; VF_16-LABEL: Checking a loop in "f32_factor_2"			; VF_16-LABEL: Checking a loop in "f32_factor_2"
	; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load float, float* %tmp0, align 4			; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load float, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load float, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load float, float* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4
				; VP_2-LABEL: Checking a loop in "f32_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_4-LABEL: Checking a loop in "f32_factor_2"
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_4: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_8-LABEL: Checking a loop in "f32_factor_2"
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_8: Found an estimated cost of 8 for VF 8 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
				; VP_16-LABEL: Checking a loop in "f32_factor_2"
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp2, ir<%tmp0>
				; VP_16: Found an estimated cost of 16 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 2 at <badref>, ir<%tmp1>
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 1
	%tmp2 = load float, float* %tmp0, align 4			%tmp2 = load float, float* %tmp0, align 4
	%tmp3 = load float, float* %tmp1, align 4			%tmp3 = load float, float* %tmp1, align 4
	store float 0.0, float* %tmp0, align 4			store float 0.0, float* %tmp0, align 4
	store float 0.0, float* %tmp1, align 4			store float 0.0, float* %tmp1, align 4
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load double, double* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load double, double* %tmp1, align 8
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store double 0.000000e+00, double* %tmp0, align 8			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store double 0.000000e+00, double* %tmp0, align 8
	; VF_8-NEXT: Found an estimated cost of 144 for VF 8 For instruction: store double 0.000000e+00, double* %tmp1, align 8			; VF_8-NEXT: Found an estimated cost of 144 for VF 8 For instruction: store double 0.000000e+00, double* %tmp1, align 8
	; VF_16-LABEL: Checking a loop in "f64_factor_2"			; VF_16-LABEL: Checking a loop in "f64_factor_2"
	; VF_16: Found an estimated cost of 1056 for VF 16 For instruction: %tmp2 = load double, double* %tmp0, align 8			; VF_16: Found an estimated cost of 1056 for VF 16 For instruction: %tmp2 = load double, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load double, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load double, double* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 544 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 544 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8
				; VP_2-LABEL: Checking a loop in "f64_factor_2"
				; VP_2: Found an estimated cost of 20 for VF 2 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 12 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_4-LABEL: Checking a loop in "f64_factor_2"
				; VP_4: Found an estimated cost of 72 for VF 4 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_4-NEXT: Found an estimated cost of 40 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_8-LABEL: Checking a loop in "f64_factor_2"
				; VP_8: Found an estimated cost of 272 for VF 8 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_8-NEXT: Found an estimated cost of 144 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_16-LABEL: Checking a loop in "f64_factor_2"
				; VP_16: Found an estimated cost of 1056 for VF 16 For recipe: "REPLICATE %tmp2 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_16-NEXT: Found an estimated cost of 544 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp1
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f64.2, %f64.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f64.2, %f64.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f64.2, %f64.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f64.2, %f64.2* %data, i64 %i, i32 1
	%tmp2 = load double, double* %tmp0, align 8			%tmp2 = load double, double* %tmp0, align 8
	%tmp3 = load double, double* %tmp1, align 8			%tmp3 = load double, double* %tmp1, align 8
	store double 0.0, double* %tmp0, align 8			store double 0.0, double* %tmp0, align 8
	store double 0.0, double* %tmp1, align 8			store double 0.0, double* %tmp1, align 8
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i8 0, i8* %tmp2, align 1			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i8 0, i8* %tmp2, align 1
	; VF_16-LABEL: Checking a loop in "i8_factor_3"			; VF_16-LABEL: Checking a loop in "i8_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp0, align 1			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp2, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp2, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1
				; VP_2-LABEL: Checking a loop in "i8_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-LABEL: Checking a loop in "i8_factor_3"
				; VP_4: Found an estimated cost of 108 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 60 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-LABEL: Checking a loop in "i8_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-LABEL: Checking a loop in "i8_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i8.3, %i8.3* %data, i64 %i, i32 2
	%tmp3 = load i8, i8* %tmp0, align 1			%tmp3 = load i8, i8* %tmp0, align 1
	%tmp4 = load i8, i8* %tmp1, align 1			%tmp4 = load i8, i8* %tmp1, align 1
	%tmp5 = load i8, i8* %tmp2, align 1			%tmp5 = load i8, i8* %tmp2, align 1
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i16 0, i16* %tmp2, align 2			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i16 0, i16* %tmp2, align 2
	; VF_16-LABEL: Checking a loop in "i16_factor_3"			; VF_16-LABEL: Checking a loop in "i16_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp0, align 2			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2
				; VP_2-LABEL: Checking a loop in "i16_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-LABEL: Checking a loop in "i16_factor_3"
				; VP_4: Found an estimated cost of 108 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 60 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-LABEL: Checking a loop in "i16_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-LABEL: Checking a loop in "i16_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i16.3, %i16.3* %data, i64 %i, i32 2
	%tmp3 = load i16, i16* %tmp0, align 2			%tmp3 = load i16, i16* %tmp0, align 2
	%tmp4 = load i16, i16* %tmp1, align 2			%tmp4 = load i16, i16* %tmp1, align 2
	%tmp5 = load i16, i16* %tmp2, align 2			%tmp5 = load i16, i16* %tmp2, align 2
	[36 lines not shown]
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i32 0, i32* %tmp2, align 4			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store i32 0, i32* %tmp2, align 4
	; VF_16-LABEL: Checking a loop in "i32_factor_3"			; VF_16-LABEL: Checking a loop in "i32_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp0, align 4			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4
				; VP_2-LABEL: Checking a loop in "i32_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-LABEL: Checking a loop in "i32_factor_3"
				; VP_4: Found an estimated cost of 24 for VF 4 For recipe: "WIDEN load ir<%tmp0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp1>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp2>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp0>, ir<0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp1>, ir<0>
				; VP_4-NEXT: Found an estimated cost of 24 for VF 4 For recipe: "WIDEN store ir<%tmp2>, ir<0>
				; VP_8-LABEL: Checking a loop in "i32_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-LABEL: Checking a loop in "i32_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i32.3, %i32.3* %data, i64 %i, i32 2
	%tmp3 = load i32, i32* %tmp0, align 4			%tmp3 = load i32, i32* %tmp0, align 4
	%tmp4 = load i32, i32* %tmp1, align 4			%tmp4 = load i32, i32* %tmp1, align 4
	%tmp5 = load i32, i32* %tmp2, align 4			%tmp5 = load i32, i32* %tmp2, align 4
	[36 lines not shown]
	; VF_8-NEXT: Found an estimated cost of 240 for VF 8 For instruction: store i64 0, i64* %tmp2, align 8			; VF_8-NEXT: Found an estimated cost of 240 for VF 8 For instruction: store i64 0, i64* %tmp2, align 8
	; VF_16-LABEL: Checking a loop in "i64_factor_3"			; VF_16-LABEL: Checking a loop in "i64_factor_3"
	; VF_16: Found an estimated cost of 1632 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp0, align 8			; VF_16: Found an estimated cost of 1632 for VF 16 For instruction: %tmp3 = load i64, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i64, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load i64, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i64, i64* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i64, i64* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 864 for VF 16 For instruction: store i64 0, i64* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 864 for VF 16 For instruction: store i64 0, i64* %tmp2, align 8
				; VP_2-LABEL: Checking a loop in "i64_factor_3"
				; VP_2: Found an estimated cost of 36 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-LABEL: Checking a loop in "i64_factor_3"
				; VP_4: Found an estimated cost of 120 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 72 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-LABEL: Checking a loop in "i64_factor_3"
				; VP_8: Found an estimated cost of 432 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 240 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-LABEL: Checking a loop in "i64_factor_3"
				; VP_16: Found an estimated cost of 1632 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 864 for VF 16 For recipe: "REPLICATE store 0, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i64.3, %i64.3* %data, i64 %i, i32 2
	%tmp3 = load i64, i64* %tmp0, align 8			%tmp3 = load i64, i64* %tmp0, align 8
	%tmp4 = load i64, i64* %tmp1, align 8			%tmp4 = load i64, i64* %tmp1, align 8
	%tmp5 = load i64, i64* %tmp2, align 8			%tmp5 = load i64, i64* %tmp2, align 8
	[36 lines not shown]
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store half 0xH0000, half* %tmp2, align 2			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store half 0xH0000, half* %tmp2, align 2
	; VF_16-LABEL: Checking a loop in "f16_factor_3"			; VF_16-LABEL: Checking a loop in "f16_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load half, half* %tmp0, align 2			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load half, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load half, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load half, half* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2
				; VP_2-LABEL: Checking a loop in "f16_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_4-LABEL: Checking a loop in "f16_factor_3"
				; VP_4: Found an estimated cost of 108 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_4-NEXT: Found an estimated cost of 60 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_8-LABEL: Checking a loop in "f16_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_16-LABEL: Checking a loop in "f16_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f16.3, %f16.3* %data, i64 %i, i32 2
	%tmp3 = load half, half* %tmp0, align 2			%tmp3 = load half, half* %tmp0, align 2
	%tmp4 = load half, half* %tmp1, align 2			%tmp4 = load half, half* %tmp1, align 2
	%tmp5 = load half, half* %tmp2, align 2			%tmp5 = load half, half* %tmp2, align 2
	[36 lines not shown]
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store float 0.000000e+00, float* %tmp2, align 4			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store float 0.000000e+00, float* %tmp2, align 4
	; VF_16-LABEL: Checking a loop in "f32_factor_3"			; VF_16-LABEL: Checking a loop in "f32_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load float, float* %tmp0, align 4			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load float, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load float, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load float, float* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4
				; VP_2-LABEL: Checking a loop in "f32_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_4-LABEL: Checking a loop in "f32_factor_3"
				; VP_4: Found an estimated cost of 24 for VF 4 For recipe: "WIDEN load ir<%tmp0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp1>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp2>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp0>, ir<0.000000e+00>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp1>, ir<0.000000e+00>
				; VP_4-NEXT: Found an estimated cost of 24 for VF 4 For recipe: "WIDEN store ir<%tmp2>, ir<0.000000e+00>
				; VP_8-LABEL: Checking a loop in "f32_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_16-LABEL: Checking a loop in "f32_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f32.3, %f32.3* %data, i64 %i, i32 2
	%tmp3 = load float, float* %tmp0, align 4			%tmp3 = load float, float* %tmp0, align 4
	%tmp4 = load float, float* %tmp1, align 4			%tmp4 = load float, float* %tmp1, align 4
	%tmp5 = load float, float* %tmp2, align 4			%tmp5 = load float, float* %tmp2, align 4
	[36 lines not shown]
	; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store double 0.000000e+00, double* %tmp2, align 8			; VF_8-NEXT: Found an estimated cost of 216 for VF 8 For instruction: store double 0.000000e+00, double* %tmp2, align 8
	; VF_16-LABEL: Checking a loop in "f64_factor_3"			; VF_16-LABEL: Checking a loop in "f64_factor_3"
	; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load double, double* %tmp0, align 8			; VF_16: Found an estimated cost of 1584 for VF 16 For instruction: %tmp3 = load double, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load double, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp4 = load double, double* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load double, double* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load double, double* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store double 0.000000e+00, double* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 816 for VF 16 For instruction: store double 0.000000e+00, double* %tmp2, align 8
				; VP_2-LABEL: Checking a loop in "f64_factor_3"
				; VP_2: Found an estimated cost of 30 for VF 2 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_2-NEXT: Found an estimated cost of 18 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_4-LABEL: Checking a loop in "f64_factor_3"
				; VP_4: Found an estimated cost of 108 for VF 4 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_4-NEXT: Found an estimated cost of 60 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_8-LABEL: Checking a loop in "f64_factor_3"
				; VP_8: Found an estimated cost of 408 for VF 8 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_8-NEXT: Found an estimated cost of 216 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_16-LABEL: Checking a loop in "f64_factor_3"
				; VP_16: Found an estimated cost of 1584 for VF 16 For recipe: "REPLICATE %tmp3 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_16-NEXT: Found an estimated cost of 816 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f64.3, %f64.3* %data, i64 %i, i32 2
	%tmp3 = load double, double* %tmp0, align 8			%tmp3 = load double, double* %tmp0, align 8
	%tmp4 = load double, double* %tmp1, align 8			%tmp4 = load double, double* %tmp1, align 8
	%tmp5 = load double, double* %tmp2, align 8			%tmp5 = load double, double* %tmp2, align 8
	[47 lines not shown]
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp0, align 1			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i8, i8* %tmp2, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i8, i8* %tmp2, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i8, i8* %tmp3, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i8, i8* %tmp3, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i8 0, i8* %tmp3, align 1			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i8 0, i8* %tmp3, align 1
				; VP_2-LABEL: Checking a loop in "i8_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0, %tmp3
				; VP_4-LABEL: Checking a loop in "i8_factor_4"
				; VP_4: Found an estimated cost of 144 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-NEXT: Found an estimated cost of 80 for VF 4 For recipe: "REPLICATE store 0, %tmp3
				; VP_8-LABEL: Checking a loop in "i8_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0, %tmp3
				; VP_16-LABEL: Checking a loop in "i8_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 3
	%tmp4 = load i8, i8* %tmp0, align 1			%tmp4 = load i8, i8* %tmp0, align 1
	%tmp5 = load i8, i8* %tmp1, align 1			%tmp5 = load i8, i8* %tmp1, align 1
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp0, align 2			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i16, i16* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i16, i16* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i16, i16* %tmp3, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i16, i16* %tmp3, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i16 0, i16* %tmp3, align 2			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i16 0, i16* %tmp3, align 2
				; VP_2-LABEL: Checking a loop in "i16_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0, %tmp3
				; VP_4-LABEL: Checking a loop in "i16_factor_4"
				; VP_4: Found an estimated cost of 144 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-NEXT: Found an estimated cost of 80 for VF 4 For recipe: "REPLICATE store 0, %tmp3
				; VP_8-LABEL: Checking a loop in "i16_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0, %tmp3
				; VP_16-LABEL: Checking a loop in "i16_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 3
	%tmp4 = load i16, i16* %tmp0, align 2			%tmp4 = load i16, i16* %tmp0, align 2
	%tmp5 = load i16, i16* %tmp1, align 2			%tmp5 = load i16, i16* %tmp1, align 2
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp0, align 4			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i32, i32* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i32, i32* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i32, i32* %tmp3, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i32, i32* %tmp3, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i32 0, i32* %tmp3, align 4			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i32 0, i32* %tmp3, align 4
				; VP_2-LABEL: Checking a loop in "i32_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0, %tmp3
				; VP_4-LABEL: Checking a loop in "i32_factor_4"
				; VP_4: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN load ir<%tmp0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp1>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp2>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp3>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp0>, ir<0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp1>, ir<0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp2>, ir<0>
				; VP_4-NEXT: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN store ir<%tmp3>, ir<0>
				; VP_8-LABEL: Checking a loop in "i32_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0, %tmp3
				; VP_16-LABEL: Checking a loop in "i32_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 3
	%tmp4 = load i32, i32* %tmp0, align 4			%tmp4 = load i32, i32* %tmp0, align 4
	%tmp5 = load i32, i32* %tmp1, align 4			%tmp5 = load i32, i32* %tmp1, align 4
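Note on reading the VP_* checks for "i32_factor_4" above: each check prefix records one cost per recipe, and the summary says the plan cost is obtained by recursing into the blocks and recipes. The sketch below is a small, hypothetical helper (not part of the patch) that sums the recipe costs shown in the check lines for each VF and divides by VF to get a rough per-lane figure. It assumes only the memory recipes visible here; the loop's other recipes sit in lines not shown in this view, so the totals are partial and purely illustrative.

```python
import re
from collections import defaultdict

# Matches the debug lines that the VP_* prefixes check for, e.g.
#   Found an estimated cost of 32 for VF 4 For recipe: "WIDEN load ir<%tmp0>
COST_LINE = re.compile(r"Found an estimated cost of (\d+) for VF (\d+) For recipe:")


def sum_recipe_costs(text):
    """Return {VF: sum of the recipe costs that appear in `text`}."""
    totals = defaultdict(int)
    for cost, vf in COST_LINE.findall(text):
        totals[int(vf)] += int(cost)
    return dict(totals)


def per_lane(totals):
    """Divide each total by its VF to get a rough cost per vector lane."""
    return {vf: cost / vf for vf, cost in totals.items()}


# The non-zero "i32_factor_4" recipe lines, copied from the checks above.
# The zero-cost lines are omitted because they do not change the sums.
I32_FACTOR_4 = """
; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0, %tmp3
; VP_4: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN load ir<%tmp0>
; VP_4-NEXT: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN store ir<%tmp3>, ir<0>
; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0, %tmp3
; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0, %tmp3
"""

if __name__ == "__main__":
    totals = sum_recipe_costs(I32_FACTOR_4)
    for vf, lane_cost in sorted(per_lane(totals).items()):
        print(f"VF {vf}: total {totals[vf]}, per lane {lane_cost}")
```

On these lines the VF 4 entry, the only one printed with WIDEN rather than REPLICATE recipes, has the lowest per-lane figure for the memory operations.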
	; VF_16: Found an estimated cost of 2176 for VF 16 For instruction: %tmp4 = load i64, i64* %tmp0, align 8			; VF_16: Found an estimated cost of 2176 for VF 16 For instruction: %tmp4 = load i64, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i64, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i64, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i64, i64* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i64, i64* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i64, i64* %tmp3, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i64, i64* %tmp3, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i64 0, i64* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 1152 for VF 16 For instruction: store i64 0, i64* %tmp3, align 8			; VF_16-NEXT: Found an estimated cost of 1152 for VF 16 For instruction: store i64 0, i64* %tmp3, align 8
				; VP_2-LABEL: Checking a loop in "i64_factor_4"
				; VP_2: Found an estimated cost of 48 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0, %tmp2
				; VP_2-NEXT: Found an estimated cost of 32 for VF 2 For recipe: "REPLICATE store 0, %tmp3
				; VP_4-LABEL: Checking a loop in "i64_factor_4"
				; VP_4: Found an estimated cost of 160 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0, %tmp2
				; VP_4-NEXT: Found an estimated cost of 96 for VF 4 For recipe: "REPLICATE store 0, %tmp3
				; VP_8-LABEL: Checking a loop in "i64_factor_4"
				; VP_8: Found an estimated cost of 576 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0, %tmp2
				; VP_8-NEXT: Found an estimated cost of 320 for VF 8 For recipe: "REPLICATE store 0, %tmp3
				; VP_16-LABEL: Checking a loop in "i64_factor_4"
				; VP_16: Found an estimated cost of 2176 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1152 for VF 16 For recipe: "REPLICATE store 0, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %i64.4, %i64.4* %data, i64 %i, i32 3
	%tmp4 = load i64, i64* %tmp0, align 8			%tmp4 = load i64, i64* %tmp0, align 8
	%tmp5 = load i64, i64* %tmp1, align 8			%tmp5 = load i64, i64* %tmp1, align 8
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load half, half* %tmp0, align 2			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load half, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load half, half* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load half, half* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load half, half* %tmp3, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load half, half* %tmp3, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store half 0xH0000, half* %tmp3, align 2			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store half 0xH0000, half* %tmp3, align 2
				; VP_2-LABEL: Checking a loop in "f16_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0xH0000, %tmp3
				; VP_4-LABEL: Checking a loop in "f16_factor_4"
				; VP_4: Found an estimated cost of 144 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_4-NEXT: Found an estimated cost of 80 for VF 4 For recipe: "REPLICATE store 0xH0000, %tmp3
				; VP_8-LABEL: Checking a loop in "f16_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0xH0000, %tmp3
				; VP_16-LABEL: Checking a loop in "f16_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0xH0000, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 3
	%tmp4 = load half, half* %tmp0, align 2			%tmp4 = load half, half* %tmp0, align 2
	%tmp5 = load half, half* %tmp1, align 2			%tmp5 = load half, half* %tmp1, align 2
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load float, float* %tmp0, align 4			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load float, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load float, float* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load float, float* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load float, float* %tmp3, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load float, float* %tmp3, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store float 0.000000e+00, float* %tmp3, align 4			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store float 0.000000e+00, float* %tmp3, align 4
				; VP_2-LABEL: Checking a loop in "f32_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp3
				; VP_4-LABEL: Checking a loop in "f32_factor_4"
				; VP_4: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN load ir<%tmp0>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp1>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp2>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN load ir<%tmp3>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp0>, ir<0.000000e+00>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp1>, ir<0.000000e+00>
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "WIDEN store ir<%tmp2>, ir<0.000000e+00>
				; VP_4-NEXT: Found an estimated cost of 32 for VF 4 For recipe: "WIDEN store ir<%tmp3>, ir<0.000000e+00>
				; VP_8-LABEL: Checking a loop in "f32_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp3
				; VP_16-LABEL: Checking a loop in "f32_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 3
	%tmp4 = load float, float* %tmp0, align 4			%tmp4 = load float, float* %tmp0, align 4
	%tmp5 = load float, float* %tmp1, align 4			%tmp5 = load float, float* %tmp1, align 4
	; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load double, double* %tmp0, align 8			; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load double, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load double, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load double, double* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load double, double* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load double, double* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load double, double* %tmp3, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load double, double* %tmp3, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp0, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp1, align 8
	; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp2, align 8			; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store double 0.000000e+00, double* %tmp2, align 8
	; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store double 0.000000e+00, double* %tmp3, align 8			; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store double 0.000000e+00, double* %tmp3, align 8
				; VP_2-LABEL: Checking a loop in "f64_factor_4"
				; VP_2: Found an estimated cost of 40 for VF 2 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_2-NEXT: Found an estimated cost of 0 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_2-NEXT: Found an estimated cost of 24 for VF 2 For recipe: "REPLICATE store 0.000000e+00, %tmp3
				; VP_4-LABEL: Checking a loop in "f64_factor_4"
				; VP_4: Found an estimated cost of 144 for VF 4 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_4-NEXT: Found an estimated cost of 0 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_4-NEXT: Found an estimated cost of 80 for VF 4 For recipe: "REPLICATE store 0.000000e+00, %tmp3
				; VP_8-LABEL: Checking a loop in "f64_factor_4"
				; VP_8: Found an estimated cost of 544 for VF 8 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_8-NEXT: Found an estimated cost of 0 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_8-NEXT: Found an estimated cost of 288 for VF 8 For recipe: "REPLICATE store 0.000000e+00, %tmp3
				; VP_16-LABEL: Checking a loop in "f64_factor_4"
				; VP_16: Found an estimated cost of 2112 for VF 16 For recipe: "REPLICATE %tmp4 = load %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp5 = load %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp6 = load %tmp2
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE %tmp7 = load %tmp3
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp0
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp1
				; VP_16-NEXT: Found an estimated cost of 0 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp2
				; VP_16-NEXT: Found an estimated cost of 1088 for VF 16 For recipe: "REPLICATE store 0.000000e+00, %tmp3
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 1
	%tmp2 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 2			%tmp2 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 2
	%tmp3 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 3			%tmp3 = getelementptr inbounds %f64.4, %f64.4* %data, i64 %i, i32 3
	%tmp4 = load double, double* %tmp0, align 8			%tmp4 = load double, double* %tmp0, align 8
	%tmp5 = load double, double* %tmp1, align 8			%tmp5 = load double, double* %tmp1, align 8

llvm/test/Transforms/LoopVectorize/ARM/mve-shiftcost.ll

	; RUN: opt -loop-vectorize < %s -S -o - | FileCheck %s --check-prefix=CHECK			; RUN: opt -loop-vectorize < %s -S -o - | FileCheck %s --check-prefix=CHECK
	; RUN: opt -loop-vectorize -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-COST			; RUN: opt -loop-vectorize -debug-only=loop-vectorize -cost-using-vplan=false -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-COST
				; RUN: opt -loop-vectorize -debug-only=loop-vectorize -cost-using-vplan -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-COST-VPLAN
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv8.1m.main-none-none-eabi"			target triple = "thumbv8.1m.main-none-none-eabi"

	; CHECK-LABEL: test			; CHECK-LABEL: test
	; CHECK-COST: LV: Found an estimated cost of 0 for VF 1 For instruction: %and515 = shl i32 %l41, 3			; CHECK-COST: LV: Found an estimated cost of 0 for VF 1 For instruction: %and515 = shl i32 %l41, 3
	; CHECK-COST: LV: Found an estimated cost of 1 for VF 1 For instruction: %l45 = and i32 %and515, 131072			; CHECK-COST: LV: Found an estimated cost of 1 for VF 1 For instruction: %l45 = and i32 %and515, 131072
	; CHECK-COST: LV: Found an estimated cost of 2 for VF 4 For instruction: %and515 = shl i32 %l41, 3			; CHECK-COST: LV: Found an estimated cost of 2 for VF 4 For instruction: %and515 = shl i32 %l41, 3
	; CHECK-COST: LV: Found an estimated cost of 2 for VF 4 For instruction: %l45 = and i32 %and515, 131072			; CHECK-COST: LV: Found an estimated cost of 2 for VF 4 For instruction: %l45 = and i32 %and515, 131072
				; CHECK-COST-VPLAN: LV: Found an estimated cost of 0 for VF 1 For recipe: "CLONE %and515 = shl %l41, 3
				; CHECK-COST-VPLAN: LV: Found an estimated cost of 1 for VF 1 For recipe: "CLONE %l45 = and %and515, 131072
				; CHECK-COST-VPLAN: LV: Found an estimated cost of 2 for VF 4 For recipe: "WIDEN\l"" %and515 = shl %l41, 3
				; CHECK-COST-VPLAN: LV: Found an estimated cost of 2 for VF 4 For recipe: "WIDEN\l"" %l45 = and %and515, 131072
	; CHECK-NOT: vector.body			; CHECK-NOT: vector.body

	define void @test([101 x i32] *%src, i32 %N) #0 {			define void @test([101 x i32] *%src, i32 %N) #0 {
	entry:			entry:
	br label %for.body386			br label %for.body386

	for.body386: ; preds = %entry, %l77			for.body386: ; preds = %entry, %l77
	%add387 = phi i32 [ %inc532, %l77 ], [ 0, %entry ]			%add387 = phi i32 [ %inc532, %l77 ], [ 0, %entry ]

llvm/test/Transforms/LoopVectorize/SystemZ/branch-for-predicated-block.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \			; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
	; RUN: -force-vector-width=2 -debug-only=loop-vectorize \			; RUN: -force-vector-width=2 -debug-only=loop-vectorize \
	; RUN: -disable-output < %s 2>&1 | FileCheck %s			; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
				; RUN: -force-vector-width=2 -debug-only=loop-vectorize \
				; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

	; Check costs for branches inside a vectorized loop around predicated			; Check costs for branches inside a vectorized loop around predicated
	; blocks. Each such branch will be guarded with an extractelement from the			; blocks. Each such branch will be guarded with an extractelement from the
	; vector compare plus a test under mask instruction. This cost is modelled on			; vector compare plus a test under mask instruction. This cost is modelled on
	; the extractelement of i1.			; the extractelement of i1.

	define void @fun(i32* %arr, i64 %trip.count) {			define void @fun(i32* %arr, i64 %trip.count) {
	entry:			entry:
	for.inc:			for.inc:
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, %trip.count			%exitcond = icmp eq i64 %indvars.iv.next, %trip.count
	br i1 %exitcond, label %for.end.loopexit, label %for.body			br i1 %exitcond, label %for.end.loopexit, label %for.body

	for.end.loopexit:			for.end.loopexit:
	ret void			ret void

	; CHECK: LV: Found an estimated cost of 7 for VF 2 For instruction: br i1 %cmp55, label %if.then, label %for.inc			; CHECK-CM: LV: Found an estimated cost of 7 for VF 2 For instruction: br i1 %cmp55, label %if.then, label %for.inc
	; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: br label %for.inc			; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: br label %for.inc
	; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: br i1 %exitcond, label %for.end.loopexit, label %for.body			; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: br i1 %exitcond, label %for.end.loopexit, label %for.body
				; CHECK-VP: LV: Found an estimated cost of 7 for VF 2 For recipe: "BRANCH-ON-MASK ir<%cmp55>
				; CHECK-VP-NOT: LV: Found an estimated cost of {{.}} for VF 2 For recipe: {{.}} br
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For loop backedge cost (br)
	}			}

llvm/test/Transforms/LoopVectorize/SystemZ/load-scalarization-cost-0.ll

	; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \			; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
	; RUN: -force-vector-width=2 -debug-only=loop-vectorize \			; RUN: -force-vector-width=2 -debug-only=loop-vectorize \
	; RUN: -disable-output < %s 2>&1 | FileCheck %s			; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
				; RUN: -force-vector-width=2 -debug-only=loop-vectorize \
				; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
	; REQUIRES: asserts			; REQUIRES: asserts
	;			;
	; Check that a scalarized load does not get operands scalarization costs added.			; Check that a scalarized load does not get operands scalarization costs added.

	define void @fun(i64* %data, i64 %n, i64 %s, double* %Src) {			define void @fun(i64* %data, i64 %n, i64 %s, double* %Src) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%mul = mul nsw i64 %iv, %s			%mul = mul nsw i64 %iv, %s
	%gep = getelementptr inbounds double, double* %Src, i64 %mul			%gep = getelementptr inbounds double, double* %Src, i64 %mul
	%bct = bitcast double* %gep to i64*			%bct = bitcast double* %gep to i64*
	%ld = load i64, i64* %bct			%ld = load i64, i64* %bct
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%cmp110.us = icmp slt i64 %iv.next, %n			%cmp110.us = icmp slt i64 %iv.next, %n
	br i1 %cmp110.us, label %for.body, label %for.end			br i1 %cmp110.us, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void

	; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %mul = mul nsw i64 %iv, %s			; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %mul = mul nsw i64 %iv, %s
	; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %ld = load i64, i64* %bct			; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %ld = load i64, i64* %bct
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE %mul = mul %iv, %s
				; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For recipe: "REPLICATE %ld = load %bct
	}			}

llvm/test/Transforms/LoopVectorize/SystemZ/load-scalarization-cost-1.ll

	; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \			; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
	; RUN: -force-vector-width=4 -debug-only=loop-vectorize \			; RUN: -force-vector-width=4 -debug-only=loop-vectorize \
	; RUN: -enable-interleaved-mem-accesses=false -disable-output < %s 2>&1 \			; RUN: -enable-interleaved-mem-accesses=false -disable-output < %s 2>&1 \
	; RUN: | FileCheck %s			; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
				; RUN: -force-vector-width=4 -debug-only=loop-vectorize \
				; RUN: -enable-interleaved-mem-accesses=false -disable-output < %s 2>&1 \
				; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-VP
	; REQUIRES: asserts			; REQUIRES: asserts
	;			;
	; Check that a scalarized load does not get a zero cost in a vectorized			; Check that a scalarized load does not get a zero cost in a vectorized
	; loop. It can only be folded into the add operand in the scalar loop.			; loop. It can only be folded into the add operand in the scalar loop.

	define i32 @fun(i64* %data, i64 %n, i64 %s, i32* %Src) {			define i32 @fun(i64* %data, i64 %n, i64 %s, i32* %Src) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%acc = phi i32 [ 0, %entry ], [ %acc_next, %for.body ]			%acc = phi i32 [ 0, %entry ], [ %acc_next, %for.body ]
	%gep = getelementptr inbounds i32, i32* %Src, i64 %iv			%gep = getelementptr inbounds i32, i32* %Src, i64 %iv
	%ld = load i32, i32* %gep			%ld = load i32, i32* %gep
	%acc_next = add i32 %acc, %ld			%acc_next = add i32 %acc, %ld
	%iv.next = add nuw nsw i64 %iv, 2			%iv.next = add nuw nsw i64 %iv, 2
	%cmp110.us = icmp slt i64 %iv.next, %n			%cmp110.us = icmp slt i64 %iv.next, %n
	br i1 %cmp110.us, label %for.body, label %for.end			br i1 %cmp110.us, label %for.body, label %for.end

	for.end:			for.end:
	ret i32 %acc_next			ret i32 %acc_next

	; CHECK: Found an estimated cost of 4 for VF 4 For instruction: %ld = load i32, i32* %gep			; CHECK-CM: Found an estimated cost of 4 for VF 4 For instruction: %ld = load i32, i32* %gep
				; CHECK-VP: Found an estimated cost of 4 for VF 4 For recipe: "REPLICATE %ld = load %gep
	}			}

llvm/test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \			; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
	; RUN: -force-vector-width=4 -debug-only=loop-vectorize \			; RUN: -force-vector-width=4 -debug-only=loop-vectorize \
	; RUN: -disable-output -enable-interleaved-mem-accesses=false < %s 2>&1 | \			; RUN: -disable-output -enable-interleaved-mem-accesses=false < %s 2>&1 | \
	; RUN: FileCheck %s			; RUN: FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
				; RUN: -force-vector-width=4 -debug-only=loop-vectorize \
				; RUN: -disable-output -enable-interleaved-mem-accesses=false < %s 2>&1 | \
				; RUN: FileCheck %s --check-prefixes=CHECK,CHECK-VP
	;			;
	; Check that a scalarized load/store does not get a cost for inserts/			; Check that a scalarized load/store does not get a cost for inserts/
	; extracts, since z13 supports element load/store.			; extracts, since z13 supports element load/store.

	define void @fun(i32* %data, i64 %n) {			define void @fun(i32* %data, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds i32, i32* %data, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %data, i64 %i
	%tmp1 = load i32, i32* %tmp0, align 4			%tmp1 = load i32, i32* %tmp0, align 4
	%tmp2 = add i32 %tmp1, 1			%tmp2 = add i32 %tmp1, 1
	store i32 %tmp2, i32* %tmp0, align 4			store i32 %tmp2, i32* %tmp0, align 4
	%i.next = add nuw nsw i64 %i, 2			%i.next = add nuw nsw i64 %i, 2
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void

	; CHECK: LV: Scalarizing: %tmp1 = load i32, i32* %tmp0, align 4			; CHECK: LV: Scalarizing: %tmp1 = load i32, i32* %tmp0, align 4
	; CHECK: LV: Scalarizing: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: LV: Scalarizing: store i32 %tmp2, i32* %tmp0, align 4

	; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4			; CHECK-CM: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4
	; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp2, i32* %tmp0, align 4			; CHECK-CM: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK-VP: LV: Found an estimated cost of 4 for VF 4 For recipe: "REPLICATE %tmp1 = load %tmp0
				; CHECK-VP: LV: Found an estimated cost of 4 for VF 4 For recipe: "REPLICATE store %tmp2, %tmp0
	}			}

llvm/test/Transforms/LoopVectorize/SystemZ/mem-interleaving-costs-02.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \		; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
; RUN: -debug-only=loop-vectorize,vectorutils -max-interleave-group-factor=64\		; RUN: -debug-only=loop-vectorize,vectorutils -max-interleave-group-factor=64\
; RUN: -disable-output < %s 2>&1 | FileCheck %s		; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
		; RUN: -debug-only=loop-vectorize,vectorutils -max-interleave-group-factor=64\
		; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
;		;
; Check that some cost estimations for interleave groups make sense.		; Check that some cost estimations for interleave groups make sense.

; This loop is loading four i16 values at indices [0, 1, 2, 3], with a stride		; This loop is loading four i16 values at indices [0, 1, 2, 3], with a stride
; of 4. At VF=4, memory interleaving means loading 4 * 4 * 16 bits = 2 vector		; of 4. At VF=4, memory interleaving means loading 4 * 4 * 16 bits = 2 vector
; registers. Each of the 4 vector values must then be constructed from the		; registers. Each of the 4 vector values must then be constructed from the
; two vector registers using one vperm each, which gives a cost of 2 + 4 = 6.		; two vector registers using one vperm each, which gives a cost of 2 + 4 = 6.
;		;
; CHECK: LV: Checking a loop in "fun0"		; CHECK: LV: Checking a loop in "fun0"
; CHECK: LV: Found an estimated cost of 6 for VF 4 For instruction: %ld0 = load i16		; CHECK-CM: LV: Found an estimated cost of 6 for VF 4 For instruction: %ld0 = load i16
; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld1 = load i16		; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld1 = load i16
; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld2 = load i16		; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld2 = load i16
; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld3 = load i16		; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %ld3 = load i16
		; CHECK-VP: LV: Found an estimated cost of 6 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 4 at %ld0
define void @fun0(i16 *%ptr, i16 *%dst) {		define void @fun0(i16 *%ptr, i16 *%dst) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%ivptr = phi i16* [ %ptr.next, %for.body ], [ %ptr, %entry ]		%ivptr = phi i16* [ %ptr.next, %for.body ], [ %ptr, %entry ]
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%inc = add i64 %iv, 4		%inc = add i64 %iv, 4
Show All 18 Lines	for.end:
ret void		ret void
}		}

; This loop loads one i8 value in a stride of 3. At VF=16, this means loading		; This loop loads one i8 value in a stride of 3. At VF=16, this means loading
; 3 vector registers, and then constructing the vector value with two vperms,		; 3 vector registers, and then constructing the vector value with two vperms,
; which gives a cost of 5.		; which gives a cost of 5.
;		;
; CHECK: LV: Checking a loop in "fun1"		; CHECK: LV: Checking a loop in "fun1"
; CHECK: LV: Found an estimated cost of 5 for VF 16 For instruction: %ld0 = load i8		; CHECK-CM: LV: Found an estimated cost of 5 for VF 16 For instruction: %ld0 = load i8
		; CHECK-VP: LV: Found an estimated cost of 5 for VF 16 For recipe: "INTERLEAVE-GROUP with factor 3 at %ld0
define void @fun1(i8 *%ptr, i8 *%dst) {		define void @fun1(i8 *%ptr, i8 *%dst) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]		%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%inc = add i64 %iv, 4		%inc = add i64 %iv, 4
Show All 9 Lines	for.end:
ret void		ret void
}		}

; This loop is loading 4 i8 values at indexes [0, 1, 2, 3], with a stride of		; This loop is loading 4 i8 values at indexes [0, 1, 2, 3], with a stride of
; 32. At VF=2, this means loading 2 vector registers, and using 4 vperms to		; 32. At VF=2, this means loading 2 vector registers, and using 4 vperms to
; produce the vector values, which gives a cost of 6.		; produce the vector values, which gives a cost of 6.
;		;
; CHECK: LV: Checking a loop in "fun2"		; CHECK: LV: Checking a loop in "fun2"
; CHECK: LV: Found an estimated cost of 6 for VF 2 For instruction: %ld0 = load i8		; CHECK-CM: LV: Found an estimated cost of 6 for VF 2 For instruction: %ld0 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld1 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld1 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld2 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld2 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld3 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld3 = load i8
		; CHECK-VP: LV: Found an estimated cost of 6 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 32 at %ld0
define void @fun2(i8 *%ptr, i8 *%dst) {		define void @fun2(i8 *%ptr, i8 *%dst) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]		%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%inc = add i64 %iv, 4		%inc = add i64 %iv, 4

; This loop is loading 4 i8 values at indexes [0, 1, 2, 3], with a stride of		; This loop is loading 4 i8 values at indexes [0, 1, 2, 3], with a stride of
; 30. At VF=2, this means loading 3 vector registers, and using 4 vperms to		; 30. At VF=2, this means loading 3 vector registers, and using 4 vperms to
; produce the vector values, which gives a cost of 7. This is the same loop		; produce the vector values, which gives a cost of 7. This is the same loop
; as in fun2, except the stride makes the second iteration's values overlap a		; as in fun2, except the stride makes the second iteration's values overlap a
; vector register boundary.		; vector register boundary.
;		;
; CHECK: LV: Checking a loop in "fun3"		; CHECK: LV: Checking a loop in "fun3"
; CHECK: LV: Found an estimated cost of 7 for VF 2 For instruction: %ld0 = load i8		; CHECK-CM: LV: Found an estimated cost of 7 for VF 2 For instruction: %ld0 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld1 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld1 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld2 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld2 = load i8
; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld3 = load i8		; CHECK-CM: LV: Found an estimated cost of 0 for VF 2 For instruction: %ld3 = load i8
		; CHECK-VP: LV: Found an estimated cost of 7 for VF 2 For recipe: "INTERLEAVE-GROUP with factor 30 at %ld0
define void @fun3(i8 *%ptr, i8 *%dst) {		define void @fun3(i8 *%ptr, i8 *%dst) {
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]		%ivptr = phi i8* [ %ptr.next, %for.body ], [ %ptr, %entry ]
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%inc = add i64 %iv, 4		%inc = add i64 %iv, 4
Show All 20 Lines
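
The interleave-group costs quoted in the mem-interleaving-costs-02.ll comments (6, 5, 6 and 7) can be reproduced with a small counting model. The sketch below is not the SystemZ TTI hook. It is a hypothetical helper, `interleave_load_cost`, that assumes 128-bit vector registers, a group that starts at a register-aligned address, and one vperm per extra source register for each loaded member, with at least one vperm per member. The strides are the ones stated in the test comments.

```python
VECTOR_REGISTER_BYTES = 16  # assumption: z13 vector registers are 128 bits wide


def interleave_load_cost(elt_bytes, stride_elts, member_indices, vf):
    """Count vector loads plus vperms for an interleaved load group.

    Illustrative model only: the group is assumed to start at a
    register-aligned address and each member's result is assumed to fit
    in a single vector register.
    """
    touched = set()  # every source register that has to be loaded
    permutes = 0
    for index in member_indices:
        sources = set()  # source registers holding this member's lanes
        for lane in range(vf):
            first_byte = (index + lane * stride_elts) * elt_bytes
            last_byte = first_byte + elt_bytes - 1
            sources.update(range(first_byte // VECTOR_REGISTER_BYTES,
                                 last_byte // VECTOR_REGISTER_BYTES + 1))
        touched |= sources
        # Assumed: one vperm per extra source register, never fewer than one.
        permutes += max(1, len(sources) - 1)
    return len(touched) + permutes


# Parameters taken from the comments above fun0..fun3.
fun0 = interleave_load_cost(elt_bytes=2, stride_elts=4, member_indices=range(4), vf=4)
fun1 = interleave_load_cost(elt_bytes=1, stride_elts=3, member_indices=[0], vf=16)
fun2 = interleave_load_cost(elt_bytes=1, stride_elts=32, member_indices=range(4), vf=2)
fun3 = interleave_load_cost(elt_bytes=1, stride_elts=30, member_indices=range(4), vf=2)
print(fun0, fun1, fun2, fun3)
```

Under these assumptions the model gives 2 loads + 4 vperms for fun0, 3 + 2 for fun1, 2 + 4 for fun2 and 3 + 4 for fun3. The only difference between fun2 and fun3 is the extra register touched when the second iteration's bytes straddle a 16-byte boundary.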

llvm/test/Transforms/LoopVectorize/SystemZ/mem-interleaving-costs.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \		; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=false \
; RUN: -force-vector-width=4 -debug-only=loop-vectorize,vectorutils \		; RUN: -force-vector-width=4 -debug-only=loop-vectorize,vectorutils \
; RUN: -disable-output < %s 2>&1 | FileCheck %s		; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize -cost-using-vplan=true \
		; RUN: -force-vector-width=4 -debug-only=loop-vectorize,vectorutils \
		; RUN: -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP
;		;
; Check that the loop vectorizer performs memory interleaving with accurate		; Check that the loop vectorizer performs memory interleaving with accurate
; cost estimations.		; cost estimations.


; Simple case where just the load is interleaved, because the store group		; Simple case where just the load is interleaved, because the store group
; would have gaps.		; would have gaps.
define void @fun0(i32* %data, i64 %n) {		define void @fun0(i32* %data, i64 %n) {
Show All 9 Lines	for.body:
%i.next = add nuw nsw i64 %i, 2		%i.next = add nuw nsw i64 %i, 2
%cond = icmp slt i64 %i.next, %n		%cond = icmp slt i64 %i.next, %n
br i1 %cond, label %for.body, label %for.end		br i1 %cond, label %for.body, label %for.end

for.end:		for.end:
ret void		ret void

; CHECK: LV: Creating an interleave group with: %tmp1 = load i32, i32* %tmp0, align 4		; CHECK: LV: Creating an interleave group with: %tmp1 = load i32, i32* %tmp0, align 4
; CHECK: LV: Found an estimated cost of 3 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4		; CHECK-CM: LV: Found an estimated cost of 3 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4
		; CHECK-VP: LV: Found an estimated cost of 3 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp1
; (vl; vl; vperm)		; (vl; vl; vperm)
}		}

; Interleaving of both load and stores.		; Interleaving of both load and stores.
define void @fun1(i32* %data, i64 %n) {		define void @fun1(i32* %data, i64 %n) {
entry:		entry:
br label %for.body		br label %for.body


; CHECK: LV: Creating an interleave group with: store i32 %tmp3, i32* %tmp0, align 4		; CHECK: LV: Creating an interleave group with: store i32 %tmp3, i32* %tmp0, align 4
; CHECK: LV: Inserted: store i32 %tmp1, i32* %tmp2, align 4		; CHECK: LV: Inserted: store i32 %tmp1, i32* %tmp2, align 4
; CHECK: into the interleave group with store i32 %tmp3, i32* %tmp0, align 4		; CHECK: into the interleave group with store i32 %tmp3, i32* %tmp0, align 4
; CHECK: LV: Creating an interleave group with: %tmp3 = load i32, i32* %tmp2, align 4		; CHECK: LV: Creating an interleave group with: %tmp3 = load i32, i32* %tmp2, align 4
; CHECK: LV: Inserted: %tmp1 = load i32, i32* %tmp0, align 4		; CHECK: LV: Inserted: %tmp1 = load i32, i32* %tmp0, align 4
; CHECK: into the interleave group with %tmp3 = load i32, i32* %tmp2, align 4		; CHECK: into the interleave group with %tmp3 = load i32, i32* %tmp2, align 4

; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4		; CHECK-CM: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4
; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i32, i32* %tmp2, align 4		; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i32, i32* %tmp2, align 4
		; CHECK-VP: LV: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2 at %tmp1
; (vl; vl; vperm, vpkg)		; (vl; vl; vperm, vpkg)

; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction: store i32 %tmp1, i32* %tmp2, align 4		; CHECK-CM: LV: Found an estimated cost of 0 for VF 4 For instruction: store i32 %tmp1, i32* %tmp2, align 4
; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp3, i32* %tmp0, align 4		; CHECK-CM: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp3, i32* %tmp0, align 4
		; CHECK-VP: LV: Found an estimated cost of 4 for VF 4 For recipe: "INTERLEAVE-GROUP with factor 2
; (vmrlf; vmrhf; vst; vst)		; (vmrlf; vmrhf; vst; vst)
}		}

llvm/test/Transforms/LoopVectorize/X86/fneg-cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt %s -loop-vectorize -debug-only=loop-vectorize -S 2>&1 | FileCheck %s			; RUN: opt %s -loop-vectorize -debug-only=loop-vectorize -S 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	; CHECK: Found an estimated cost of 4 for VF 1 For instruction: %neg = fneg float %{{.*}}			; CHECK: Found an estimated cost of 4 for VF 1 For {{.*}} %neg = fneg
	; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %neg = fneg float %{{.*}}			; CHECK: Found an estimated cost of 4 for VF 2 For {{.*}} %neg = fneg
	; CHECK: Found an estimated cost of 4 for VF 4 For instruction: %neg = fneg float %{{.*}}			; CHECK: Found an estimated cost of 4 for VF 4 For {{.*}} %neg = fneg
	define void @fneg_cost(float* %a, i64 %n) {			define void @fneg_cost(float* %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body: ; preds = %for.body.preheader, %for.body			for.body: ; preds = %for.body.preheader, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%neg = fneg float %0			%neg = fneg float %0
	store float %neg, float* %arrayidx, align 4			store float %neg, float* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%cmp = icmp eq i64 %indvars.iv.next, %n			%cmp = icmp eq i64 %indvars.iv.next, %n
	br i1 %cmp, label %for.end, label %for.body			br i1 %cmp, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

llvm/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll

	; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"


	; CHECK: cost of 4 for VF 8 For instruction: %conv = fptosi float %tmp to i8			; CHECK: cost of 4 for VF 8 For {{.*}} %conv = fptosi
	define void @float_to_sint8_cost(i8* noalias nocapture %a, float* noalias nocapture readonly %b) nounwind {			define void @float_to_sint8_cost(i8* noalias nocapture %a, float* noalias nocapture readonly %b) nounwind {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
	%tmp = load float, float* %arrayidx, align 4			%tmp = load float, float* %arrayidx, align 4
	%conv = fptosi float %tmp to i8			%conv = fptosi float %tmp to i8
	[9 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/mul_slm_16bit.ll

[26 lines not shown]
for.body: ; preds = %for.body.preheader, %for.body
%arrayidx = getelementptr inbounds i8, i8* %dataA, i64 %indvars.iv		%arrayidx = getelementptr inbounds i8, i8* %dataA, i64 %indvars.iv
%0 = load i8, i8* %arrayidx, align 1		%0 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %0 to i32		%conv = sext i8 %0 to i32
%arrayidx2 = getelementptr inbounds i8, i8* %dataB, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds i8, i8* %dataB, i64 %indvars.iv
%1 = load i8, i8* %arrayidx2, align 1		%1 = load i8, i8* %arrayidx2, align 1
%conv3 = sext i8 %1 to i32		%conv3 = sext i8 %1 to i32
; sources of the mul is sext\sext from i8		; sources of the mul is sext\sext from i8
; use pmullw\sext seq.		; use pmullw\sext seq.
; SLM: cost of 3 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 3 for VF 4 {{.*}} mul
%mul = mul nsw i32 %conv3, %conv		%mul = mul nsw i32 %conv3, %conv
; sources of the mul is zext\sext from i8		; sources of the mul is zext\sext from i8
; use pmulhw\pmullw\pshuf		; use pmulhw\pmullw\pshuf
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%conv4 = zext i8 %1 to i32		%conv4 = zext i8 %1 to i32
%mul2 = mul nsw i32 %conv4, %conv		%mul2 = mul nsw i32 %conv4, %conv
%sum0 = add i32 %mul, %mul2		%sum0 = add i32 %mul, %mul2
; sources of the mul is zext\zext from i8		; sources of the mul is zext\zext from i8
; use pmullw\zext		; use pmullw\zext
; SLM: cost of 3 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 3 for VF 4 {{.*}} mul
%conv5 = zext i8 %0 to i32		%conv5 = zext i8 %0 to i32
%mul3 = mul nsw i32 %conv5, %conv4		%mul3 = mul nsw i32 %conv5, %conv4
%sum1 = add i32 %sum0, %mul3		%sum1 = add i32 %sum0, %mul3
; sources of the mul is sext\-120		; sources of the mul is sext\-120
; use pmullw\sext		; use pmullw\sext
; SLM: cost of 3 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 3 for VF 4 {{.*}} mul
%mul4 = mul nsw i32 -120, %conv3		%mul4 = mul nsw i32 -120, %conv3
%sum2 = add i32 %sum1, %mul4		%sum2 = add i32 %sum1, %mul4
; sources of the mul is sext\250		; sources of the mul is sext\250
; use pmulhw\pmullw\pshuf		; use pmulhw\pmullw\pshuf
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%mul5 = mul nsw i32 250, %conv3		%mul5 = mul nsw i32 250, %conv3
%sum3 = add i32 %sum2, %mul5		%sum3 = add i32 %sum2, %mul5
; sources of the mul is zext\-120		; sources of the mul is zext\-120
; use pmulhw\pmullw\pshuf		; use pmulhw\pmullw\pshuf
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%mul6 = mul nsw i32 -120, %conv4		%mul6 = mul nsw i32 -120, %conv4
%sum4 = add i32 %sum3, %mul6		%sum4 = add i32 %sum3, %mul6
; sources of the mul is zext\250		; sources of the mul is zext\250
; use pmullw\zext		; use pmullw\zext
; SLM: cost of 3 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 3 for VF 4 {{.*}} mul
%mul7 = mul nsw i32 250, %conv4		%mul7 = mul nsw i32 250, %conv4
%sum5 = add i32 %sum4, %mul7		%sum5 = add i32 %sum4, %mul7
%add = add i32 %acc.013, 5		%add = add i32 %acc.013, 5
%add4 = add i32 %add, %sum5		%add4 = add i32 %add, %sum5
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count		%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}		}
[21 lines not shown]
for.body: ; preds = %for.body.preheader, %for.body
%arrayidx = getelementptr inbounds i16, i16* %dataA, i64 %indvars.iv		%arrayidx = getelementptr inbounds i16, i16* %dataA, i64 %indvars.iv
%0 = load i16, i16* %arrayidx, align 1		%0 = load i16, i16* %arrayidx, align 1
%conv = sext i16 %0 to i32		%conv = sext i16 %0 to i32
%arrayidx2 = getelementptr inbounds i16, i16* %dataB, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds i16, i16* %dataB, i64 %indvars.iv
%1 = load i16, i16* %arrayidx2, align 1		%1 = load i16, i16* %arrayidx2, align 1
%conv3 = sext i16 %1 to i32		%conv3 = sext i16 %1 to i32
; sources of the mul is sext\sext from i16		; sources of the mul is sext\sext from i16
; use pmulhw\pmullw\pshuf seq.		; use pmulhw\pmullw\pshuf seq.
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%mul = mul nsw i32 %conv3, %conv		%mul = mul nsw i32 %conv3, %conv
; sources of the mul is zext\sext from i16		; sources of the mul is zext\sext from i16
; use pmulld		; use pmulld
; SLM: cost of 11 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 11 for VF 4 {{.*}} mul
%conv4 = zext i16 %1 to i32		%conv4 = zext i16 %1 to i32
%mul2 = mul nsw i32 %conv4, %conv		%mul2 = mul nsw i32 %conv4, %conv
%sum0 = add i32 %mul, %mul2		%sum0 = add i32 %mul, %mul2
; sources of the mul is zext\zext from i16		; sources of the mul is zext\zext from i16
; use pmulhw\pmullw\zext		; use pmulhw\pmullw\zext
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%conv5 = zext i16 %0 to i32		%conv5 = zext i16 %0 to i32
%mul3 = mul nsw i32 %conv5, %conv4		%mul3 = mul nsw i32 %conv5, %conv4
%sum1 = add i32 %sum0, %mul3		%sum1 = add i32 %sum0, %mul3
; sources of the mul is sext\-32000		; sources of the mul is sext\-32000
; use pmulhw\pmullw\sext		; use pmulhw\pmullw\sext
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%mul4 = mul nsw i32 -32000, %conv3		%mul4 = mul nsw i32 -32000, %conv3
%sum2 = add i32 %sum1, %mul4		%sum2 = add i32 %sum1, %mul4
; sources of the mul is sext\64000		; sources of the mul is sext\64000
; use pmulld		; use pmulld
; SLM: cost of 11 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 11 for VF 4 {{.*}} mul
%mul5 = mul nsw i32 64000, %conv3		%mul5 = mul nsw i32 64000, %conv3
%sum3 = add i32 %sum2, %mul5		%sum3 = add i32 %sum2, %mul5
; sources of the mul is zext\-32000		; sources of the mul is zext\-32000
; use pmulld		; use pmulld
; SLM: cost of 11 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 11 for VF 4 {{.*}} mul
%mul6 = mul nsw i32 -32000, %conv4		%mul6 = mul nsw i32 -32000, %conv4
%sum4 = add i32 %sum3, %mul6		%sum4 = add i32 %sum3, %mul6
; sources of the mul is zext\64000		; sources of the mul is zext\64000
; use pmulhw\pmullw\zext		; use pmulhw\pmullw\zext
; SLM: cost of 5 for VF 4 {{.*}} mul nsw i32		; SLM: cost of 5 for VF 4 {{.*}} mul
%mul7 = mul nsw i32 250, %conv4		%mul7 = mul nsw i32 250, %conv4
%sum5 = add i32 %sum4, %mul7		%sum5 = add i32 %sum4, %mul7
%add = add i32 %acc.013, 5		%add = add i32 %acc.013, 5
%add4 = add i32 %add, %sum5		%add4 = add i32 %add, %sum5
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count		%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}		}

llvm/test/Transforms/LoopVectorize/X86/reduction-small-size.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -mcpu=core-axv2 -force-vector-interleave=1 -dce -instcombine -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -mcpu=core-axv2 -force-vector-interleave=1 -dce -instcombine -cost-using-vplan=false -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -loop-vectorize -mcpu=core-axv2 -force-vector-interleave=1 -dce -instcombine -cost-using-vplan=true -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-VP

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; Make sure we ignore the costs of the redundant reduction casts			; Make sure we ignore the costs of the redundant reduction casts
	; char reduction_i8(char a, char b, int n) {			; char reduction_i8(char a, char b, int n) {
	; char sum = 0;			; char sum = 0;
	; for (int i = 0; i < n; ++i)			; for (int i = 0; i < n; ++i)
	; sum += (a[i] + b[i]);			; sum += (a[i] + b[i]);
	; return sum;			; return sum;
	; }			; }
	;			;

	; CHECK-LABEL: reduction_i8			; CHECK-LABEL: reduction_i8
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = phi			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = phi
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = phi			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = phi
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = getelementptr			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = getelementptr
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = load			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = load
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = zext i8 %{{.}} to i32			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = zext i8 %{{.}} to i32
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = getelementptr			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = getelementptr
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = load			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = load
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = zext i8 %{{.}} to i32			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = zext i8 %{{.}} to i32
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = and i32 %{{.}}, 255			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.}} = and i32 %{{.}}, 255
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = trunc			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = trunc
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = icmp			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: %{{.*}} = icmp
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: br			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For instruction: br
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = phi			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = phi
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = phi			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = phi
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = getelementptr			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = getelementptr
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = load			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = load
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = zext i8 %{{.}} to i32			; CHECK-CM-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = zext i8 %{{.}} to i32
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = getelementptr			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = getelementptr
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = load			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = load
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = zext i8 %{{.}} to i32			; CHECK-CM-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = zext i8 %{{.}} to i32
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = and i32 %{{.}}, 255			; CHECK-CM-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.}} = and i32 %{{.}}, 255
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = add
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = trunc			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = trunc
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = icmp			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{.*}} = icmp
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: br			; CHECK-CM: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: br
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "WIDEN-INDUCTION %{{.*}} = phi
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "WIDEN-PHI %{{.*}} = phi
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = getelementptr
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = load
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = zext
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = getelementptr
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = load
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = zext
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = and
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = add
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For recipe: "CLONE %{{.*}} = add
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 1 For loop induction check (add + icmp)
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN-INDUCTION %{{.*}} = phi
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN-PHI %{{.*}} = phi
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "CLONE %{{.*}} = getelementptr
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN load
				; CHECK-VP-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN\l"" %{{.*}} = zext
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "CLONE %{{.*}} = getelementptr
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN load
				; CHECK-VP-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN\l"" %{{.*}} = zext
				; CHECK-VP-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN\l"" %{{.*}} = and
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN\l"" %{{.*}} = add
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For recipe: "WIDEN\l"" %{{.*}} = add
				; CHECK-VP: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For loop induction check (add + icmp)
	;			;
	define i8 @reduction_i8(i8* nocapture readonly %a, i8* nocapture readonly %b, i32 %n) {			define i8 @reduction_i8(i8* nocapture readonly %a, i8* nocapture readonly %b, i32 %n) {
	entry:			entry:
	%cmp.12 = icmp sgt i32 %n, 0			%cmp.12 = icmp sgt i32 %n, 0
	br i1 %cmp.12, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp.12, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:			for.body.preheader:
	br label %for.body			br label %for.body
	[27 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll

	; RUN: opt < %s -loop-vectorize -mtriple x86_64 -debug -disable-output 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple x86_64 -cost-using-vplan=false -debug -disable-output 2>&1 \| FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; Check that cost model is not executed twice for VF=2 when vectorization is			; Check that cost model is not executed twice for VF=2 when vectorization is
	; forced for a particular loop.			; forced for a particular loop.

	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32			; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32
	; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: store i32			; CHECK: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: store i32
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32			; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF 2 For instruction: %{{[0-9]+}} = load i32
	[25 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

	; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -cost-using-vplan=false -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -cost-using-vplan=true -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-VP
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"


	; CHECK: cost of 4 for VF 1 For instruction: %conv = uitofp i64 %tmp to double			; CHECK-CM: cost of 4 for VF 1 For instruction: %conv = uitofp i64 %tmp to double
	; CHECK: cost of 5 for VF 2 For instruction: %conv = uitofp i64 %tmp to double			; CHECK-CM: cost of 5 for VF 2 For instruction: %conv = uitofp i64 %tmp to double
	; CHECK: cost of 6 for VF 4 For instruction: %conv = uitofp i64 %tmp to double			; CHECK-CM: cost of 6 for VF 4 For instruction: %conv = uitofp i64 %tmp to double
				; CHECK-VP: cost of 4 for VF 1 For recipe: "CLONE %conv = uitofp %tmp
				; CHECK-VP: cost of 5 for VF 2 For recipe: "WIDEN\l"" %conv = uitofp %tmp
				; CHECK-VP: cost of 6 for VF 4 For recipe: "WIDEN\l"" %conv = uitofp %tmp
	define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {			define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv
	%tmp = load i64, i64* %arrayidx, align 4			%tmp = load i64, i64* %arrayidx, align 4
	%conv = uitofp i64 %tmp to double			%conv = uitofp i64 %tmp to double
	[9 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/uniformshift.ll

	; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s			; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -loop-vectorize -cost-using-vplan=false -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-CM
				; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -loop-vectorize -cost-using-vplan=true -debug-only=loop-vectorize -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK,CHECK-VP
	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK: "foo"			; CHECK: "foo"
	; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %shift = ashr i32 %val, %k			; CHECK-CM: LV: Found an estimated cost of 1 for VF 4 For instruction: %shift = ashr i32 %val, %k
				; CHECK-VP: LV: Found an estimated cost of 1 for VF 4 For recipe: "WIDEN\l"" %shift = ashr %val, %k
	define void @foo(i32* nocapture %p, i32 %k) local_unnamed_addr #0 {			define void @foo(i32* nocapture %p, i32 %k) local_unnamed_addr #0 {
	entry:			entry:
	br label %body			br label %body

	body:			body:
	%i = phi i64 [ 0, %entry ], [ %next, %body ]			%i = phi i64 [ 0, %entry ], [ %next, %body ]
	%ptr = getelementptr inbounds i32, i32* %p, i64 %i			%ptr = getelementptr inbounds i32, i32* %p, i64 %i
	%val = load i32, i32* %ptr, align 4			%val = load i32, i32* %ptr, align 4
	[10 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/vector-scalar-select-cost.ll

[17 lines not shown]
; <label>:1
%2 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %indvars.iv		%2 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %indvars.iv
%3 = load i32, i32* %2, align 4		%3 = load i32, i32* %2, align 4
%4 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %indvars.iv		%4 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %indvars.iv
%5 = load i32, i32* %4, align 4		%5 = load i32, i32* %4, align 4
%6 = add nsw i32 %5, %3		%6 = add nsw i32 %5, %3
%7 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %indvars.iv		%7 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %indvars.iv

; A scalar select has a cost of 1 on core2		; A scalar select has a cost of 1 on core2
; CHECK: cost of 1 for VF 2 {{.*}} select i1 %cond, i32 %6, i32 0		; CHECK: cost of 1 for VF 2 {{.*}} select

%sel = select i1 %cond, i32 %6, i32 zeroinitializer		%sel = select i1 %cond, i32 %6, i32 zeroinitializer
store i32 %sel, i32* %7, align 4		store i32 %sel, i32* %7, align 4
%indvars.iv.next = add i64 %indvars.iv, 1		%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, 256		%exitcond = icmp eq i32 %lftr.wideiv, 256
br i1 %exitcond, label %8, label %1		br i1 %exitcond, label %8, label %1

[11 lines not shown]
; <label>:1
%3 = load i32, i32* %2, align 4		%3 = load i32, i32* %2, align 4
%4 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %indvars.iv		%4 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %indvars.iv
%5 = load i32, i32* %4, align 4		%5 = load i32, i32* %4, align 4
%6 = add nsw i32 %5, %3		%6 = add nsw i32 %5, %3
%7 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %indvars.iv		%7 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %indvars.iv
%8 = icmp ult i64 %indvars.iv, 8		%8 = icmp ult i64 %indvars.iv, 8

; A vector select has a cost of 1 on core2		; A vector select has a cost of 1 on core2
; CHECK: cost of 1 for VF 2 {{.*}} select i1 %8, i32 %6, i32 0		; CHECK: cost of 1 for VF 2 {{.*}} select

%sel = select i1 %8, i32 %6, i32 zeroinitializer		%sel = select i1 %8, i32 %6, i32 zeroinitializer
store i32 %sel, i32* %7, align 4		store i32 %sel, i32* %7, align 4
%indvars.iv.next = add i64 %indvars.iv, 1		%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, 256		%exitcond = icmp eq i32 %lftr.wideiv, 256
br i1 %exitcond, label %9, label %1		br i1 %exitcond, label %9, label %1

; <label>:9		; <label>:9
ret void		ret void
}		}

llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 \| FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 \| FileCheck %s --check-prefix=INTER			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 \| FileCheck %s --check-prefix=INTER

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	%pair = type { i32, i32 }			%pair = type { i32, i32 }

	; CHECK-LABEL: consecutive_ptr_forward			; CHECK-LABEL: consecutive_ptr_forward
	;			;
	; Check that a forward consecutive pointer is recognized as uniform and remains			; Check that a forward consecutive pointer is recognized as uniform and remains
	; uniform after vectorization.			; uniform after vectorization.
	;			;
	; CHECK: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i			; CHECK: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i
				; CHECK-LABEL: @consecutive_ptr_forward(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: getelementptr inbounds i32, i32* %a, i64 %index			; CHECK: getelementptr inbounds i32, i32* %a, i64 %index
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	define i32 @consecutive_ptr_forward(i32* %a, i64 %n) {			define i32 @consecutive_ptr_forward(i32* %a, i64 %n) {
	[16 lines not shown]
	}			}

	; CHECK-LABEL: consecutive_ptr_reverse			; CHECK-LABEL: consecutive_ptr_reverse
	;			;
	; Check that a reverse consecutive pointer is recognized as uniform and remains			; Check that a reverse consecutive pointer is recognized as uniform and remains
	; uniform after vectorization.			; uniform after vectorization.
	;			;
	; CHECK: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i			; CHECK: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i
				; CHECK-LABEL: @consecutive_ptr_reverse(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 %n, %index			; CHECK: %offset.idx = sub i64 %n, %index
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 -3			; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 -3
	; CHECK: getelementptr inbounds i32, i32* %[[G0]], i64 %offset.idx			; CHECK: getelementptr inbounds i32, i32* %[[G0]], i64 %offset.idx
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	[22 lines not shown]
	;			;
	; Check that a consecutive-like pointer used by a forward interleaved group is			; Check that a consecutive-like pointer used by a forward interleaved group is
	; recognized as uniform and remains uniform after vectorization. When			; recognized as uniform and remains uniform after vectorization. When
	; interleaved memory accesses aren't enabled, the pointer should not be			; interleaved memory accesses aren't enabled, the pointer should not be
	; recognized as uniform, and it should not be uniform after vectorization.			; recognized as uniform, and it should not be uniform after vectorization.
	;			;
	; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
	; CHECK-NOT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1			; CHECK-NOT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
				; CHECK-LABEL: @interleaved_access_forward(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %[[I1:.+]] = or i64 %index, 1			; CHECK: %[[I1:.+]] = or i64 %index, 1
	; CHECK: %[[I2:.+]] = or i64 %index, 2			; CHECK: %[[I2:.+]] = or i64 %index, 2
	; CHECK: %[[I3:.+]] = or i64 %index, 3			; CHECK: %[[I3:.+]] = or i64 %index, 3
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 1
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	; INTER: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; INTER: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
	; INTER: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1			; INTER: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
				; INTER-LABEL: @interleaved_access_forward(
	; INTER: vector.body			; INTER: vector.body
	; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; INTER-NOT: getelementptr			; INTER-NOT: getelementptr
	; INTER: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0			; INTER: getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0
	; INTER-NOT: getelementptr			; INTER-NOT: getelementptr
	; INTER: br i1 {{.*}}, label %middle.block, label %vector.body			; INTER: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	define i32 @interleaved_access_forward(%pair* %p, i64 %n) {			define i32 @interleaved_access_forward(%pair* %p, i64 %n) {
	; Check that a consecutive-like pointer used by a reverse interleaved group is			; Check that a consecutive-like pointer used by a reverse interleaved group is
	; recognized as uniform and remains uniform after vectorization. When			; recognized as uniform and remains uniform after vectorization. When
	; interleaved memory accesses aren't enabled, the pointer should not be			; interleaved memory accesses aren't enabled, the pointer should not be
	; recognized as uniform, and it should not be uniform after vectorization.			; recognized as uniform, and it should not be uniform after vectorization.
	;			;
	; recognized as uniform, and it should not be uniform after vectorization.			; recognized as uniform, and it should not be uniform after vectorization.
	; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
	; CHECK-NOT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1			; CHECK-NOT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
				; CHECK-LABEL: @interleaved_access_reverse(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 %n, %index			; CHECK: %offset.idx = sub i64 %n, %index
	; CHECK: %[[I1:.+]] = add i64 %offset.idx, -1			; CHECK: %[[I1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[I2:.+]] = add i64 %offset.idx, -2			; CHECK: %[[I2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[I3:.+]] = add i64 %offset.idx, -3			; CHECK: %[[I3:.+]] = add i64 %offset.idx, -3
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 1
	; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 1			; CHECK: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 1
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	; INTER: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; INTER: LV: Found uniform instruction: %tmp1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
	; INTER: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1			; INTER: LV: Found uniform instruction: %tmp2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
				; INTER-LABEL: @interleaved_access_reverse(
	; INTER: vector.body			; INTER: vector.body
	; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; INTER: %offset.idx = sub i64 %n, %index			; INTER: %offset.idx = sub i64 %n, %index
	; INTER-NOT: getelementptr			; INTER-NOT: getelementptr
	; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 0			; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %offset.idx, i32 0
	; INTER: getelementptr inbounds i32, i32* %[[G0]], i64 -6			; INTER: getelementptr inbounds i32, i32* %[[G0]], i64 -6
	; INTER-NOT: getelementptr			; INTER-NOT: getelementptr
	; INTER: br i1 {{.*}}, label %middle.block, label %vector.body			; INTER: br i1 {{.*}}, label %middle.block, label %vector.body
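The reverse tests in this file expect a constant GEP offset of -3 for the plain reverse access (@consecutive_ptr_reverse) and -6 for the reverse interleave group over %pair. Both constants are consistent with stepping back (VF - 1) elements times the interleave factor so that one wide access covers all lanes. The sketch below is only a reading of those expected constants, assuming VF = 4 (inferred from the four scalar GEPs per field in the non-interleaved checks) and a factor of 2 for %pair; the function name is hypothetical and this is not the vectorizer's code.

```python
def reverse_base_offset(vf: int, interleave_factor: int = 1) -> int:
    """Element offset from the lane-0 pointer to the start of the wide access.

    Assumption: a reverse access of `vf` lanes starts (vf - 1) elements
    before the lane-0 pointer, scaled by the number of group members.
    """
    if vf < 1 or interleave_factor < 1:
        raise ValueError("vf and interleave_factor must be positive")
    return -(vf - 1) * interleave_factor


# Plain reverse access of i32 with an assumed VF of 4.
print(reverse_base_offset(4))
# Reverse interleave group with two i32 members per element (assumed %pair layout).
print(reverse_base_offset(4, 2))
```

The two calls reproduce the constants that appear in the CHECK and INTER lines for the reverse cases.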
	; Check that a consecutive-like pointer used by a forward interleaved group and			; Check that a consecutive-like pointer used by a forward interleaved group and
	; scalarized store is not recognized as uniform and is not uniform after			; scalarized store is not recognized as uniform and is not uniform after
	; vectorization. The store is scalarized because it's in a predicated block.			; vectorization. The store is scalarized because it's in a predicated block.
	; Even though the load in this example is vectorized and only uses the pointer			; Even though the load in this example is vectorized and only uses the pointer
	; as if it were uniform, the store is scalarized, making the pointer			; as if it were uniform, the store is scalarized, making the pointer
	; non-uniform.			; non-uniform.
	;			;
	; INTER-NOT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; INTER-NOT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
				; CHECK-LABEL: @predicated_store(
	; INTER: vector.body			; INTER: vector.body
	; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, {{.*}} ]			; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, {{.*}} ]
	; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0			; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0
	; INTER: %[[B0:.+]] = bitcast i32* %[[G0]] to <8 x i32>*			; INTER: %[[B0:.+]] = bitcast i32* %[[G0]] to <8 x i32>*
	; INTER: %wide.vec = load <8 x i32>, <8 x i32>* %[[B0]], align 8			; INTER: %wide.vec = load <8 x i32>, <8 x i32>* %[[B0]], align 8
	; INTER: %[[I1:.+]] = or i64 %index, 1			; INTER: %[[I1:.+]] = or i64 %index, 1
	; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0			; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0
	; INTER: %[[I2:.+]] = or i64 %index, 2			; INTER: %[[I2:.+]] = or i64 %index, 2

	; CHECK-LABEL: irregular_type			; CHECK-LABEL: irregular_type
	;			;
	; Check that a consecutive pointer used by a scalarized store is not recognized			; Check that a consecutive pointer used by a scalarized store is not recognized
	; as uniform and is not uniform after vectorization. The store is scalarized			; as uniform and is not uniform after vectorization. The store is scalarized
	; because the stored type may require padding.			; because the stored type may require padding.
	;			;
	; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %i			; CHECK-NOT: LV: Found uniform instruction: %tmp1 = getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %i
				; CHECK-LABEL: @irregular_type(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %[[I1:.+]] = or i64 %index, 1			; CHECK: %[[I1:.+]] = or i64 %index, 1
	; CHECK: %[[I2:.+]] = or i64 %index, 2			; CHECK: %[[I2:.+]] = or i64 %index, 2
	; CHECK: %[[I3:.+]] = or i64 %index, 3			; CHECK: %[[I3:.+]] = or i64 %index, 3
	; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %index			; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %index
	; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %[[I1]]			; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %[[I1]]
	; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %[[I2]]			; CHECK: getelementptr inbounds x86_fp80, x86_fp80* %a, i64 %[[I2]]
	}			}

	; CHECK-LABEL: pointer_iv_uniform			; CHECK-LABEL: pointer_iv_uniform
	;			;
	; Check that a pointer induction variable is recognized as uniform and remains			; Check that a pointer induction variable is recognized as uniform and remains
	; uniform after vectorization.			; uniform after vectorization.
	;			;
	; CHECK: LV: Found uniform instruction: %p = phi i32* [ %tmp03, %for.body ], [ %a, %entry ]			; CHECK: LV: Found uniform instruction: %p = phi i32* [ %tmp03, %for.body ], [ %a, %entry ]
				; CHECK-LABEL: @pointer_iv_uniform(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: %next.gep = getelementptr i32, i32* %a, i64 %index			; CHECK: %next.gep = getelementptr i32, i32* %a, i64 %index
	; CHECK-NOT: getelementptr			; CHECK-NOT: getelementptr
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	define void @pointer_iv_uniform(i32* %a, i32 %x, i64 %n) {			define void @pointer_iv_uniform(i32* %a, i32 %x, i64 %n) {
	; INTER-LABEL: pointer_iv_non_uniform_0			; INTER-LABEL: pointer_iv_non_uniform_0
	;			;
	; Check that a pointer induction variable with a non-uniform user is not			; Check that a pointer induction variable with a non-uniform user is not
	; recognized as uniform and is not uniform after vectorization. The pointer			; recognized as uniform and is not uniform after vectorization. The pointer
	; induction variable is used by getelementptr instructions that are non-uniform			; induction variable is used by getelementptr instructions that are non-uniform
	; due to scalarization of the stores.			; due to scalarization of the stores.
	;			;
	; INTER-NOT: LV: Found uniform instruction: %p = phi i32* [ %tmp03, %for.body ], [ %a, %entry ]			; INTER-NOT: LV: Found uniform instruction: %p = phi i32* [ %tmp03, %for.body ], [ %a, %entry ]
				; CHECK-LABEL: @pointer_iv_non_uniform_0(
	; INTER: vector.body			; INTER: vector.body
	; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; INTER: %[[I0:.+]] = shl i64 %index, 2			; INTER: %[[I0:.+]] = shl i64 %index, 2
	; INTER: %next.gep = getelementptr i32, i32* %a, i64 %[[I0]]			; INTER: %next.gep = getelementptr i32, i32* %a, i64 %[[I0]]
	; INTER: %[[S1:.+]] = shl i64 %index, 2			; INTER: %[[S1:.+]] = shl i64 %index, 2
	; INTER: %[[I1:.+]] = or i64 %[[S1]], 4			; INTER: %[[I1:.+]] = or i64 %[[S1]], 4
	; INTER: %next.gep2 = getelementptr i32, i32* %a, i64 %[[I1]]			; INTER: %next.gep2 = getelementptr i32, i32* %a, i64 %[[I1]]
	; INTER: %[[S2:.+]] = shl i64 %index, 2			; INTER: %[[S2:.+]] = shl i64 %index, 2

	; CHECK-LABEL: pointer_iv_non_uniform_1			; CHECK-LABEL: pointer_iv_non_uniform_1
	;			;
	; Check that a pointer induction variable with a non-uniform user is not			; Check that a pointer induction variable with a non-uniform user is not
	; recognized as uniform and is not uniform after vectorization. The pointer			; recognized as uniform and is not uniform after vectorization. The pointer
	; induction variable is used by a store that will be scalarized.			; induction variable is used by a store that will be scalarized.
	;			;
	; CHECK-NOT: LV: Found uniform instruction: %p = phi x86_fp80* [%tmp1, %for.body], [%a, %entry]			; CHECK-NOT: LV: Found uniform instruction: %p = phi x86_fp80* [%tmp1, %for.body], [%a, %entry]
				; CHECK-LABEL: @pointer_iv_non_uniform_1(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %next.gep = getelementptr x86_fp80, x86_fp80* %a, i64 %index			; CHECK: %next.gep = getelementptr x86_fp80, x86_fp80* %a, i64 %index
	; CHECK: %[[I1:.+]] = or i64 %index, 1			; CHECK: %[[I1:.+]] = or i64 %index, 1
	; CHECK: %next.gep2 = getelementptr x86_fp80, x86_fp80* %a, i64 %[[I1]]			; CHECK: %next.gep2 = getelementptr x86_fp80, x86_fp80* %a, i64 %[[I1]]
	; CHECK: %[[I2:.+]] = or i64 %index, 2			; CHECK: %[[I2:.+]] = or i64 %index, 2
	; CHECK: %next.gep3 = getelementptr x86_fp80, x86_fp80* %a, i64 %[[I2]]			; CHECK: %next.gep3 = getelementptr x86_fp80, x86_fp80* %a, i64 %[[I2]]
	; CHECK: %[[I3:.+]] = or i64 %index, 3			; CHECK: %[[I3:.+]] = or i64 %index, 3
	;			;
	; Check multiple pointer induction variables where only one is recognized as			; Check multiple pointer induction variables where only one is recognized as
	; uniform and remains uniform after vectorization. The other pointer induction			; uniform and remains uniform after vectorization. The other pointer induction
	; variable is not recognized as uniform and is not uniform after vectorization			; variable is not recognized as uniform and is not uniform after vectorization
	; because it is stored to memory.			; because it is stored to memory.
	;			;
	; CHECK-NOT: LV: Found uniform instruction: %p = phi i32* [ %tmp3, %for.body ], [ %a, %entry ]			; CHECK-NOT: LV: Found uniform instruction: %p = phi i32* [ %tmp3, %for.body ], [ %a, %entry ]
	; CHECK: LV: Found uniform instruction: %q = phi i32** [ %tmp4, %for.body ], [ %b, %entry ]			; CHECK: LV: Found uniform instruction: %q = phi i32** [ %tmp4, %for.body ], [ %b, %entry ]
				; CHECK-LABEL: @pointer_iv_mixed(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %next.gep = getelementptr i32, i32* %a, i64 %index			; CHECK: %next.gep = getelementptr i32, i32* %a, i64 %index
	; CHECK: %[[I1:.+]] = or i64 %index, 1			; CHECK: %[[I1:.+]] = or i64 %index, 1
	; CHECK: %next.gep10 = getelementptr i32, i32* %a, i64 %[[I1]]			; CHECK: %next.gep10 = getelementptr i32, i32* %a, i64 %[[I1]]
	; CHECK: %[[I2:.+]] = or i64 %index, 2			; CHECK: %[[I2:.+]] = or i64 %index, 2
	; CHECK: %next.gep11 = getelementptr i32, i32* %a, i64 %[[I2]]			; CHECK: %next.gep11 = getelementptr i32, i32* %a, i64 %[[I2]]
	; CHECK: %[[I3:.+]] = or i64 %index, 3			; CHECK: %[[I3:.+]] = or i64 %index, 3
	;			;
	; INTER: LV: Found uniform instruction: %cond = icmp slt i64 %i.next, %n			; INTER: LV: Found uniform instruction: %cond = icmp slt i64 %i.next, %n
	; INTER-NEXT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds i8, i8* %tmp1, i64 3			; INTER-NEXT: LV: Found uniform instruction: %tmp2 = getelementptr inbounds i8, i8* %tmp1, i64 3
	; INTER-NEXT: LV: Found uniform instruction: %tmp6 = getelementptr inbounds i8, i8* %B, i64 %i			; INTER-NEXT: LV: Found uniform instruction: %tmp6 = getelementptr inbounds i8, i8* %B, i64 %i
	; INTER-NEXT: LV: Found uniform instruction: %tmp1 = bitcast i64* %tmp0 to i8*			; INTER-NEXT: LV: Found uniform instruction: %tmp1 = bitcast i64* %tmp0 to i8*
	; INTER-NEXT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds i64, i64* %A, i64 %i			; INTER-NEXT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds i64, i64* %A, i64 %i
	; INTER-NEXT: LV: Found uniform instruction: %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			; INTER-NEXT: LV: Found uniform instruction: %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	; INTER-NEXT: LV: Found uniform instruction: %i.next = add nuw nsw i64 %i, 1			; INTER-NEXT: LV: Found uniform instruction: %i.next = add nuw nsw i64 %i, 1
				; INTER-LABEL: @bitcast_pointer_operand(
	; INTER: vector.body:			; INTER: vector.body:
	; INTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]			; INTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; INTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, i64* %A, i64 [[INDEX]]			; INTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, i64* %A, i64 [[INDEX]]
	; INTER-NEXT: [[TMP5:%.*]] = bitcast i64* [[TMP4]] to <32 x i8>*			; INTER-NEXT: [[TMP5:%.*]] = bitcast i64* [[TMP4]] to <32 x i8>*
	; INTER-NEXT: [[WIDE_VEC:%.*]] = load <32 x i8>, <32 x i8>* [[TMP5]], align 1			; INTER-NEXT: [[WIDE_VEC:%.*]] = load <32 x i8>, <32 x i8>* [[TMP5]], align 1
	; INTER-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <32 x i8> [[WIDE_VEC]], <32 x i8> undef, <4 x i32> <i32 0, i32 8, i32 16, i32 24>			; INTER-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <32 x i8> [[WIDE_VEC]], <32 x i8> undef, <4 x i32> <i32 0, i32 8, i32 16, i32 24>
	; INTER-NEXT: [[STRIDED_VEC5:%.*]] = shufflevector <32 x i8> [[WIDE_VEC]], <32 x i8> undef, <4 x i32> <i32 3, i32 11, i32 19, i32 27>			; INTER-NEXT: [[STRIDED_VEC5:%.*]] = shufflevector <32 x i8> [[WIDE_VEC]], <32 x i8> undef, <4 x i32> <i32 3, i32 11, i32 19, i32 27>
	; INTER-NEXT: [[TMP6:%.*]] = xor <4 x i8> [[STRIDED_VEC5]], [[STRIDED_VEC]]			; INTER-NEXT: [[TMP6:%.*]] = xor <4 x i8> [[STRIDED_VEC5]], [[STRIDED_VEC]]
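The two shufflevector masks expected for @bitcast_pointer_operand, <0, 8, 16, 24> and <3, 11, 19, 27>, follow a simple pattern: each lane picks element start + lane * stride out of the wide vector. A minimal sketch of that pattern, assuming a stride of 8 bytes (one i64 per iteration viewed as i8) and 4 lanes, both read off the masks themselves; the helper name is hypothetical.

```python
def stride_mask(start: int, stride: int, lanes: int) -> list[int]:
    """Indices that pull one interleave-group member out of a wide vector."""
    if stride < 1 or lanes < 1:
        raise ValueError("stride and lanes must be positive")
    return [start + lane * stride for lane in range(lanes)]


# Member at byte offset 0 of each 8-byte element.
print(stride_mask(0, 8, 4))
# Member at byte offset 3, matching the `getelementptr ... i64 3` in the uniform list.
print(stride_mask(3, 8, 4))
```

With these inputs the lists match the mask operands in the INTER-NEXT lines for STRIDED_VEC and STRIDED_VEC5.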

llvm/test/Transforms/LoopVectorize/loop-scalars.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s		; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

; CHECK-LABEL: vector_gep		; CHECK-LABEL: vector_gep
; CHECK-NOT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i32, i32* %b, i64 %i		; CHECK-NOT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i32, i32* %b, i64 %i
		; CHECK-LABEL: @vector_gep(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* %b, <2 x i64> [[VEC_IND]]		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* %b, <2 x i64> [[VEC_IND]]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]
; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32** [[TMP2]] to <2 x i32*>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32** [[TMP2]] to <2 x i32*>*
; CHECK-NEXT: store <2 x i32*> [[TMP1]], <2 x i32*>* [[TMP3]], align 8		; CHECK-NEXT: store <2 x i32*> [[TMP1]], <2 x i32*>* [[TMP3]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
for.end:		for.end:
ret void		ret void
}		}

; CHECK-LABEL: scalar_store		; CHECK-LABEL: scalar_store
; CHECK: LV: Found scalar instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i		; CHECK: LV: Found scalar instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i
; CHECK-NEXT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i32, i32* %b, i64 %i		; CHECK-NEXT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i32, i32* %b, i64 %i
; CHECK-NEXT: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]		; CHECK-NEXT: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2		; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
		; CHECK-LABEL: @scalar_store(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1		; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 2		; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 2
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* %b, i64 [[OFFSET_IDX]]		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* %b, i64 [[OFFSET_IDX]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %b, i64 [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %b, i64 [[TMP4]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[OFFSET_IDX]]		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[OFFSET_IDX]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[TMP4]]		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[TMP4]]

; CHECK-LABEL: expansion		; CHECK-LABEL: expansion
; CHECK: LV: Found scalar instruction: %tmp3 = getelementptr inbounds i32, i32* %tmp2, i64 %i		; CHECK: LV: Found scalar instruction: %tmp3 = getelementptr inbounds i32, i32* %tmp2, i64 %i
; CHECK-NEXT: LV: Found scalar instruction: %tmp1 = bitcast i64* %tmp0 to i32*		; CHECK-NEXT: LV: Found scalar instruction: %tmp1 = bitcast i64* %tmp0 to i32*
; CHECK-NEXT: LV: Found scalar instruction: %tmp2 = getelementptr inbounds i32, i32* %a, i64 0		; CHECK-NEXT: LV: Found scalar instruction: %tmp2 = getelementptr inbounds i32, i32* %a, i64 0
; CHECK-NEXT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i64, i64* %b, i64 %i		; CHECK-NEXT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i64, i64* %b, i64 %i
; CHECK-NEXT: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]		; CHECK-NEXT: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2		; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
		; CHECK-LABEL: @expansion(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1		; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 2		; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 2
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, i64* %b, i64 [[OFFSET_IDX]]		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, i64* %b, i64 [[OFFSET_IDX]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* %b, i64 [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* %b, i64 [[TMP4]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[OFFSET_IDX]]		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[OFFSET_IDX]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[TMP4]]		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[TMP4]]
for.end:		for.end:
ret void		ret void
}		}

; CHECK-LABEL: no_gep_or_bitcast		; CHECK-LABEL: no_gep_or_bitcast
; CHECK-NOT: LV: Found scalar instruction: %tmp1 = load i32, i32* %tmp0, align 8		; CHECK-NOT: LV: Found scalar instruction: %tmp1 = load i32, i32* %tmp0, align 8
; CHECK: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]		; CHECK: LV: Found scalar instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 1		; CHECK-NEXT: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 1
		; CHECK-LABEL: @no_gep_or_bitcast(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32*, i32** %a, i64 [[INDEX]]
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32** [[TMP1]] to <2 x i32*>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32** [[TMP1]] to <2 x i32*>*
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32*>, <2 x i32*>* [[TMP2]], align 8		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32*>, <2 x i32*>* [[TMP2]], align 8
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32*> [[WIDE_LOAD]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32*> [[WIDE_LOAD]], i32 0
; CHECK-NEXT: store i32 0, i32* [[TMP3]], align 8		; CHECK-NEXT: store i32 0, i32* [[TMP3]], align 8
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32*> [[WIDE_LOAD]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32*> [[WIDE_LOAD]], i32 1

llvm/test/Transforms/LoopVectorize/phi-cost.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s		; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -cost-using-vplan=false -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-CM
		; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -cost-using-vplan=true -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-VP

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

; CHECK-LABEL: phi_two_incoming_values		; CHECK-LABEL: phi_two_incoming_values
; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i = phi i64 [ %i.next, %if.end ], [ 0, %entry ]		; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %i = phi i64 [ %i.next, %if.end ], [ 0, %entry ]
; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %tmp5 = phi i32 [ %tmp1, %for.body ], [ %tmp4, %if.then ]		; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %tmp5 = phi i32 [ %tmp1, %for.body ], [ %tmp4, %if.then ]
		; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN-INDUCTION %i = phi %i.next, 0
		; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "BLEND %tmp5 = ir<%tmp1>/vp<%0> ir<%tmp4>/ir<%tmp3>
		; CHECK-LABEL: @phi_two_incoming_values(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* {{.*}}		; CHECK: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* {{.*}}
; CHECK: [[TMP5:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer		; CHECK: [[TMP5:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = zext <2 x i1> [[TMP5]] to <2 x i32>		; CHECK-NEXT: [[TMP6:%.*]] = zext <2 x i1> [[TMP5]] to <2 x i32>
; CHECK-NEXT: [[PREDPHI:%.*]] = add <2 x i32> [[WIDE_LOAD]], [[TMP6]]		; CHECK-NEXT: [[PREDPHI:%.*]] = add <2 x i32> [[WIDE_LOAD]], [[TMP6]]
; CHECK: store <2 x i32> [[PREDPHI]], <2 x i32>* {{.*}}		; CHECK: store <2 x i32> [[PREDPHI]], <2 x i32>* {{.*}}
; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
if.end:		if.end:
%cond = icmp eq i64 %i, %n		%cond = icmp eq i64 %i, %n
br i1 %cond, label %for.end, label %for.body		br i1 %cond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}
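The costs expected in this file (1 for the two-input phi %tmp5, 2 for the three-input phi %tmp8) are consistent with treating a non-header phi with N incoming values as a chain of N - 1 vector selects, which is also what the expected IR for the three-input case shows (PREDPHI and PREDPHI7). The sketch below only illustrates that arithmetic; the unit select cost at VF 2 is an assumption read off the expected values, and the function name is hypothetical rather than part of the patch.

```python
def blend_cost(num_incoming: int, select_cost: int = 1) -> int:
    """Cost of lowering a phi with `num_incoming` values to chained selects.

    Assumption: each additional incoming value needs one more select,
    and every select costs `select_cost` at the chosen VF.
    """
    if num_incoming < 1:
        raise ValueError("a phi needs at least one incoming value")
    return (num_incoming - 1) * select_cost


# Two incoming values, as in @phi_two_incoming_values.
print(blend_cost(2))
# Three incoming values, as in @phi_three_incoming_values.
print(blend_cost(3))
```

Under these assumptions the instruction-based (CHECK-CM) and recipe-based (CHECK-VP) expectations agree, which is what the two RUN lines are checking.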

; CHECK-LABEL: phi_three_incoming_values		; CHECK-LABEL: phi_three_incoming_values
; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i = phi i64 [ %i.next, %if.end ], [ 0, %entry ]		; CHECK-CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %i = phi i64 [ %i.next, %if.end ], [ 0, %entry ]
; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %tmp8 = phi i32 [ 9, %for.body ], [ 3, %if.then ], [ %tmp7, %if.else ]		; CHECK-CM: LV: Found an estimated cost of 2 for VF 2 For instruction: %tmp8 = phi i32 [ 9, %for.body ], [ 3, %if.then ], [ %tmp7, %if.else ]
		; CHECK-VP: LV: Found an estimated cost of 1 for VF 2 For recipe: "WIDEN-INDUCTION %i = phi %i.next, 0
		; CHECK-VP: LV: Found an estimated cost of 2 for VF 2 For recipe: "BLEND %tmp8 = ir<9>/vp<%0> ir<3>/vp<%1> ir<%tmp7>/vp<%3>
		; CHECK-LABEL: @phi_three_incoming_values(
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
; CHECK: [[PREDPHI:%.*]] = select <2 x i1> {{.*}}, <2 x i32> <i32 3, i32 3>, <2 x i32> <i32 9, i32 9>		; CHECK: [[PREDPHI:%.*]] = select <2 x i1> {{.*}}, <2 x i32> <i32 3, i32 3>, <2 x i32> <i32 9, i32 9>
; CHECK: [[PREDPHI7:%.*]] = select <2 x i1> {{.*}}, <2 x i32> {{.*}}, <2 x i32> [[PREDPHI]]		; CHECK: [[PREDPHI7:%.*]] = select <2 x i1> {{.*}}, <2 x i32> {{.*}}, <2 x i32> [[PREDPHI]]
; CHECK: store <2 x i32> [[PREDPHI7]], <2 x i32>* {{.*}}		; CHECK: store <2 x i32> [[PREDPHI7]], <2 x i32>* {{.*}}
; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
;		;
define void @phi_three_incoming_values(i32* %a, i32* %b, i64 %n) {		define void @phi_three_incoming_values(i32* %a, i32* %b, i64 %n) {
Show All 31 Lines

Diff 298832
