This is an archive of the discontinued LLVM Phabricator instance.


Differential D100121

[LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
ClosedPublic

Authored by sdesmalen on Apr 8 2021, 9:05 AM.


Details

Reviewers

bmahjour
ctetreau
david-arm
dmgreen

Commits

rG86729538bdbd: [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.

Summary

Rather than maintaining two separate values, a float for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
same cost is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor: the per-lane cost stored in ProfitableVFs
was previously truncated from float to unsigned, and the comparison was then
performed on the truncated value. The comparison now uses the cost at
full precision.
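
To illustrate the idea in the summary, here is a minimal standalone sketch of a single value that carries both width and cost, with a per-lane comparison that avoids floating point. `SketchVF` is a hypothetical stand-in: plain integers replace LLVM's `ElementCount` and `InstructionCost`, so this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VectorizationFactor: plain integers replace
// ElementCount and InstructionCost so the sketch compiles on its own.
struct SketchVF {
  unsigned Width;     // number of lanes (1 means scalar)
  std::uint64_t Cost; // total cost of one loop iteration at this width
};

// Returns true if the per-lane cost of A is strictly lower than that of B.
// (A.Cost / A.Width) < (B.Cost / B.Width) is rewritten by cross-multiplying,
// so neither a float conversion nor a division is needed.
inline bool isMoreProfitable(const SketchVF &A, const SketchVF &B) {
  assert(A.Width > 0 && B.Width > 0 && "widths must be non-zero");
  return A.Cost * B.Width < B.Cost * A.Width;
}
```

With a scalar cost of 4, a width-4 candidate costing 10 in total is more profitable (2.5 per lane), while a width-2 candidate costing 8 is not (4 per lane, a tie).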

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sdesmalen created this revision. Apr 8 2021, 9:05 AM

Herald added subscribers: bmahjour, hiraditya. Apr 8 2021, 9:05 AM

sdesmalen requested review of this revision. Apr 8 2021, 9:05 AM

Herald added a project: Restricted Project. Apr 8 2021, 9:05 AM

Herald added a subscriber: llvm-commits.

sdesmalen added reviewers: bmahjour, ctetreau, david-arm. Apr 8 2021, 9:06 AM

sdesmalen added a reviewer: dmgreen.

The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs is no longer the total cost, but rather the cost per lane which is also what is used to determine the VF for the vector body loop.

I'm a bit confused by this...before we saved the per-lane cost in ProfitableVFs and compared that with the Result variable in selectEpilogueVectorizationFactor. The Result variable gets initialized as Disabled and then gets refined as we go through the ProfitableVFs, getting assigned a value from ProfitableVFs. That means that the Result variable could only get assigned per-lane costs. Since we are now comparing per-lane costs as well, I don't see this change being non-NFC.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1607	I think this should become a member of the `VectorizationFactor` struct....then code like `A.isMoreProfitableThan(B)` reads better.

Harbormaster completed remote builds in B97759: Diff 336138. Apr 8 2021, 10:25 AM

You're right, that comment wasn't correct. I wrote the code last week and incorrectly remembered the change when writing up the comment today.

The issue has to do with the truncation. It stores the floating-point cost in the ProfitableVFs list as unsigned.

For llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll, it then compares the costs as follows:

1 < 2 ? true
0 < 1 ? true
0 < 0 ? false

Where it now compares the costs properly as:

5/4 < 5/2 ? true
5/8 < 5/4 ? true
5/16 < 5/8 ? true
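
The two sets of comparisons above can be reproduced with a small standalone sketch. It assumes a total cost of 5 at each of the widths 2, 4, 8 and 16, which is consistent with the fractions quoted but is not stated explicitly in the thread; the helper names are hypothetical.

```cpp
#include <cstdint>

// Old behaviour as described above: the per-lane cost is computed in float
// and then truncated to unsigned before it is stored and compared.
inline unsigned truncatedPerLaneCost(unsigned TotalCost, unsigned Width) {
  return static_cast<unsigned>(TotalCost / static_cast<float>(Width));
}

// Exact comparison of per-lane costs without any division:
// (CostA / WidthA) < (CostB / WidthB)  <=>  CostA * WidthB < CostB * WidthA
inline bool perLaneCostLess(std::uint64_t CostA, unsigned WidthA,
                            std::uint64_t CostB, unsigned WidthB) {
  return CostA * WidthB < CostB * WidthA;
}
```

Under that assumption the truncated per-lane costs are 2, 1, 0 and 0, so the last step compares 0 with 0 and the widest candidate is never considered better, whereas the cross-multiplied comparison still sees 5/16 as smaller than 5/8.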

In D100121#2677358, @sdesmalen wrote:
You're right, that comment wasn't correct. I wrote the code last week and incorrectly remembered the change when writing up the comment today.

The issue has to do with the truncation. It stores the floating-point cost in the ProfitableVFs list as unsigned.

For llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll, it then compares the costs as follows:
1 < 2 ? true
0 < 1 ? true
0 < 0 ? false
Where it now compares the costs properly as:
5/4 < 5/2 ? true
5/8 < 5/4 ? true
5/16 < 5/8 ? true

I see. That makes sense.

sdesmalen added inline comments. Apr 8 2021, 1:04 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1607	I'm happy to implement your suggestion, although I think in the future a `LoopVectorizationCostModel::isMoreProfitable` method will be needed anyway, so that the method can query the LoopHints to understand how to interpret the costs of a scalable VF, and possibly favour it over fixed-width VFs of similar cost. But perhaps that's just something to worry about in a future patch?

Moved LoopVectorizationCostModel::isMoreProfitable -> VectorizationFactor::isMoreProfitableThan.

Harbormaster completed remote builds in B97807: Diff 336209. Apr 8 2021, 2:09 PM

dmgreen added inline comments. Apr 12 2021, 6:28 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
203	rhs -> RHS. If the backend knew that the SVE vector length was 256, as opposed to 128, how would it best communicate that information to here?

sdesmalen added inline comments. Apr 12 2021, 8:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
203	Doing it here without context would be a bit difficult, as it needs information from CostModel/TargetTransformInfo in order to know which one to favour. Hence my reason to initially create this method in the LV CostModel (see the previous revision https://reviews.llvm.org/D100121?id=336138), so it would mean either passing that information separately, or moving it to the other class again.

maria.bonita.huetamo added a subscriber: maria.bonita.huetamo. Apr 15 2021, 12:32 AM

dmgreen added inline comments. Apr 15 2021, 7:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
203	Yeah, I think that makes sense. If we have a good reason to put it into the cost model (and @bmahjour doesn't object), then keeping it in the cost model so it could have access to TTI etc would make sense to me.

bmahjour added inline comments. Apr 15 2021, 7:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
203	I don't have any objections to moving it back, if it makes information more accessible for future extensions.

Changed back to having isMoreProfitable in LV CostModel.
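
The argument for keeping the comparison in the cost model is that it can then consult extra context. The sketch below is purely illustrative: `SketchCostModel`, its `PreferScalable` flag and the tie-breaking rule are hypothetical and are not part of this revision; they only show the kind of decision a free-standing struct member could not make without being passed that context.

```cpp
#include <cstdint>

// Hypothetical stand-in for VectorizationFactor with a scalable flag.
struct SketchVF {
  unsigned MinWidth;  // known minimum number of lanes
  bool Scalable;      // true for a scalable width
  std::uint64_t Cost; // total cost at this width
};

// Hypothetical cost model that owns a hint about scalable vectors.
class SketchCostModel {
public:
  explicit SketchCostModel(bool PreferScalable)
      : PreferScalable(PreferScalable) {}

  bool isMoreProfitable(const SketchVF &A, const SketchVF &B) const {
    // Per-lane comparison by cross-multiplication, as in the patch.
    std::uint64_t LHS = A.Cost * B.MinWidth;
    std::uint64_t RHS = B.Cost * A.MinWidth;
    if (LHS != RHS)
      return LHS < RHS;
    // Assumed policy for ties: favour a scalable VF over a fixed one
    // only when the hint asks for it.
    return PreferScalable && A.Scalable && !B.Scalable;
  }

private:
  bool PreferScalable;
};
```

With the hint set, a scalable VF wins a tie against a fixed-width VF of the same per-lane cost; without it, neither is considered more profitable than the other.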

Thanks. Seems like a useful step forward. LGTM.

This revision is now accepted and ready to land. Apr 16 2021, 1:07 AM

Harbormaster completed remote builds in B99100: Diff 338013. Apr 16 2021, 2:04 AM

Closed by commit rG86729538bdbd: [LV] Let selectVectorizationFactor reason directly on VectorizationFactor. (authored by sdesmalen). Apr 20 2021, 1:55 AM

This revision was automatically updated to reflect the committed changes.

sdesmalen added a commit: rG86729538bdbd: [LV] Let selectVectorizationFactor reason directly on VectorizationFactor..

Herald added subscribers: vkmr, rogfer01. Apr 20 2021, 1:55 AM

Thanks for the review @bmahjour and @dmgreen!

rogfer01 added inline comments. Apr 23 2021, 6:25 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
191	Hi Sander, a minor question here: for scalable vectorization do you plan to change the definition of `Disabled`? Perhaps this may not be needed? We were considering something like `return {ElementCount::getNull(), 0};` so this `VectorizationFactor` value is effectively not a valid vectorization factor at all. But maybe this is not the intent of `Disabled`?
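
For context on the question above, the epilogue selection in this revision starts from `Disabled()` and treats width 1 as "nothing chosen yet". A minimal standalone sketch of that loop follows; `SketchVF` and `selectEpilogueVF` are hypothetical stand-ins using plain integers, and the `LVP.hasPlanWithVFs` check from the real code is omitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VectorizationFactor.
struct SketchVF {
  unsigned Width;
  std::uint64_t Cost;
  // Mirrors the definition in the diff: width 1, cost 0.
  static SketchVF Disabled() { return {1, 0}; }
};

inline bool isMoreProfitable(const SketchVF &A, const SketchVF &B) {
  return A.Cost * B.Width < B.Cost * A.Width;
}

// Pick the most profitable candidate that is narrower than the main loop VF.
inline SketchVF selectEpilogueVF(const std::vector<SketchVF> &ProfitableVFs,
                                 unsigned MainLoopVF) {
  SketchVF Result = SketchVF::Disabled();
  for (const SketchVF &Next : ProfitableVFs)
    if (Next.Width < MainLoopVF &&
        (Result.Width == 1 || isMoreProfitable(Next, Result)))
      Result = Next;
  return Result;
}
```

The `Result.Width == 1` test is what lets the first suitable candidate replace the sentinel, since a cost of 0 would otherwise never be beaten. A sentinel with a null element count would need that check to change accordingly.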

Revision Contents

Path                                                        Size
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h    5 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp             64 lines
llvm/lib/Transforms/Vectorize/VPlan.h                       1 line
llvm/test/Transforms/LoopVectorize/X86/avx512.ll            6 lines
llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll     22 lines

Diff 338773

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

	Show First 20 Lines • Show All 175 Lines • ▼ Show 20 Lines
	/// VectorizerParams::VectorizationFactor and VectorizationCostTy.			/// VectorizerParams::VectorizationFactor and VectorizationCostTy.
	/// We need to streamline them.			/// We need to streamline them.

	/// Information about vectorization costs			/// Information about vectorization costs
	struct VectorizationFactor {			struct VectorizationFactor {
	// Vector width with best cost			// Vector width with best cost
	ElementCount Width;			ElementCount Width;
	// Cost of the loop with that width			// Cost of the loop with that width
	unsigned Cost;			InstructionCost Cost;

				VectorizationFactor(ElementCount Width, InstructionCost Cost)
				: Width(Width), Cost(Cost) {}

	// Width 1 means no vectorization, cost 0 means uncomputed cost.			// Width 1 means no vectorization, cost 0 means uncomputed cost.
	static VectorizationFactor Disabled() {			static VectorizationFactor Disabled() {
	return {ElementCount::getFixed(1), 0};			return {ElementCount::getFixed(1), 0};
	}			}

	bool operator==(const VectorizationFactor &rhs) const {			bool operator==(const VectorizationFactor &rhs) const {
	return Width == rhs.Width && Cost == rhs.Cost;			return Width == rhs.Width && Cost == rhs.Cost;
	}			}

	bool operator!=(const VectorizationFactor &rhs) const {			bool operator!=(const VectorizationFactor &rhs) const {
	return !(*this == rhs);			return !(*this == rhs);
	}			}
	};			};

	/// Planner drives the vectorization process after having passed			/// Planner drives the vectorization process after having passed
	/// Legality checks.			/// Legality checks.
	class LoopVectorizationPlanner {			class LoopVectorizationPlanner {
	/// The loop that we evaluate.			/// The loop that we evaluate.
	Loop *OrigLoop;			Loop *OrigLoop;

	/// Loop Info analysis.			/// Loop Info analysis.
	LoopInfo *LI;			LoopInfo *LI;

	▲ Show 20 Lines • Show All 122 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 1,596 Lines • ▼ Show 20 Lines	public:
/// Estimate cost of a call instruction CI if it were vectorized with factor		/// Estimate cost of a call instruction CI if it were vectorized with factor
/// VF. Return the cost of the instruction, including scalarization overhead		/// VF. Return the cost of the instruction, including scalarization overhead
/// if it's needed. The flag NeedToScalarize shows if the call needs to be		/// if it's needed. The flag NeedToScalarize shows if the call needs to be
/// scalarized -		/// scalarized -
/// i.e. either vector version isn't available, or is too expensive.		/// i.e. either vector version isn't available, or is too expensive.
InstructionCost getVectorCallCost(CallInst *CI, ElementCount VF,		InstructionCost getVectorCallCost(CallInst *CI, ElementCount VF,
bool &NeedToScalarize) const;		bool &NeedToScalarize) const;

		/// Returns true if the per-lane cost of VectorizationFactor A is lower than
		/// that of B.
		bool isMoreProfitable(const VectorizationFactor &A,
		const VectorizationFactor &B) const;

/// Invalidates decisions already taken by the cost model.		/// Invalidates decisions already taken by the cost model.
void invalidateCostModelingDecisions() {		void invalidateCostModelingDecisions() {
WideningDecisions.clear();		WideningDecisions.clear();
Uniforms.clear();		Uniforms.clear();
Scalars.clear();		Scalars.clear();
}		}

private:		private:
▲ Show 20 Lines • Show All 4,258 Lines • ▼ Show 20 Lines	if (ElementCount MinVF =
<< ") with target's minimum: " << MinVF << '\n');		<< ") with target's minimum: " << MinVF << '\n');
MaxVF = MinVF;		MaxVF = MinVF;
}		}
}		}
}		}
return MaxVF;		return MaxVF;
}		}

		bool LoopVectorizationCostModel::isMoreProfitable(
		const VectorizationFactor &A, const VectorizationFactor &B) const {
		InstructionCost::CostType CostA = *A.Cost.getValue();
		InstructionCost::CostType CostB = *B.Cost.getValue();

		// To avoid the need for FP division:
		// (CostA / A.Width) < (CostB / B.Width)
		// <=> (CostA * B.Width) < (CostB * A.Width)
		return (CostA * B.Width.getKnownMinValue()) <
		(CostB * A.Width.getKnownMinValue());
		}

VectorizationFactor		VectorizationFactor
LoopVectorizationCostModel::selectVectorizationFactor(ElementCount MaxVF) {		LoopVectorizationCostModel::selectVectorizationFactor(ElementCount MaxVF) {
// FIXME: This can be fixed for scalable vectors later, because at this stage		// FIXME: This can be fixed for scalable vectors later, because at this stage
// the LoopVectorizer will only consider vectorizing a loop with scalable		// the LoopVectorizer will only consider vectorizing a loop with scalable
// vectors when the loop has a hint to enable vectorization for a given VF.		// vectors when the loop has a hint to enable vectorization for a given VF.
assert(!MaxVF.isScalable() && "scalable vectors not yet supported");		assert(!MaxVF.isScalable() && "scalable vectors not yet supported");

InstructionCost ExpectedCost = expectedCost(ElementCount::getFixed(1)).first;		InstructionCost ExpectedCost = expectedCost(ElementCount::getFixed(1)).first;
LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << ExpectedCost << ".\n");		LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << ExpectedCost << ".\n");
assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop");		assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop");

auto Width = ElementCount::getFixed(1);		const VectorizationFactor ScalarCost(ElementCount::getFixed(1), ExpectedCost);
const float ScalarCost = *ExpectedCost.getValue();		VectorizationFactor ChosenFactor = ScalarCost;
float Cost = ScalarCost;

bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;		bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
if (ForceVectorization && MaxVF.isVector()) {		if (ForceVectorization && MaxVF.isVector()) {
// Ignore scalar width, because the user explicitly wants vectorization.		// Ignore scalar width, because the user explicitly wants vectorization.
// Initialize cost to max so that VF = 2 is, at least, chosen during cost		// Initialize cost to max so that VF = 2 is, at least, chosen during cost
// evaluation.		// evaluation.
Cost = std::numeric_limits<float>::max();		ChosenFactor.Cost = std::numeric_limits<InstructionCost::CostType>::max();
}		}

for (auto i = ElementCount::getFixed(2); ElementCount::isKnownLE(i, MaxVF);		for (auto i = ElementCount::getFixed(2); ElementCount::isKnownLE(i, MaxVF);
i *= 2) {		i *= 2) {
// Notice that the vector loop needs to be executed less times, so		// Notice that the vector loop needs to be executed less times, so
// we need to divide the cost of the vector loops by the width of		// we need to divide the cost of the vector loops by the width of
// the vector elements.		// the vector elements.
VectorizationCostTy C = expectedCost(i);		VectorizationCostTy C = expectedCost(i);

assert(C.first.isValid() && "Unexpected invalid cost for vector loop");		assert(C.first.isValid() && "Unexpected invalid cost for vector loop");
float VectorCost = *C.first.getValue() / (float)i.getFixedValue();		VectorizationFactor Candidate(i, C.first);
LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i		LLVM_DEBUG(
<< " costs: " << (int)VectorCost << ".\n");		dbgs() << "LV: Vector loop of width " << i << " costs: "
		<< (*Candidate.Cost.getValue() / Candidate.Width.getFixedValue())
		<< ".\n");

if (!C.second && !ForceVectorization) {		if (!C.second && !ForceVectorization) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not considering vector loop of width " << i		dbgs() << "LV: Not considering vector loop of width " << i
<< " because it will not generate any vector instructions.\n");		<< " because it will not generate any vector instructions.\n");
continue;		continue;
}		}

// If profitable add it to ProfitableVF list.		// If profitable add it to ProfitableVF list.
if (VectorCost < ScalarCost) {		if (isMoreProfitable(Candidate, ScalarCost))
ProfitableVFs.push_back(VectorizationFactor(		ProfitableVFs.push_back(Candidate);
{i, (unsigned)VectorCost}));
}

if (VectorCost < Cost) {		if (isMoreProfitable(Candidate, ChosenFactor))
Cost = VectorCost;		ChosenFactor = Candidate;
Width = i;
}
}		}

if (!EnableCondStoresVectorization && NumPredStores) {		if (!EnableCondStoresVectorization && NumPredStores) {
reportVectorizationFailure("There are conditional stores.",		reportVectorizationFailure("There are conditional stores.",
"store that is conditionally executed prevents vectorization",		"store that is conditionally executed prevents vectorization",
"ConditionalStore", ORE, TheLoop);		"ConditionalStore", ORE, TheLoop);
Width = ElementCount::getFixed(1);		ChosenFactor = ScalarCost;
Cost = ScalarCost;
}		}

LLVM_DEBUG(if (ForceVectorization && !Width.isScalar() && Cost >= ScalarCost) dbgs()		LLVM_DEBUG(if (ForceVectorization && !ChosenFactor.Width.isScalar() &&
		ChosenFactor.Cost.getValue() >= ScalarCost.Cost.getValue())
		dbgs()
<< "LV: Vectorization seems to be not beneficial, "		<< "LV: Vectorization seems to be not beneficial, "
<< "but was forced by a user.\n");		<< "but was forced by a user.\n");
LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");		LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << ChosenFactor.Width << ".\n");
VectorizationFactor Factor = {Width,		return ChosenFactor;
(unsigned)(Width.getKnownMinValue() * Cost)};
return Factor;
}		}

bool LoopVectorizationCostModel::isCandidateForEpilogueVectorization(		bool LoopVectorizationCostModel::isCandidateForEpilogueVectorization(
const Loop &L, ElementCount VF) const {		const Loop &L, ElementCount VF) const {
// Cross iteration phis such as reductions need special handling and are		// Cross iteration phis such as reductions need special handling and are
// currently unsupported.		// currently unsupported.
if (any_of(L.getHeader()->phis(), [&](PHINode &Phi) {		if (any_of(L.getHeader()->phis(), [&](PHINode &Phi) {
return Legal->isFirstOrderRecurrence(&Phi) ||		return Legal->isFirstOrderRecurrence(&Phi) ||
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	if (TheLoop->getHeader()->getParent()->hasOptSize() ||
return Result;		return Result;
}		}

if (!isEpilogueVectorizationProfitable(MainLoopVF))		if (!isEpilogueVectorizationProfitable(MainLoopVF))
return Result;		return Result;

for (auto &NextVF : ProfitableVFs)		for (auto &NextVF : ProfitableVFs)
if (ElementCount::isKnownLT(NextVF.Width, MainLoopVF) &&		if (ElementCount::isKnownLT(NextVF.Width, MainLoopVF) &&
(Result.Width.getFixedValue() == 1 || NextVF.Cost < Result.Cost) &&		(Result.Width.getFixedValue() == 1 ||
		isMoreProfitable(NextVF, Result)) &&
LVP.hasPlanWithVFs({MainLoopVF, NextVF.Width}))		LVP.hasPlanWithVFs({MainLoopVF, NextVF.Width}))
Result = NextVF;		Result = NextVF;

if (Result != VectorizationFactor::Disabled())		if (Result != VectorizationFactor::Disabled())
LLVM_DEBUG(dbgs() << "LEV: Vectorizing epilogue loop with VF = "		LLVM_DEBUG(dbgs() << "LEV: Vectorizing epilogue loop with VF = "
<< Result.Width.getFixedValue() << "\n";);		<< Result.Width.getFixedValue() << "\n";);
return Result;		return Result;
}		}
▲ Show 20 Lines • Show All 3,701 Lines • ▼ Show 20 Lines	#endif /* NDEBUG */
Optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);		Optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);

VectorizationFactor VF = VectorizationFactor::Disabled();		VectorizationFactor VF = VectorizationFactor::Disabled();
unsigned IC = 1;		unsigned IC = 1;

if (MaybeVF) {		if (MaybeVF) {
VF = *MaybeVF;		VF = *MaybeVF;
// Select the interleave count.		// Select the interleave count.
IC = CM.selectInterleaveCount(VF.Width, VF.Cost);		IC = CM.selectInterleaveCount(VF.Width, *VF.Cost.getValue());
}		}

// Identify the diagnostic messages that should be produced.		// Identify the diagnostic messages that should be produced.
std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;		std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;
bool VectorizeLoop = true, InterleaveLoop = true;		bool VectorizeLoop = true, InterleaveLoop = true;
if (VF.Width.isScalar()) {		if (VF.Width.isScalar()) {
LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");		LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
VecDiagMsg = std::make_pair(		VecDiagMsg = std::make_pair(
▲ Show 20 Lines • Show All 295 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlan.h

	Show All 34 Lines
	#include "llvm/ADT/SmallPtrSet.h"			#include "llvm/ADT/SmallPtrSet.h"
	#include "llvm/ADT/SmallSet.h"			#include "llvm/ADT/SmallSet.h"
	#include "llvm/ADT/SmallVector.h"			#include "llvm/ADT/SmallVector.h"
	#include "llvm/ADT/Twine.h"			#include "llvm/ADT/Twine.h"
	#include "llvm/ADT/ilist.h"			#include "llvm/ADT/ilist.h"
	#include "llvm/ADT/ilist_node.h"			#include "llvm/ADT/ilist_node.h"
	#include "llvm/Analysis/VectorUtils.h"			#include "llvm/Analysis/VectorUtils.h"
	#include "llvm/IR/IRBuilder.h"			#include "llvm/IR/IRBuilder.h"
				#include "llvm/Support/InstructionCost.h"
	#include <algorithm>			#include <algorithm>
	#include <cassert>			#include <cassert>
	#include <cstddef>			#include <cstddef>
	#include <map>			#include <map>
	#include <string>			#include <string>

	namespace llvm {			namespace llvm {

	▲ Show 20 Lines • Show All 2,160 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/X86/avx512.ll

	; RUN: opt -mattr=+avx512f --loop-vectorize -S < %s | llc -mattr=+avx512f | FileCheck %s			; RUN: opt -mattr=+avx512f --loop-vectorize -S < %s | llc -mattr=+avx512f | FileCheck %s
	; RUN: opt -mattr=+avx512vl,+prefer-256-bit --loop-vectorize -S < %s | llc -mattr=+avx512f | FileCheck %s --check-prefix=CHECK-PREFER-AVX256			; RUN: opt -mattr=+avx512vl,+prefer-256-bit --loop-vectorize -S < %s | llc -mattr=+avx512f | FileCheck %s --check-prefix=CHECK-PREFER-AVX256

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.9.0"			target triple = "x86_64-apple-macosx10.9.0"

	; Verify that we generate 512-bit wide vectors for a basic integer memset			; Verify that we generate 512-bit wide vectors for a basic integer memset
	; loop.			; loop.

	; CHECK-LABEL: f:			; CHECK-LABEL: f:
	; CHECK: vmovdqu64 %zmm{{.}},			; CHECK: vmovdqu64 %zmm{{.}},
	; CHECK-NOT: %ymm			; CHECK-NOT: %ymm
				; CHECK: epilog
				; CHECK: %ymm

	; Verify that we don't generate 512-bit wide vectors when subtarget feature says not to			; Verify that we don't generate 512-bit wide vectors when subtarget feature says not to

	; CHECK-PREFER-AVX256-LABEL: f:			; CHECK-PREFER-AVX256-LABEL: f:
	; CHECK-PREFER-AVX256: vmovdqu %ymm{{.}},			; CHECK-PREFER-AVX256: vmovdqu %ymm{{.}},
	; CHECK-PREFER-AVX256-NOT: %zmm			; CHECK-PREFER-AVX256-NOT: %zmm

	define void @f(i32* %a, i32 %n) {			define void @f(i32* %a, i32 %n) {
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	}			}

	; Verify that the "prefer-vector-width=512" attribute override the subtarget			; Verify that the "prefer-vector-width=512" attribute override the subtarget
	; vectors			; vectors

	; CHECK-LABEL: h:			; CHECK-LABEL: h:
	; CHECK: vmovdqu64 %zmm{{.}},			; CHECK: vmovdqu64 %zmm{{.}},
	; CHECK-NOT: %ymm			; CHECK-NOT: %ymm
				; CHECK: epilog
				; CHECK: %ymm

	; CHECK-PREFER-AVX256-LABEL: h:			; CHECK-PREFER-AVX256-LABEL: h:
	; CHECK-PREFER-AVX256: vmovdqu64 %zmm{{.}},			; CHECK-PREFER-AVX256: vmovdqu64 %zmm{{.}},
	; CHECK-PREFER-AVX256-NOT: %ymm			; CHECK-PREFER-AVX256-NOT: %ymm
				; CHECK-PREFER-AVX256: epilog
				; CHECK-PREFER-AVX256: %ymm

	define void @h(i32* %a, i32 %n) "prefer-vector-width"="512" {			define void @h(i32* %a, i32 %n) "prefer-vector-width"="512" {
	entry:			entry:
	%cmp4 = icmp sgt i32 %n, 0			%cmp4 = icmp sgt i32 %n, 0
	br i1 %cmp4, label %for.body.preheader, label %for.end			br i1 %cmp4, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	br label %for.body			br label %for.body
	Show All 16 Lines

llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll

	Show First 20 Lines • Show All 162 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: @cttz(			; CHECK-LABEL: @cttz(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP_NOT6:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0			; CHECK-NEXT: [[CMP_NOT6:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP_NOT6]], label [[WHILE_END:%.*]], label [[ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_NOT6]], label [[WHILE_END:%.*]], label [[ITER_CHECK:%.*]]
	; CHECK: iter.check:			; CHECK: iter.check:
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[BLOCKSIZE]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[BLOCKSIZE]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 7			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 15
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
	; CHECK: vector.main.loop.iter.check:			; CHECK: vector.main.loop.iter.check:
	; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i32 [[TMP0]], 127			; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i32 [[TMP0]], 127
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934464			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934464
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i8> poison, i8 [[OFFSET:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i8> poison, i8 [[OFFSET:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i8> [[BROADCAST_SPLATINSERT]], <32 x i8> poison, <32 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i8> [[BROADCAST_SPLATINSERT]], <32 x i8> poison, <32 x i32> zeroinitializer
	Show All 40 Lines
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[VEC_EPILOG_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
	; CHECK: vec.epilog.iter.check:			; CHECK: vec.epilog.iter.check:
	; CHECK-NEXT: [[IND_END29:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[N_VEC]]			; CHECK-NEXT: [[IND_END29:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[N_VEC]]
	; CHECK-NEXT: [[IND_END26:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[N_VEC]]			; CHECK-NEXT: [[IND_END26:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[N_VEC]]
	; CHECK-NEXT: [[CAST_CRD22:%.*]] = trunc i64 [[N_VEC]] to i32			; CHECK-NEXT: [[CAST_CRD22:%.*]] = trunc i64 [[N_VEC]] to i32
	; CHECK-NEXT: [[IND_END23:%.*]] = sub i32 [[BLOCKSIZE]], [[CAST_CRD22]]			; CHECK-NEXT: [[IND_END23:%.*]] = sub i32 [[BLOCKSIZE]], [[CAST_CRD22]]
	; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = and i64 [[TMP2]], 120			; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = and i64 [[TMP2]], 112
	; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp eq i64 [[N_VEC_REMAINING]], 0			; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp eq i64 [[N_VEC_REMAINING]], 0
	; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]			; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
	; CHECK: vec.epilog.ph:			; CHECK: vec.epilog.ph:
	; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]			; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
	; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[BLOCKSIZE]], -1			; CHECK-NEXT: [[TMP22:%.*]] = add i32 [[BLOCKSIZE]], -1
	; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64			; CHECK-NEXT: [[TMP23:%.*]] = zext i32 [[TMP22]] to i64
	; CHECK-NEXT: [[TMP24:%.*]] = add nuw nsw i64 [[TMP23]], 1			; CHECK-NEXT: [[TMP24:%.*]] = add nuw nsw i64 [[TMP23]], 1
	; CHECK-NEXT: [[N_VEC19:%.*]] = and i64 [[TMP24]], 8589934584			; CHECK-NEXT: [[N_VEC19:%.*]] = and i64 [[TMP24]], 8589934576
	; CHECK-NEXT: [[CAST_CRD:%.*]] = trunc i64 [[N_VEC19]] to i32			; CHECK-NEXT: [[CAST_CRD:%.*]] = trunc i64 [[N_VEC19]] to i32
	; CHECK-NEXT: [[IND_END:%.*]] = sub i32 [[BLOCKSIZE]], [[CAST_CRD]]			; CHECK-NEXT: [[IND_END:%.*]] = sub i32 [[BLOCKSIZE]], [[CAST_CRD]]
	; CHECK-NEXT: [[IND_END25:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[N_VEC19]]			; CHECK-NEXT: [[IND_END25:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[N_VEC19]]
	; CHECK-NEXT: [[IND_END28:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[N_VEC19]]			; CHECK-NEXT: [[IND_END28:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[N_VEC19]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT35:%.*]] = insertelement <8 x i8> poison, i8 [[OFFSET]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT35:%.*]] = insertelement <16 x i8> poison, i8 [[OFFSET]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT36:%.*]] = shufflevector <8 x i8> [[BROADCAST_SPLATINSERT35]], <8 x i8> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT36:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT35]], <16 x i8> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
	; CHECK: vec.epilog.vector.body:			; CHECK: vec.epilog.vector.body:
	; CHECK-NEXT: [[INDEX20:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT21:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX20:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT21:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
	; CHECK-NEXT: [[NEXT_GEP32:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[INDEX20]]			; CHECK-NEXT: [[NEXT_GEP32:%.*]] = getelementptr i8, i8* [[PSRC]], i64 [[INDEX20]]
	; CHECK-NEXT: [[NEXT_GEP33:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[INDEX20]]			; CHECK-NEXT: [[NEXT_GEP33:%.*]] = getelementptr i8, i8* [[PDST]], i64 [[INDEX20]]
	; CHECK-NEXT: [[TMP25:%.*]] = bitcast i8* [[NEXT_GEP32]] to <8 x i8>*			; CHECK-NEXT: [[TMP25:%.*]] = bitcast i8* [[NEXT_GEP32]] to <16 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD34:%.*]] = load <8 x i8>, <8 x i8>* [[TMP25]], align 2			; CHECK-NEXT: [[WIDE_LOAD34:%.*]] = load <16 x i8>, <16 x i8>* [[TMP25]], align 2
	; CHECK-NEXT: [[TMP26:%.*]] = call <8 x i8> @llvm.fshl.v8i8(<8 x i8> [[WIDE_LOAD34]], <8 x i8> [[WIDE_LOAD34]], <8 x i8> [[BROADCAST_SPLAT36]])			; CHECK-NEXT: [[TMP26:%.*]] = call <16 x i8> @llvm.fshl.v16i8(<16 x i8> [[WIDE_LOAD34]], <16 x i8> [[WIDE_LOAD34]], <16 x i8> [[BROADCAST_SPLAT36]])
	; CHECK-NEXT: [[TMP27:%.*]] = bitcast i8* [[NEXT_GEP33]] to <8 x i8>*			; CHECK-NEXT: [[TMP27:%.*]] = bitcast i8* [[NEXT_GEP33]] to <16 x i8>*
	; CHECK-NEXT: store <8 x i8> [[TMP26]], <8 x i8>* [[TMP27]], align 2			; CHECK-NEXT: store <16 x i8> [[TMP26]], <16 x i8>* [[TMP27]], align 2
	; CHECK-NEXT: [[INDEX_NEXT21]] = add i64 [[INDEX20]], 8			; CHECK-NEXT: [[INDEX_NEXT21]] = add i64 [[INDEX20]], 16
	; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT21]], [[N_VEC19]]			; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT21]], [[N_VEC19]]
	; CHECK-NEXT: br i1 [[TMP28]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP28]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
	; CHECK: vec.epilog.middle.block:			; CHECK: vec.epilog.middle.block:
	; CHECK-NEXT: [[CMP_N30:%.*]] = icmp eq i64 [[TMP24]], [[N_VEC19]]			; CHECK-NEXT: [[CMP_N30:%.*]] = icmp eq i64 [[TMP24]], [[N_VEC19]]
	; CHECK-NEXT: br i1 [[CMP_N30]], label [[WHILE_END]], label [[VEC_EPILOG_SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N30]], label [[WHILE_END]], label [[VEC_EPILOG_SCALAR_PH]]
	; CHECK: vec.epilog.scalar.ph:			; CHECK: vec.epilog.scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[IND_END23]], [[VEC_EPILOG_ITER_CHECK]] ], [ [[BLOCKSIZE]], [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[IND_END23]], [[VEC_EPILOG_ITER_CHECK]] ], [ [[BLOCKSIZE]], [[ITER_CHECK]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL24:%.*]] = phi i8* [ [[IND_END25]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[IND_END26]], [[VEC_EPILOG_ITER_CHECK]] ], [ [[PSRC]], [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL24:%.*]] = phi i8* [ [[IND_END25]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[IND_END26]], [[VEC_EPILOG_ITER_CHECK]] ], [ [[PSRC]], [[ITER_CHECK]] ]
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines