This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
lib/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
21/25
LoopVectorize.cpp
2/2
VPlan.h
2/2
VPlanRecipes.cpp
-
VPlanTransforms.h
1
VPlanTransforms.cpp
-
test/Transforms/LoopVectorize/
-
Transforms/
-
LoopVectorize/
-
AArch64/
-
widen-call-with-intrinsic-or-libfunc.ll
-
vplan-dot-printing.ll
-
unittests/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
-
VPlanHCFGTest.cpp
2
VPlanTest.cpp

Differential D132585

[VPlan] Add field to track if intrinsic should be used for call. (NFC)
Closed · Public

Authored by fhahn on Aug 24 2022, 12:03 PM.

Details

Reviewers

Ayal
gilr
rengolin

Commits

rGfc444ddc7720: [VPlan] Add field to track if intrinsic should be used for call. (NFC)

Summary

This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.
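
A minimal sketch of the idea in this summary, using hypothetical stand-in types rather than the real VPlan classes: the cost comparison runs once when the recipe is built, and code generation only reads the stored field. `ToyCostModel`, `ToyWidenCallRecipe`, `buildRecipe` and `emit` are illustrative names and not LLVM APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these are LLVM types.
using ToyIntrinsicID = unsigned;
constexpr ToyIntrinsicID ToyNotIntrinsic = 0; // mirrors not_intrinsic being zero

struct ToyCostModel {
  unsigned IntrinsicCost;
  unsigned LibCallCost;
  mutable unsigned Queries = 0; // counts how often the costs are consulted

  ToyCostModel(unsigned Intrinsic, unsigned LibCall)
      : IntrinsicCost(Intrinsic), LibCallCost(LibCall) {}

  bool intrinsicIsCheaper() const {
    ++Queries;
    return IntrinsicCost <= LibCallCost;
  }
};

struct ToyWidenCallRecipe {
  // Chosen intrinsic, or ToyNotIntrinsic if a library call should be used.
  ToyIntrinsicID VectorIntrinsicID;
};

// Planning: decide once and record the decision in the recipe.
ToyWidenCallRecipe buildRecipe(const ToyCostModel &CM,
                               ToyIntrinsicID Candidate) {
  bool UseIntrinsic = Candidate != ToyNotIntrinsic && CM.intrinsicIsCheaper();
  return {UseIntrinsic ? Candidate : ToyNotIntrinsic};
}

// Code generation: no cost model needed, only the stored field is read.
std::string emit(const ToyWidenCallRecipe &R) {
  return R.VectorIntrinsicID ? "intrinsic call" : "library call";
}
```

In this toy model, calling `emit` repeatedly on the same recipe never touches the cost model again, which is the decoupling the summary describes.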

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Aug 24 2022, 12:03 PM

Herald added a project: Restricted Project. Aug 24 2022, 12:03 PM

Herald added subscribers: tschuett, psnobl, rogfer01 and 2 others.

fhahn requested review of this revision. Aug 24 2022, 12:03 PM

Herald added a project: Restricted Project. Aug 24 2022, 12:03 PM

Herald added a subscriber: vkmr.

fhahn added a child revision: D132586: [VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Aug 24 2022, 12:05 PM

Harbormaster completed remote builds in B183183: Diff 455317. Aug 24 2022, 1:44 PM

Ayal added inline comments. Aug 28 2022, 11:10 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8302	We already get the vector intrinsic ID here, reuse it instead of getting it again repeatedly below? (Independent of patch.)
8308–8309	`WillWiden` >> `CanUseVectorCall`? Using an intrinsic is also widening.
8308–8309	Above comment deserves an update.
8316	Avoid considering CallCost if NeedToScalarize is true?
	Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
	The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.

Herald added a subscriber: pcwang-thead. Aug 28 2022, 11:10 PM

fhahn mentioned this in rGc78696813f1a: [LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Aug 29 2022, 2:20 AM

fhahn mentioned this in rG005d1a8ff533: [LV] Add test where either a libfunc or intrinsic is chosen. Aug 29 2022, 2:51 AM

Address comments, replace boolean flag with Intrinsic::ID, which will either be the chosen ID or Intrinsic::not_intrinsic. This removes the need for TLI in D132586.

fhahn marked 3 inline comments as done. Aug 29 2022, 3:06 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8302	Done in c78696813f1ac1b8253d06a2abf38d6647f9d7ae.
8308–8309	Updated, thanks!
8308–8309	The comment should be updated in the latest version.
8316	> Avoid considering CallCost if NeedToScalarize is true?
	I am not sure we need to handle this explicitly, as the cost comparison should either choose the vector intrinsic (if it is cheaper than the lib call, which may get scalarized) or `CanUseVectorCall` will also be false.
	> Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
	Added a check, thanks!
	> The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
	I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533
	It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.
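
For readers unfamiliar with the clamping helper discussed in this thread: it evaluates a predicate at the first VF of a range and shortens the range at the first VF where the answer changes, so one VPlan covers only VFs that share a decision. The following is a simplified model with plain `unsigned` VFs; `ToyVFRange` and `decideAndClamp` are hypothetical names, and the real helper operates on `ElementCount` and `VFRange`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical half-open range [Start, End) of power-of-two VFs.
struct ToyVFRange {
  unsigned Start;
  unsigned End;
};

// Returns Predicate(Range.Start) and shrinks Range.End to the first VF whose
// answer differs, so every VF left in the range shares the same decision.
bool decideAndClamp(const std::function<bool(unsigned)> &Predicate,
                    ToyVFRange &Range) {
  assert(Range.Start > 0 && Range.Start < Range.End && "empty VF range");
  bool AtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2) {
    if (Predicate(VF) != AtStart) {
      Range.End = VF;
      break;
    }
  }
  return AtStart;
}
```

With a range of [2, 17) and a predicate that holds only for VF <= 4, this sketch returns true and clamps the range to [2, 8); the remaining VFs would then be handled by a separate plan, which is the splitting described above.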

Harbormaster completed remote builds in B183889: Diff 456296. Aug 29 2022, 4:02 AM

Ayal added inline comments. Aug 29 2022, 7:16 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4166–4167	This is now also redundant given VectorIntrinsicID?
4174–4175	nit: can ask if (!VectorIntrinsicID || ...) given that Intrinsic::not_intrinsic is fixed to zero.
4185	nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.
8316	>> The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
	> I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533
	Hmm, getDecisionAndClampRange() works with boolean decisions rather than 3-way ones. This may result in excessive clamping, which is ok albeit potentially conservative. E.g., say the first VF=2 of the range can make a vector call but the next VF=4 cannot, where both can more efficiently make an intrinsic call; the range would clamp after VF=2 needlessly.
	One way to optimize the clamping is to figure out the compound decision for the first VF of the range and then getDecisionAndClampRange() accordingly - worth the hassle?
	    bool ScalarBetterThanVectorAtStart;
	    InstructionCost CallCostAtStart =
	        CM.getVectorCallCost(CI, Range.Start, ScalarBetterThanVectorAtStart);
	    bool IntrinsicBestAtStart =
	        ID && CM.getVectorIntrinsicCost(CI, Range.Start) < CallCostAtStart;
	    LoopVectorizationPlanner::getDecisionAndClampRange(
	        [&](ElementCount VF) -> bool {
	          bool ScalarBetterThanVectorAtVF;
	          // Is it beneficial to perform intrinsic call compared to lib call?
	          InstructionCost CallCostAtVF =
	              CM.getVectorCallCost(CI, VF, ScalarBetterThanVectorAtVF);
	          bool IntrinsicBestAtVF =
	              ID && CM.getVectorIntrinsicCost(CI, VF) < CallCostAtVF;
	          return (IntrinsicBestAtStart == IntrinsicBestAtVF) &&
	                 (IntrinsicBestAtStart ||
	                  ScalarBetterThanVectorAtStart == ScalarBetterThanVectorAtVF);
	        },
	        Range);
	CM.getVectorCallCost() already compares vector call cost with scalar call cost, returning the cheaper along with an indicator of which it is. Perhaps worth extending this API to compare the three alternatives, returning the cheapest along with an indicator(s) of which it is(?)
	> It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.
8322	nit: can ask if ID given that Intrinsic::not_intrinsic is fixed to zero.
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
454	nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
80–81	Pass some Intrinsic::ID instead of `true`?
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp
809	`false` is synonymous with Intrinsic::not_intrinsic being zero?
1069	ditto
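
The nits above rely on the "not an intrinsic" sentinel being zero, so an ID can be tested directly in a condition and a literal `false` converts to the sentinel. A small sketch with a hypothetical miniature enumeration (the `toy` namespace and its members are made up for illustration; only the zero sentinel mirrors what the review states):

```cpp
#include <cassert>

// Hypothetical miniature of an intrinsic ID enumeration.
namespace toy {
using ID = unsigned;
enum : ID { not_intrinsic = 0, fabs_like = 1, sqrt_like = 2 };
} // namespace toy

// Explicit comparison against the sentinel.
bool usesIntrinsicExplicit(toy::ID VectorIntrinsicID) {
  return VectorIntrinsicID != toy::not_intrinsic;
}

// Shorthand suggested in the review: a zero ID is false in a condition.
bool usesIntrinsicShort(toy::ID VectorIntrinsicID) {
  if (VectorIntrinsicID)
    return true;
  return false;
}
```

Both helpers agree for every ID in this model, and an ID initialized from `false` compares equal to `toy::not_intrinsic`, which is why passing `false` in the unit tests behaves like passing the sentinel.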

fhahn mentioned this in D132586: [VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Aug 29 2022, 8:58 AM

Address latest comments, thanks!

fhahn added inline comments. Aug 31 2022, 9:57 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4166–4167	Yes, should be removed!
4174–4175	Explicitly checking `== Intrinsic::not_intrinsic` may be clearer, but it seems too verbose. I simplified it.
4185	Simplified, thanks!
8316	Hm, I tried to restructure the code to make things a bit clearer.
	If we can use an intrinsic call, clamp the decision to the range of intrinsic calls and return the recipe. If the intrinsic call is profitable at the start, we clamp the range until it becomes unprofitable. If it is not profitable at the beginning, we should clamp the range until it becomes profitable.
	If it is not profitable to use an intrinsic call at the start, it must be profitable to use a lib call. Now clamp to the range until lib calls are not profitable.
	I think that should avoid excessive clamping in most cases in practice and the code seems easier to follow. WDYT?
8322	Simplified, thanks!
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
454	Simplified, thanks!
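
The restructured two-step clamping described in the 8316 reply can be modelled roughly as follows. This is a sketch under stated assumptions, not the committed code: VFs are plain `unsigned` values, and `ToyVFRange`, `decideAndClamp`, `ToyCosts` and `chooseForRange` are hypothetical names standing in for the planner, cost model and recipe builder.

```cpp
#include <cassert>
#include <functional>

struct ToyVFRange { unsigned Start, End; }; // hypothetical, half-open

// Simplified model of clamping a range to the VFs that share one decision.
bool decideAndClamp(const std::function<bool(unsigned)> &P, ToyVFRange &R) {
  bool AtStart = P(R.Start);
  for (unsigned VF = R.Start * 2; VF < R.End; VF *= 2)
    if (P(VF) != AtStart) { R.End = VF; break; }
  return AtStart;
}

enum class ToyChoice { Intrinsic, LibCall, None };

// Hypothetical per-VF facts that a cost model would provide.
struct ToyCosts {
  std::function<bool(unsigned)> IntrinsicBest;   // intrinsic cost <= call cost
  std::function<bool(unsigned)> NeedToScalarize; // lib call would be scalarized
};

ToyChoice chooseForRange(bool HasIntrinsicID, const ToyCosts &C,
                         ToyVFRange &R) {
  // Step 1: clamp on the intrinsic decision and return early when it wins.
  if (HasIntrinsicID && decideAndClamp(C.IntrinsicBest, R))
    return ToyChoice::Intrinsic;
  // Step 2: reached only otherwise; clamp on vector lib call vs. scalarizing.
  if (decideAndClamp([&](unsigned VF) { return !C.NeedToScalarize(VF); }, R))
    return ToyChoice::LibCall;
  return ToyChoice::None;
}
```

Taking the earlier example (VF=2 can make a vector call, VF=4 cannot, the intrinsic is best at both) with a range of [2, 8), this model picks the intrinsic and leaves the range untouched, because the second clamp is never reached. Without an intrinsic ID, the same inputs yield a library call and a range clamped to [2, 4).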

Harbormaster completed remote builds in B184390: Diff 456994. Aug 31 2022, 10:55 AM

Thanks for addressing, looks good to me, adding minor last nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8316	> Hm, I tried to restructure the code to make things a bit clearer.
	> If we can use an intrinsic call, clamp the decision to the range of intrinsic calls and return the recipe. If the intrinsic call is profitable at the start, we clamp the range until it becomes unprofitable. If it is not profitable at the beginning, we should clamp the range until it becomes profitable.
	Agreed! "profitable" here means "most profitable/best", i.e., better than scalarizing and better than calling a vector library function.
	> If it is not profitable to use an intrinsic call at the start, it must be profitable to use a lib call. Now clamp to the range until lib calls are not profitable.
	It is also possible that scalarizing is most profitable at start. In any case it's indeed fine to now clamp based on the better between scalarizing and using a lib call (which is best, i.e., also better than using an intrinsic), as done below.
	> I think that should avoid excessive clamping in most cases in practice and the code seems easier to follow. WDYT?
	Agreed, excessive clamping is avoided and code is clearer, LGTM!
8325	nits: can drop `Should`, {}
8333	Maybe the following:
	    // The flag shows whether it is better to scalarize the call than to call a vectorized version of the function.
	is a bit more accurate?
8340	nits: can drop `Should`, {}
llvm/lib/Transforms/Vectorize/VPlan.h
953	nit: comment that not_intrinsic/false indicates that a library call is used instead of an intrinsic.
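
The notion of "best" used in this exchange is a three-way choice between scalarizing, calling a vector library function, and calling a vector intrinsic. A hypothetical helper that returns the cheapest option together with an indicator might look like the sketch below. `ToyCallKind`, `ToyCallDecision` and `cheapestCall` are illustrative names, and the tie-breaking is an assumption: a library call must be strictly cheaper than scalarizing, while an intrinsic wins ties, echoing the `<=` comparison in the diff.

```cpp
#include <cassert>

enum class ToyCallKind { Scalarize, VectorLibCall, VectorIntrinsic };

struct ToyCallDecision {
  ToyCallKind Kind;
  unsigned Cost;
};

// Hypothetical three-way comparison; not an existing cost-model API.
ToyCallDecision cheapestCall(unsigned ScalarCost, bool HasVectorLibFunc,
                             unsigned VectorLibCallCost, bool HasIntrinsic,
                             unsigned IntrinsicCost) {
  ToyCallDecision Best{ToyCallKind::Scalarize, ScalarCost};
  // Assumption: a vector library call must be strictly cheaper.
  if (HasVectorLibFunc && VectorLibCallCost < Best.Cost)
    Best = {ToyCallKind::VectorLibCall, VectorLibCallCost};
  // Assumption: the intrinsic wins ties against the best call-based cost.
  if (HasIntrinsic && IntrinsicCost <= Best.Cost)
    Best = {ToyCallKind::VectorIntrinsic, IntrinsicCost};
  return Best;
}
```

A single three-valued result like this would let a caller clamp the VF range on "same kind as at the start of the range" instead of on two separate boolean questions.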

This revision is now accepted and ready to land. Aug 31 2022, 1:01 PM

Closed by commit rGfc444ddc7720: [VPlan] Add field to track if intrinsic should be used for call. (NFC) (authored by fhahn). Sep 1 2022, 5:15 AM

This revision was automatically updated to reflect the committed changes.

fhahn marked 3 inline comments as done.

fhahn added a commit: rGfc444ddc7720: [VPlan] Add field to track if intrinsic should be used for call. (NFC).

fhahn added inline comments. Sep 1 2022, 5:23 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8325	Done in the committed version, thanks!
8333	I updated the comment to
	    + // Is better to call a vectorized version of the function than to to scalarize
	    + // the call?
	in the committed version.
8340	Done in the committed version, thanks!
llvm/lib/Transforms/Vectorize/VPlan.h
953	Added a comment in the committed version, thanks!

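Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	87 lines
llvm/lib/Transforms/Vectorize/VPlan.h	13 lines
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp	5 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.h	3 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp	8 lines
llvm/test/Transforms/LoopVectorize/AArch64/widen-call-with-intrinsic-or-libfunc.ll	31 lines
llvm/test/Transforms/LoopVectorize/vplan-dot-printing.ll	2 lines
llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp	11 lines
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp	5 lines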

Diff 457232

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


@@ ... @@ public:
   /// loop and the start value for the canonical induction, if it is != 0. The
   /// latter is the case when vectorizing the epilogue loop. In the case of
   /// epilogue vectorization, this function is overriden to handle the more
   /// complex control flow around the loops.
   virtual std::pair<BasicBlock *, Value *> createVectorizedLoopSkeleton();

   /// Widen a single call instruction within the innermost loop.
   void widenCallInstruction(CallInst &CI, VPValue *Def, VPUser &ArgOperands,
-                            VPTransformState &State);
+                            VPTransformState &State,
+                            Intrinsic::ID VectorIntrinsicID);

   /// Fix the vectorized code, taking care of header phi's, live-outs, and more.
   void fixVectorizedLoop(VPTransformState &State, VPlan &Plan);

   // Return true if any runtime check is added.
   bool areSafetyChecksAdded() { return AddedSafetyChecks; }

   /// A type for vectorized values in the new loop. Each value from the
@@ ... @@ void InnerLoopVectorizer::fixNonInductionPHIs(VPlan &Plan,
   }
 }

 bool InnerLoopVectorizer::useOrderedReductions(
     const RecurrenceDescriptor &RdxDesc) {
   return Cost->useOrderedReductions(RdxDesc);
 }

-void InnerLoopVectorizer::widenCallInstruction(CallInst &CI, VPValue *Def,
-                                               VPUser &ArgOperands,
-                                               VPTransformState &State) {
+void InnerLoopVectorizer::widenCallInstruction(
+    CallInst &CI, VPValue *Def, VPUser &ArgOperands, VPTransformState &State,
+    Intrinsic::ID VectorIntrinsicID) {
   assert(!isa<DbgInfoIntrinsic>(CI) &&
          "DbgInfoIntrinsic should have been dropped during VPlan construction");
   State.setDebugLocFromInst(&CI);

   SmallVector<Type *, 4> Tys;
   for (Value *ArgOperand : CI.args())
     Tys.push_back(ToVectorTy(ArgOperand->getType(), VF.getKnownMinValue()));

-  Intrinsic::ID ID = getVectorIntrinsicIDForCall(&CI, TLI);
-
-  // The flag shows whether we use Intrinsic or a usual Call for vectorized
-  // version of the instruction.
-  // Is it beneficial to perform intrinsic call compared to lib call?
-  bool NeedToScalarize = false;
-  InstructionCost CallCost = Cost->getVectorCallCost(&CI, VF, NeedToScalarize);
-  InstructionCost IntrinsicCost =
-      ID ? Cost->getVectorIntrinsicCost(&CI, VF) : 0;
-  bool UseVectorIntrinsic = ID && IntrinsicCost <= CallCost;
-  assert((UseVectorIntrinsic || !NeedToScalarize) &&
-         "Instruction should be scalarized elsewhere.");
-  assert((IntrinsicCost.isValid() || CallCost.isValid()) &&
-         "Either the intrinsic cost or vector call cost must be valid");
-
   for (unsigned Part = 0; Part < UF; ++Part) {
     SmallVector<Type *, 2> TysForDecl = {CI.getType()};
     SmallVector<Value *, 4> Args;
     for (const auto &I : enumerate(ArgOperands.operands())) {
       // Some intrinsics have a scalar argument - don't replace it with a
       // vector.
       Value *Arg;
-      if (!UseVectorIntrinsic ||
-          !isVectorIntrinsicWithScalarOpAtArg(ID, I.index()))
+      if (!VectorIntrinsicID ||
+          !isVectorIntrinsicWithScalarOpAtArg(VectorIntrinsicID, I.index()))
         Arg = State.get(I.value(), Part);
       else
         Arg = State.get(I.value(), VPIteration(0, 0));
-      if (isVectorIntrinsicWithOverloadTypeAtArg(ID, I.index()))
+      if (isVectorIntrinsicWithOverloadTypeAtArg(VectorIntrinsicID, I.index()))
         TysForDecl.push_back(Arg->getType());
       Args.push_back(Arg);
     }

     Function *VectorF;
-    if (UseVectorIntrinsic) {
+    if (VectorIntrinsicID) {
       // Use vector version of the intrinsic.
       if (VF.isVector())
         TysForDecl[0] = VectorType::get(CI.getType()->getScalarType(), VF);
       Module *M = State.Builder.GetInsertBlock()->getModule();
-      VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);
+      VectorF = Intrinsic::getDeclaration(M, VectorIntrinsicID, TysForDecl);
       assert(VectorF && "Can't retrieve vector intrinsic.");
     } else {
       // Use vector version of the function call.
       const VFShape Shape = VFShape::get(CI, VF, false /*HasGlobalPred*/);
 #ifndef NDEBUG
       assert(VFDatabase(CI).getVectorizedFunction(Shape) != nullptr &&
              "Can't create vector function.");
 #endif
@@ ... @@ bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
       [this, CI](ElementCount VF) {
         return CM.isScalarWithPredication(CI, VF);
       },
       Range);

   if (IsPredicated)
     return nullptr;

   Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
   if (ID && (ID == Intrinsic::assume || ID == Intrinsic::lifetime_end ||
              ID == Intrinsic::lifetime_start || ID == Intrinsic::sideeffect ||
              ID == Intrinsic::pseudoprobe ||
              ID == Intrinsic::experimental_noalias_scope_decl))
     return nullptr;

-  auto willWiden = [&](ElementCount VF) -> bool {
-    // The following case may be scalarized depending on the VF.
-    // The flag shows whether we use Intrinsic or a usual Call for vectorized
-    // version of the instruction.
-    // Is it beneficial to perform intrinsic call compared to lib call?
-    bool NeedToScalarize = false;
-    InstructionCost CallCost = CM.getVectorCallCost(CI, VF, NeedToScalarize);
-    InstructionCost IntrinsicCost = ID ? CM.getVectorIntrinsicCost(CI, VF) : 0;
-    bool UseVectorIntrinsic = ID && IntrinsicCost <= CallCost;
-    return UseVectorIntrinsic || !NeedToScalarize;
-  };
-
-  if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range))
-    return nullptr;
-
-  ArrayRef<VPValue *> Ops = Operands.take_front(CI->arg_size());
-  return new VPWidenCallRecipe(*CI, make_range(Ops.begin(), Ops.end()));
+  ArrayRef<VPValue *> Ops = Operands.take_front(CI->arg_size());
+
+  // Is it beneficial to perform intrinsic call compared to lib call?
+  bool ShouldUseVectorIntrinsic =
+      ID && LoopVectorizationPlanner::getDecisionAndClampRange(
+                [&](ElementCount VF) -> bool {
+                  bool NeedToScalarize = false;
+                  // Is it beneficial to perform intrinsic call compared to lib
+                  // call?
+                  InstructionCost CallCost =
+                      CM.getVectorCallCost(CI, VF, NeedToScalarize);
+                  InstructionCost IntrinsicCost =
+                      CM.getVectorIntrinsicCost(CI, VF);
+                  return IntrinsicCost <= CallCost;
+                },
+                Range);
+  if (ShouldUseVectorIntrinsic)
+    return new VPWidenCallRecipe(*CI, make_range(Ops.begin(), Ops.end()), ID);
+
+  // Is better to call a vectorized version of the function than to to scalarize
+  // the call?
+  auto ShouldUseVectorCall = LoopVectorizationPlanner::getDecisionAndClampRange(
+      [&](ElementCount VF) -> bool {
+        // The following case may be scalarized depending on the VF.
+        // The flag shows whether we can use a usual Call for vectorized
+        // version of the instruction.
+        bool NeedToScalarize = false;
+        CM.getVectorCallCost(CI, VF, NeedToScalarize);
+        return !NeedToScalarize;
+      },
+      Range);
+  if (ShouldUseVectorCall)
+    return new VPWidenCallRecipe(*CI, make_range(Ops.begin(), Ops.end()),
+                                 Intrinsic::not_intrinsic);
+
+  return nullptr;
 }

 bool VPRecipeBuilder::shouldWiden(Instruction *I, VFRange &Range) const {
   assert(!isa<BranchInst>(I) && !isa<PHINode>(I) && !isa<LoadInst>(I) &&
          !isa<StoreInst>(I) && "Instruction should have been handled earlier");
   // Instruction should be widened, unless it is scalar after vectorization,
   // scalarization is profitable or it is predicated.
   auto WillScalarize = [this, I](ElementCount VF) -> bool {
@@ ... @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
   for (ElementCount VF = Range.Start; ElementCount::isKnownLT(VF, Range.End);
        VF *= 2)
     Plan->addVF(VF);

   SmallPtrSet<Instruction *, 1> DeadInstructions;
   VPlanTransforms::VPInstructionsToVPRecipes(
       OrigLoop, Plan,
       [this](PHINode *P) { return Legal->getIntOrFpInductionDescriptor(P); },
-      DeadInstructions, *PSE.getSE());
+      DeadInstructions, *PSE.getSE(), *TLI);

   // Remove the existing terminator of the exiting block of the top-most region.
   // A BranchOnCount will be added instead when adding the canonical IV recipes.
   auto *Term =
       Plan->getVectorLoopRegion()->getExitingBasicBlock()->getTerminator();
   Term->eraseFromParent();

   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),
[136 lines hidden]	for (unsigned i = 0; i < IG->getFactor(); ++i) {
}		}
++OpIdx;		++OpIdx;
}		}
}		}
#endif		#endif

void VPWidenCallRecipe::execute(VPTransformState &State) {		void VPWidenCallRecipe::execute(VPTransformState &State) {
State.ILV->widenCallInstruction(*cast<CallInst>(getUnderlyingInstr()), this,		State.ILV->widenCallInstruction(*cast<CallInst>(getUnderlyingInstr()), this,
*this, State);		*this, State, VectorIntrinsicID);
}		}

void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {		void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Int or FP induction being replicated.");		assert(!State.Instance && "Int or FP induction being replicated.");

Value *Start = getStartValue()->getLiveInIRValue();		Value *Start = getStartValue()->getLiveInIRValue();
const InductionDescriptor &ID = getInductionDescriptor();		const InductionDescriptor &ID = getInductionDescriptor();
TruncInst *Trunc = getTruncInst();		TruncInst *Trunc = getTruncInst();
[1,336 lines hidden]

llvm/lib/Transforms/Vectorize/VPlan.h

[58 lines hidden]
class RecurrenceDescriptor;		class RecurrenceDescriptor;
class Value;		class Value;
class VPBasicBlock;		class VPBasicBlock;
class VPRegionBlock;		class VPRegionBlock;
class VPlan;		class VPlan;
class VPReplicateRecipe;		class VPReplicateRecipe;
class VPlanSlp;		class VPlanSlp;

		namespace Intrinsic {
		typedef unsigned ID;
		}

/// Returns a calculation for the total number of elements for a given \p VF.		/// Returns a calculation for the total number of elements for a given \p VF.
/// For fixed width vectors this value is a constant, whereas for scalable		/// For fixed width vectors this value is a constant, whereas for scalable
/// vectors it is an expression determined at runtime.		/// vectors it is an expression determined at runtime.
Value *getRuntimeVF(IRBuilderBase &B, Type *Ty, ElementCount VF);		Value *getRuntimeVF(IRBuilderBase &B, Type *Ty, ElementCount VF);

/// Return a value for Step multiplied by VF.		/// Return a value for Step multiplied by VF.
Value *createStepForVF(IRBuilderBase &B, Type *Ty, ElementCount VF,		Value *createStepForVF(IRBuilderBase &B, Type *Ty, ElementCount VF,
int64_t Step);		int64_t Step);
[866 lines hidden]	#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
#endif		#endif
};		};

/// A recipe for widening Call instructions.		/// A recipe for widening Call instructions.
class VPWidenCallRecipe : public VPRecipeBase, public VPValue {		class VPWidenCallRecipe : public VPRecipeBase, public VPValue {
		/// ID of the vector intrinsic to call when widening the call. If set the
		Ayal [Done]: nit: comment that not_intrinsic/false indicates that a library call is used instead of an intrinsic.
		fhahn (Author) [Done]: Added a comment in the committed version, thanks!
		/// Intrinsic::not_intrinsic, a library call will be used instead.
		Intrinsic::ID VectorIntrinsicID;

public:		public:
template <typename IterT>		template <typename IterT>
VPWidenCallRecipe(CallInst &I, iterator_range<IterT> CallArguments)		VPWidenCallRecipe(CallInst &I, iterator_range<IterT> CallArguments,
		Intrinsic::ID VectorIntrinsicID)
: VPRecipeBase(VPRecipeBase::VPWidenCallSC, CallArguments),		: VPRecipeBase(VPRecipeBase::VPWidenCallSC, CallArguments),
VPValue(VPValue::VPVWidenCallSC, &I, this) {}		VPValue(VPValue::VPVWidenCallSC, &I, this),
		VectorIntrinsicID(VectorIntrinsicID) {}

~VPWidenCallRecipe() override = default;		~VPWidenCallRecipe() override = default;

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPDef *D) {		static inline bool classof(const VPDef *D) {
return D->getVPDefID() == VPRecipeBase::VPWidenCallSC;		return D->getVPDefID() == VPRecipeBase::VPWidenCallSC;
}		}

[2,101 lines hidden]
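
The change to VPWidenCallRecipe above can be read as "decide once when the recipe is built, then let code generation read the stored answer". The following is a minimal, self-contained sketch of that shape and not the LLVM code: `sketch::IntrinsicID`, `NotIntrinsic`, `CallCosts`, `chooseVectorIntrinsic` and `WidenCallRecipe` are hypothetical stand-ins, and the cost comparison is an assumed simplification of the cost model.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not LLVM declarations.
namespace sketch {
using IntrinsicID = unsigned;            // same shape as `typedef unsigned ID;`
constexpr IntrinsicID NotIntrinsic = 0;  // models Intrinsic::not_intrinsic

// Assumed, simplified view of the two costs the planner compares.
struct CallCosts {
  unsigned IntrinsicCost;
  unsigned LibCallCost;
};

// Made at recipe-construction time: keep the candidate intrinsic only if one
// exists and it is no more expensive than the library call.
inline IntrinsicID chooseVectorIntrinsic(IntrinsicID Candidate,
                                         const CallCosts &Costs) {
  bool UseIntrinsic = Candidate != NotIntrinsic &&
                      Costs.IntrinsicCost <= Costs.LibCallCost;
  return UseIntrinsic ? Candidate : NotIntrinsic;
}

// The recipe only stores the decision; it never consults the cost model.
class WidenCallRecipe {
  // NotIntrinsic means a library call is used instead of an intrinsic.
  IntrinsicID VectorIntrinsicID;

public:
  explicit WidenCallRecipe(IntrinsicID ID) : VectorIntrinsicID(ID) {}
  bool usesVectorIntrinsic() const { return VectorIntrinsicID != NotIntrinsic; }
  IntrinsicID getVectorIntrinsicID() const { return VectorIntrinsicID; }
};
} // namespace sketch
```

In this sketch a caller would compute `chooseVectorIntrinsic(...)` once and pass the result to the `WidenCallRecipe` constructor; anything that later executes or prints the recipe only calls `usesVectorIntrinsic()`.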

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

[444 lines hidden]	void VPWidenCallRecipe::print(raw_ostream &O, const Twine &Indent,
else {		else {
printAsOperand(O, SlotTracker);		printAsOperand(O, SlotTracker);
O << " = ";		O << " = ";
}		}

O << "call @" << CI->getCalledFunction()->getName() << "(";		O << "call @" << CI->getCalledFunction()->getName() << "(";
printOperands(O, SlotTracker);		printOperands(O, SlotTracker);
O << ")";		O << ")";

		if (VectorIntrinsicID)
		Ayal [Done]: nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.
		fhahn (Author) [Done]: Simplified thanks!
		O << " (using vector intrinsic)";
		else
		O << " (using library function)";
}		}

void VPWidenSelectRecipe::print(raw_ostream &O, const Twine &Indent,		void VPWidenSelectRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {		VPSlotTracker &SlotTracker) const {
O << Indent << "WIDEN-SELECT ";		O << Indent << "WIDEN-SELECT ";
printAsOperand(O, SlotTracker);		printAsOperand(O, SlotTracker);
O << " = select ";		O << " = select ";
getOperand(0)->printAsOperand(O, SlotTracker);		getOperand(0)->printAsOperand(O, SlotTracker);
[779 lines hidden]
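
The `if (VectorIntrinsicID)` check in the print method relies on the "no intrinsic" sentinel being zero, which is also why a unit test can pass `false` for the ID argument. A small sketch of that equivalence follows; it assumes an `unsigned` ID type with a zero sentinel, and `sketch::callSuffix` is a hypothetical helper rather than anything in LLVM.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; these are not LLVM declarations.
namespace sketch {
using IntrinsicID = unsigned;            // same shape as `typedef unsigned ID;`
constexpr IntrinsicID NotIntrinsic = 0;  // assumption: the sentinel is zero

// Builds the suffix appended after the printed call.
inline std::string callSuffix(IntrinsicID VectorIntrinsicID) {
  std::ostringstream O;
  // With a zero sentinel this is the same test as
  // `VectorIntrinsicID != NotIntrinsic`.
  if (VectorIntrinsicID)
    O << " (using vector intrinsic)";
  else
    O << " (using library function)";
  return O.str();
}

// `false` converts implicitly to 0, so it names the same value as the sentinel.
static_assert(IntrinsicID(false) == NotIntrinsic,
              "false and the zero sentinel are the same ID value");
} // namespace sketch
```

Passing the named sentinel rather than `false` states the intent more directly, which is the point the reviewer raises on the unit tests.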

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

	[17 lines hidden]

	namespace llvm {			namespace llvm {

	class InductionDescriptor;			class InductionDescriptor;
	class Instruction;			class Instruction;
	class PHINode;			class PHINode;
	class ScalarEvolution;			class ScalarEvolution;
	class Loop;			class Loop;
				class TargetLibraryInfo;

	struct VPlanTransforms {			struct VPlanTransforms {
	/// Replaces the VPInstructions in \p Plan with corresponding			/// Replaces the VPInstructions in \p Plan with corresponding
	/// widen recipes.			/// widen recipes.
	static void			static void
	VPInstructionsToVPRecipes(Loop *OrigLoop, VPlanPtr &Plan,			VPInstructionsToVPRecipes(Loop *OrigLoop, VPlanPtr &Plan,
	function_ref<const InductionDescriptor *(PHINode *)>			function_ref<const InductionDescriptor *(PHINode *)>
	GetIntOrFpInductionDescriptor,			GetIntOrFpInductionDescriptor,
	SmallPtrSetImpl<Instruction *> &DeadInstructions,			SmallPtrSetImpl<Instruction *> &DeadInstructions,
	ScalarEvolution &SE);			ScalarEvolution &SE, const TargetLibraryInfo &TLI);

	static bool sinkScalarOperands(VPlan &Plan);			static bool sinkScalarOperands(VPlan &Plan);

	static bool mergeReplicateRegions(VPlan &Plan);			static bool mergeReplicateRegions(VPlan &Plan);

	/// Remove redundant casts of inductions.			/// Remove redundant casts of inductions.
	///			///
	/// Such redundant casts are casts of induction variables that can be ignored,			/// Such redundant casts are casts of induction variables that can be ignored,
	[25 lines hidden]

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

[9 lines hidden]
/// This file implements a set of utility VPlan to VPlan transformations.		/// This file implements a set of utility VPlan to VPlan transformations.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "VPlanTransforms.h"		#include "VPlanTransforms.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/Analysis/IVDescriptors.h"		#include "llvm/Analysis/IVDescriptors.h"
		#include "llvm/Analysis/VectorUtils.h"
		#include "llvm/IR/Intrinsics.h"

using namespace llvm;		using namespace llvm;

void VPlanTransforms::VPInstructionsToVPRecipes(		void VPlanTransforms::VPInstructionsToVPRecipes(
Loop *OrigLoop, VPlanPtr &Plan,		Loop *OrigLoop, VPlanPtr &Plan,
function_ref<const InductionDescriptor *(PHINode *)>		function_ref<const InductionDescriptor *(PHINode *)>
GetIntOrFpInductionDescriptor,		GetIntOrFpInductionDescriptor,
SmallPtrSetImpl<Instruction *> &DeadInstructions, ScalarEvolution &SE) {		SmallPtrSetImpl<Instruction *> &DeadInstructions, ScalarEvolution &SE,
		const TargetLibraryInfo &TLI) {

ReversePostOrderTraversal<VPBlockRecursiveTraversalWrapper<VPBlockBase *>>		ReversePostOrderTraversal<VPBlockRecursiveTraversalWrapper<VPBlockBase *>>
RPOT(Plan->getEntry());		RPOT(Plan->getEntry());
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {		for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
VPRecipeBase *Term = VPBB->getTerminator();		VPRecipeBase *Term = VPBB->getTerminator();
auto EndIter = Term ? Term->getIterator() : VPBB->end();		auto EndIter = Term ? Term->getIterator() : VPBB->end();
// Introduce each ingredient into VPlan.		// Introduce each ingredient into VPlan.
for (VPRecipeBase &Ingredient :		for (VPRecipeBase &Ingredient :
[35 lines hidden]	for (VPRecipeBase &Ingredient :
*Store, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),		*Store, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),
Plan->getOrAddVPValue(Store->getValueOperand()), nullptr /*Mask*/,		Plan->getOrAddVPValue(Store->getValueOperand()), nullptr /*Mask*/,
false /*Consecutive*/, false /*Reverse*/);		false /*Consecutive*/, false /*Reverse*/);
} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {		} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
NewRecipe = new VPWidenGEPRecipe(		NewRecipe = new VPWidenGEPRecipe(
GEP, Plan->mapToVPValues(GEP->operands()), OrigLoop);		GEP, Plan->mapToVPValues(GEP->operands()), OrigLoop);
} else if (CallInst *CI = dyn_cast<CallInst>(Inst)) {		} else if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
NewRecipe =		NewRecipe =
new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()));		new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()),
		getVectorIntrinsicIDForCall(CI, &TLI));
		Ayal [Not Done]: Pass some Intrinsic::ID instead of `true`?
} else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {		} else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {
bool InvariantCond =		bool InvariantCond =
SE.isLoopInvariant(SE.getSCEV(SI->getOperand(0)), OrigLoop);		SE.isLoopInvariant(SE.getSCEV(SI->getOperand(0)), OrigLoop);
NewRecipe = new VPWidenSelectRecipe(		NewRecipe = new VPWidenSelectRecipe(
*SI, Plan->mapToVPValues(SI->operands()), InvariantCond);		*SI, Plan->mapToVPValues(SI->operands()), InvariantCond);
} else {		} else {
NewRecipe =		NewRecipe =
new VPWidenRecipe(*Inst, Plan->mapToVPValues(Inst->operands()));		new VPWidenRecipe(*Inst, Plan->mapToVPValues(Inst->operands()));
[347 lines hidden]

llvm/test/Transforms/LoopVectorize/AArch64/widen-call-with-intrinsic-or-libfunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts

	; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -vectorizer-maximize-bandwidth -mtriple=arm64-apple-ios -debug -S %s 2>&1 | FileCheck %s			; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -vectorizer-maximize-bandwidth -mtriple=arm64-apple-ios -debug -S %s 2>&1 | FileCheck %s

	target triple = "arm64-apple-ios"			target triple = "arm64-apple-ios"

	; CHECK-LABEL: LV: Checking a loop in 'test'			; CHECK-LABEL: LV: Checking a loop in 'test'
	; CHECK: VPlan 'Initial VPlan for VF={2,4},UF>=1' {			; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {
	; CHECK-NEXT: Live-in vp<%1> = vector-trip-count			; CHECK-NEXT: Live-in vp<%1> = vector-trip-count
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: vector.ph:			; CHECK-NEXT: vector.ph:
	; CHECK-NEXT: Successor(s): vector loop			; CHECK-NEXT: Successor(s): vector loop
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <x1> vector loop: {			; CHECK-NEXT: <x1> vector loop: {
	; CHECK-NEXT: vector.body:			; CHECK-NEXT: vector.body:
	; CHECK-NEXT: EMIT vp<%2> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<%2> = CANONICAL-INDUCTION
	; CHECK-NEXT: vp<%3> = SCALAR-STEPS vp<%2>, ir<0>, ir<1>			; CHECK-NEXT: vp<%3> = SCALAR-STEPS vp<%2>, ir<0>, ir<1>
	; CHECK-NEXT: CLONE ir<%gep.src> = getelementptr ir<%src>, vp<%3>			; CHECK-NEXT: CLONE ir<%gep.src> = getelementptr ir<%src>, vp<%3>
	; CHECK-NEXT: WIDEN ir<%l> = load ir<%gep.src>			; CHECK-NEXT: WIDEN ir<%l> = load ir<%gep.src>
	; CHECK-NEXT: WIDEN ir<%conv> = fpext ir<%l>			; CHECK-NEXT: WIDEN ir<%conv> = fpext ir<%l>
	; CHECK-NEXT: WIDEN-CALL ir<%s> = call @llvm.sin.f64(ir<%conv>)			; CHECK-NEXT: WIDEN-CALL ir<%s> = call @llvm.sin.f64(ir<%conv>) (using library function)
	; CHECK-NEXT: REPLICATE ir<%gep.dst> = getelementptr ir<%dst>, vp<%3>			; CHECK-NEXT: REPLICATE ir<%gep.dst> = getelementptr ir<%dst>, vp<%3>
	; CHECK-NEXT: REPLICATE store ir<%s>, ir<%gep.dst>			; CHECK-NEXT: REPLICATE store ir<%s>, ir<%gep.dst>
	; CHECK-NEXT: EMIT vp<%10> = VF * UF +(nuw) vp<%2>			; CHECK-NEXT: EMIT vp<%10> = VF * UF +(nuw) vp<%2>
	; CHECK-NEXT: EMIT branch-on-count vp<%10> vp<%1>			; CHECK-NEXT: EMIT branch-on-count vp<%10> vp<%1>
	; CHECK-NEXT: No successors			; CHECK-NEXT: No successors
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: Successor(s): middle.block			; CHECK-NEXT: Successor(s): middle.block
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: middle.block:			; CHECK-NEXT: middle.block:
	; CHECK-NEXT: No successors			; CHECK-NEXT: No successors
	; CHECK-NEXT: }			; CHECK-NEXT: }

				; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
				; CHECK-NEXT: Live-in vp<%1> = vector-trip-count
				; CHECK-EMPTY:
				; CHECK-NEXT: vector.ph:
				; CHECK-NEXT: Successor(s): vector loop
				; CHECK-EMPTY:
				; CHECK-NEXT: <x1> vector loop: {
				; CHECK-NEXT: vector.body:
				; CHECK-NEXT: EMIT vp<%2> = CANONICAL-INDUCTION
				; CHECK-NEXT: vp<%3> = SCALAR-STEPS vp<%2>, ir<0>, ir<1>
				; CHECK-NEXT: CLONE ir<%gep.src> = getelementptr ir<%src>, vp<%3>
				; CHECK-NEXT: WIDEN ir<%l> = load ir<%gep.src>
				; CHECK-NEXT: WIDEN ir<%conv> = fpext ir<%l>
				; CHECK-NEXT: WIDEN-CALL ir<%s> = call @llvm.sin.f64(ir<%conv>) (using vector intrinsic)
				; CHECK-NEXT: REPLICATE ir<%gep.dst> = getelementptr ir<%dst>, vp<%3>
				; CHECK-NEXT: REPLICATE store ir<%s>, ir<%gep.dst>
				; CHECK-NEXT: EMIT vp<%10> = VF * UF +(nuw) vp<%2>
				; CHECK-NEXT: EMIT branch-on-count vp<%10> vp<%1>
				; CHECK-NEXT: No successors
				; CHECK-NEXT: }
				; CHECK-NEXT: Successor(s): middle.block
				; CHECK-EMPTY:
				; CHECK-NEXT: middle.block:
				; CHECK-NEXT: No successors
				; CHECK-NEXT: }
				;
	;			;
	define void @test(ptr noalias %src, ptr noalias %dst) {			define void @test(ptr noalias %src, ptr noalias %dst) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 [[TMP0]]
	[38 lines hidden]
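
The test above now expects two initial VPlans, VF={2} (library function) and VF={4} (vector intrinsic), where there used to be a single VF={2,4} plan. One way to picture this: the intrinsic-or-libcall answer differs between VF=2 and VF=4, so the VF range is clamped until every VF in a plan agrees. The sketch below is a simplified model using plain `unsigned` VFs; `sketch::VFRange` and the predicate are hypothetical, and the real `getDecisionAndClampRange` operates on `ElementCount`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for illustration; these are not LLVM declarations.
namespace sketch {
// Half-open range [Start, End) of power-of-two vectorization factors.
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Evaluates Predicate at Range.Start, then shrinks Range.End to the first VF
// whose answer differs, so that one plan only covers VFs that agree.
inline bool
getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                         VFRange &Range) {
  assert(Range.Start < Range.End && "Trying to test an empty VF range.");
  bool PredicateAtRangeStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != PredicateAtRangeStart) {
      Range.End = VF;
      break;
    }
  return PredicateAtRangeStart;
}
} // namespace sketch
```

With an assumed predicate that prefers the intrinsic only from VF=4 upward, the range [2, 8) is clamped to [2, 4) and the remaining [4, 8) gets its own plan, which matches the VF={2} and VF={4} split in the CHECK lines.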

llvm/test/Transforms/LoopVectorize/vplan-dot-printing.ll

	[20 lines hidden]
	; CHECK-NEXT: fontname=Courier			; CHECK-NEXT: fontname=Courier
	; CHECK-NEXT: label="\<x1\> vector loop"			; CHECK-NEXT: label="\<x1\> vector loop"
	; CHECK-NEXT: N1 [label =			; CHECK-NEXT: N1 [label =
	; CHECK-NEXT: "vector.body:\l" +			; CHECK-NEXT: "vector.body:\l" +
	; CHECK-NEXT: " EMIT vp\<[[CAN_IV:%.+]]\> = CANONICAL-INDUCTION\l" +			; CHECK-NEXT: " EMIT vp\<[[CAN_IV:%.+]]\> = CANONICAL-INDUCTION\l" +
	; CHECK-NEXT: " vp\<[[STEPS:%.+]]\> = SCALAR-STEPS vp\<[[CAN_IV]]\>, ir\<0\>, ir\<1\>\l" +			; CHECK-NEXT: " vp\<[[STEPS:%.+]]\> = SCALAR-STEPS vp\<[[CAN_IV]]\>, ir\<0\>, ir\<1\>\l" +
	; CHECK-NEXT: " CLONE ir\<%arrayidx\> = getelementptr ir\<%y\>, vp\<[[STEPS]]\>\l" +			; CHECK-NEXT: " CLONE ir\<%arrayidx\> = getelementptr ir\<%y\>, vp\<[[STEPS]]\>\l" +
	; CHECK-NEXT: " WIDEN ir\<%lv\> = load ir\<%arrayidx\>\l" +			; CHECK-NEXT: " WIDEN ir\<%lv\> = load ir\<%arrayidx\>\l" +
	; CHECK-NEXT: " WIDEN-CALL ir\<%call\> = call @llvm.sqrt.f32(ir\<%lv\>)\l" +			; CHECK-NEXT: " WIDEN-CALL ir\<%call\> = call @llvm.sqrt.f32(ir\<%lv\>) (using vector intrinsic)\l" +
	; CHECK-NEXT: " CLONE ir\<%arrayidx2\> = getelementptr ir\<%x\>, vp\<[[STEPS]]\>\l" +			; CHECK-NEXT: " CLONE ir\<%arrayidx2\> = getelementptr ir\<%x\>, vp\<[[STEPS]]\>\l" +
	; CHECK-NEXT: " WIDEN store ir\<%arrayidx2\>, ir\<%call\>\l" +			; CHECK-NEXT: " WIDEN store ir\<%arrayidx2\>, ir\<%call\>\l" +
	; CHECK-NEXT: " EMIT vp\<[[CAN_IV_NEXT:%.+]]\> = VF * UF +(nuw) vp\<[[CAN_IV]]\>\l" +			; CHECK-NEXT: " EMIT vp\<[[CAN_IV_NEXT:%.+]]\> = VF * UF +(nuw) vp\<[[CAN_IV]]\>\l" +
	; CHECK-NEXT: " EMIT branch-on-count vp\<[[CAN_IV_NEXT]]\> vp\<{{.+}}\>\l" +			; CHECK-NEXT: " EMIT branch-on-count vp\<[[CAN_IV_NEXT]]\> vp\<{{.+}}\>\l" +
	; CHECK-NEXT: "No successors\l"			; CHECK-NEXT: "No successors\l"
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	;			;
	entry:			entry:
	[19 lines hidden]

llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp

//===- llvm/unittest/Transforms/Vectorize/VPlanHCFGTest.cpp ---------------===//		//===- llvm/unittest/Transforms/Vectorize/VPlanHCFGTest.cpp ---------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "../lib/Transforms/Vectorize/VPlan.h"		#include "../lib/Transforms/Vectorize/VPlan.h"
#include "../lib/Transforms/Vectorize/VPlanTransforms.h"		#include "../lib/Transforms/Vectorize/VPlanTransforms.h"
#include "VPlanTestBase.h"		#include "VPlanTestBase.h"
		#include "llvm/ADT/Triple.h"
		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"
#include <string>		#include <string>

namespace llvm {		namespace llvm {
namespace {		namespace {

class VPlanHCFGTest : public VPlanTestBase {};		class VPlanHCFGTest : public VPlanTestBase {};

[108 lines hidden]	compound=true
N3 [label =		N3 [label =
"for.end:\l" +		"for.end:\l" +
"No successors\l"		"No successors\l"
]		]
}		}
)";		)";
EXPECT_EQ(ExpectedStr, FullDump);		EXPECT_EQ(ExpectedStr, FullDump);
#endif		#endif
		TargetLibraryInfoImpl TLII(Triple(M.getTargetTriple()));
		TargetLibraryInfo TLI(TLII);
SmallPtrSet<Instruction *, 1> DeadInstructions;		SmallPtrSet<Instruction *, 1> DeadInstructions;
VPlanTransforms::VPInstructionsToVPRecipes(		VPlanTransforms::VPInstructionsToVPRecipes(
LI->getLoopFor(LoopHeader), Plan, [](PHINode *P) { return nullptr; },		LI->getLoopFor(LoopHeader), Plan, [](PHINode *P) { return nullptr; },
DeadInstructions, *SE);		DeadInstructions, *SE, TLI);
}		}

TEST_F(VPlanHCFGTest, testVPInstructionToVPRecipesInner) {		TEST_F(VPlanHCFGTest, testVPInstructionToVPRecipesInner) {
const char *ModuleString =		const char *ModuleString =
"define void @f(i32* %A, i64 %N) {\n"		"define void @f(i32* %A, i64 %N) {\n"
"entry:\n"		"entry:\n"
" br label %for.body\n"		" br label %for.body\n"
"for.body:\n"		"for.body:\n"
[11 lines hidden]	TEST_F(VPlanHCFGTest, testVPInstructionToVPRecipesInner) {

Module &M = parseModule(ModuleString);		Module &M = parseModule(ModuleString);

Function *F = M.getFunction("f");		Function *F = M.getFunction("f");
BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();		BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
auto Plan = buildHCFG(LoopHeader);		auto Plan = buildHCFG(LoopHeader);

SmallPtrSet<Instruction *, 1> DeadInstructions;		SmallPtrSet<Instruction *, 1> DeadInstructions;
		TargetLibraryInfoImpl TLII(Triple(M.getTargetTriple()));
		TargetLibraryInfo TLI(TLII);
VPlanTransforms::VPInstructionsToVPRecipes(		VPlanTransforms::VPInstructionsToVPRecipes(
LI->getLoopFor(LoopHeader), Plan, [](PHINode *P) { return nullptr; },		LI->getLoopFor(LoopHeader), Plan, [](PHINode *P) { return nullptr; },
DeadInstructions, *SE);		DeadInstructions, *SE, TLI);

VPBlockBase *Entry = Plan->getEntry()->getEntryBasicBlock();		VPBlockBase *Entry = Plan->getEntry()->getEntryBasicBlock();
EXPECT_NE(nullptr, Entry->getSingleSuccessor());		EXPECT_NE(nullptr, Entry->getSingleSuccessor());
EXPECT_EQ(0u, Entry->getNumPredecessors());		EXPECT_EQ(0u, Entry->getNumPredecessors());
EXPECT_EQ(1u, Entry->getNumSuccessors());		EXPECT_EQ(1u, Entry->getNumSuccessors());

// Check that the region following the preheader is a single basic-block		// Check that the region following the preheader is a single basic-block
// region (loop).		// region (loop).
[21 lines hidden]

llvm/unittests/Transforms/Vectorize/VPlanTest.cpp

[800 lines hidden]	TEST(VPRecipeTest, CastVPWidenCallRecipeToVPUserAndVPDef) {
IntegerType *Int32 = IntegerType::get(C, 32);		IntegerType *Int32 = IntegerType::get(C, 32);
FunctionType *FTy = FunctionType::get(Int32, false);		FunctionType *FTy = FunctionType::get(Int32, false);
auto *Call = CallInst::Create(FTy, UndefValue::get(FTy));		auto *Call = CallInst::Create(FTy, UndefValue::get(FTy));
VPValue Op1;		VPValue Op1;
VPValue Op2;		VPValue Op2;
SmallVector<VPValue *, 2> Args;		SmallVector<VPValue *, 2> Args;
Args.push_back(&Op1);		Args.push_back(&Op1);
Args.push_back(&Op2);		Args.push_back(&Op2);
VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()));		VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()), false);
		Ayal [Not Done]: `false` is synonymous with Intrinsic::not_intrinsic being zero?
EXPECT_TRUE(isa<VPUser>(&Recipe));		EXPECT_TRUE(isa<VPUser>(&Recipe));
VPRecipeBase *BaseR = &Recipe;		VPRecipeBase *BaseR = &Recipe;
EXPECT_TRUE(isa<VPUser>(BaseR));		EXPECT_TRUE(isa<VPUser>(BaseR));
EXPECT_EQ(&Recipe, BaseR);		EXPECT_EQ(&Recipe, BaseR);

VPValue *VPV = &Recipe;		VPValue *VPV = &Recipe;
EXPECT_TRUE(isa<VPRecipeBase>(VPV->getDef()));		EXPECT_TRUE(isa<VPRecipeBase>(VPV->getDef()));
EXPECT_EQ(&Recipe, dyn_cast<VPRecipeBase>(VPV->getDef()));		EXPECT_EQ(&Recipe, dyn_cast<VPRecipeBase>(VPV->getDef()));
[242 lines hidden]	TEST(VPRecipeTest, MayHaveSideEffectsAndMayReadWriteMemory) {
{		{
FunctionType *FTy = FunctionType::get(Int32, false);		FunctionType *FTy = FunctionType::get(Int32, false);
auto *Call = CallInst::Create(FTy, UndefValue::get(FTy));		auto *Call = CallInst::Create(FTy, UndefValue::get(FTy));
VPValue Op1;		VPValue Op1;
VPValue Op2;		VPValue Op2;
SmallVector<VPValue *, 2> Args;		SmallVector<VPValue *, 2> Args;
Args.push_back(&Op1);		Args.push_back(&Op1);
Args.push_back(&Op2);		Args.push_back(&Op2);
VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()));		VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()),
		false);
		Ayal [Not Done]: ditto
EXPECT_TRUE(Recipe.mayHaveSideEffects());		EXPECT_TRUE(Recipe.mayHaveSideEffects());
EXPECT_TRUE(Recipe.mayReadFromMemory());		EXPECT_TRUE(Recipe.mayReadFromMemory());
EXPECT_TRUE(Recipe.mayWriteToMemory());		EXPECT_TRUE(Recipe.mayWriteToMemory());
EXPECT_TRUE(Recipe.mayReadOrWriteMemory());		EXPECT_TRUE(Recipe.mayReadOrWriteMemory());
delete Call;		delete Call;
}		}

// The initial implementation is conservative with respect to VPInstructions.		// The initial implementation is conservative with respect to VPInstructions.
[140 lines hidden]
