This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/VPlan.h
llvm/lib/Transforms/Vectorize/VPlan.cpp
llvm/lib/Transforms/Vectorize/VPlanValue.h
llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll

Differential D105199

[LoopVectorize] Fix scalable vector crash in VPReplicateRecipe::execute
Abandoned, Public

Authored by david-arm on Jun 30 2021, 8:34 AM.

Details

Reviewers

sdesmalen
CarolineConcatto
peterwaller-arm
kmclaughlin
fhahn

Summary

When trying to vectorise certain loops using scalable vectors we try to
use the replicate recipe even for non-uniform cases. The recipe does
not handle scalable vectors correctly because it tries to scalarise
the vector instruction by generating a scalar instance for each lane
and packing them into a vector. We don't know the number of lanes at
runtime so this is not an option.
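To make the lane-count problem concrete, here is a minimal sketch that does not use LLVM at all. `ToyElementCount`, `canReplicatePerLane` and `scalarClonesPerPart` are hypothetical names invented for illustration; the only assumption taken from the summary is that a scalable vector has some multiple of a known minimum lane count, with the multiplier unknown until run time.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector element count (not llvm::ElementCount).
struct ToyElementCount {
  unsigned MinLanes; // known minimum number of lanes
  bool Scalable;     // true: actual lanes = MinLanes * (runtime multiplier)
};

// Per-lane replication emits one scalar clone per lane and packs the results
// into a vector, so it needs the lane count as a compile-time constant.
bool canReplicatePerLane(const ToyElementCount &EC) { return !EC.Scalable; }

// Number of scalar clones a per-lane replicate would emit for one part.
// Only meaningful for fixed-width vectors.
unsigned scalarClonesPerPart(const ToyElementCount &EC) {
  assert(canReplicatePerLane(EC) && "lane count unknown at compile time");
  return EC.MinLanes;
}
```

For a fixed-width `{4, false}` count the sketch reports four clones per part; for a scalable `{4, true}` count there is no constant to return, which is the situation the patch tries to handle.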

I've decided to create a new scalable replicate recipe instead called
VPScalableReplicate, which also calls a new overloaded version of
scalarizeInstruction that generates a whole vector part in one go,
instead of generating N scalar instances for N lanes. The new version
of scalarizeInstruction is based on the original version, and currently
only supports certain cases, such as GEP instructions, or instructions
with loop-invariant operands.
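The cases the new overload supports can be summarised as a small decision function. This is only a sketch of the control flow visible in the diff, written against toy types: `ToyInstr`, `PartStrategy` and `chooseStrategy` are hypothetical names, and the real code queries the loop and the instruction rather than reading precomputed flags.

```cpp
#include <cassert>

// Hypothetical, simplified description of an instruction (not an LLVM type).
struct ToyInstr {
  bool AllOperandsLoopInvariant;
  bool ReturnsVoid;
  bool IsGEP;
};

enum class PartStrategy {
  CloneOnce,         // invariant operands, void result: one scalar clone
  CloneAndBroadcast, // invariant operands, non-void result: clone, then splat
  VectorGEP,         // GEP with varying operands: one vector GEP per part
  Unsupported        // anything else is not handled by this patch
};

// Mirrors the order of the checks in the new scalarizeInstruction overload:
// loop-invariant operands first, then the GEP special case.
PartStrategy chooseStrategy(const ToyInstr &I) {
  if (I.AllOperandsLoopInvariant)
    return I.ReturnsVoid ? PartStrategy::CloneOnce
                         : PartStrategy::CloneAndBroadcast;
  if (I.IsGEP)
    return PartStrategy::VectorGEP;
  return PartStrategy::Unsupported;
}
```

In the patch itself the `Unsupported` case is an `llvm_unreachable`, so the recipe must only be created for instructions that fall into one of the first three cases.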

Tests have been added here:

Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll

Diff Detail

Event Timeline

david-arm created this revision. (Jun 30 2021, 8:34 AM)

Herald added subscribers: rogfer01, hiraditya, kristof.beyls. (Jun 30 2021, 8:34 AM)

david-arm requested review of this revision. (Jun 30 2021, 8:34 AM)

Herald added a project: Restricted Project. (Jun 30 2021, 8:34 AM)

Herald added subscribers: llvm-commits, vkmr.

david-arm added a parent revision: D105100: [NFC] Change setDebugLocFromInst to use the class Builder by default. (Jun 30 2021, 8:34 AM)

david-arm added a reviewer: fhahn.

Harbormaster completed remote builds in B111763: Diff 355560. (Jun 30 2021, 9:11 AM)

Hey David,
Thank you for adding me as a reviewer.
I've tried to add some comments that I believe are valid; I hope they make sense.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3071	Do we also need this test, if (IfPredicateInstr), for the scalable recipe in scalarizeInstruction?
8959–8960	I believe you don't need the assert. See struct VFRange in lib/Transforms/Vectorize/VPlan.h.
8961	Why do we need both R and Recipe here?
8962	Should we test that this is not a predicated instruction here? I don't see any test for that in the recipe.
llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
3	Is it possible to use this for all architectures, or only for AArch64 for now?
36	nit: s/while.body189/loop.body/ s/while.end192.loopexit/exit/
137	Why do you need 4, 5, 6 and 7?

david-arm added inline comments. (Jul 5 2021, 1:07 AM)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3071	Good question! So in VPRecipeBuilder::handleReplication I have explicitly asserted that we should never create this recipe with predication, because I think the only cases we'd hit are divides (which we don't support) or loads/stores. For the latter we'll widen the instruction using masked load/store intrinsics and shouldn't use this recipe I think.
8959–8960	Ah I see now - good spot thank you! I hadn't realised we already asserted this in the VFRange constructor.
8961	The problem here is caused by complicated multiple inheritance, i.e.
    class VPReplicateRecipe : public VPRecipeBase, public VPValue
and the fact that the functions below take pointers to only one of the inherited base classes, i.e.
    void setRecipe(Instruction *I, VPRecipeBase *R) {
    void addVPValue(Value *V, VPValue *VPV)
In order to have a single common block here I would have to restructure the recipes to have the same common base, i.e.
    class VPReplicateBaseRecipe : public VPRecipeBase, public VPValue
    class VPScalableReplicateRecipe : public VPReplicateBaseRecipe {
    class VPReplicateRecipe : public VPReplicateBaseRecipe {
This would mean I could then write:
    VPReplicateBaseRecipe *Recipe;
    if (IsScalable && !IsUniform)
      Recipe = new VPScalableReplicateRecipe(I, Plan->mapToVPValues(I->operands()));
    else
      Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()), IsUniform, IsPredicated);
    setRecipe(I, Recipe);
    Plan->addVPValue(I, Recipe);
However, I wasn't sure which was more acceptable here: rewriting the class structures to be more complex, or simply having two blocks of code. Alternatively, if anyone knows some magic C++ that allows me to write a common block without rewriting the class structure, that would be great too! I could well be missing some trick here.
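The common-base option described in this reply can be checked in isolation with toy classes. The sketch below is not LLVM code: `ToyRecipeBase` and `ToyValue` stand in for `VPRecipeBase` and `VPValue`, `ToyReplicateBase` for the proposed `VPReplicateBaseRecipe`, and `ToyBuilder` is a hypothetical holder for the two registration functions. It only demonstrates that a pointer to the common base converts implicitly to either parent, which is what makes a single block possible.

```cpp
#include <cassert>

// Toy stand-ins for the two unrelated base classes.
struct ToyRecipeBase { virtual ~ToyRecipeBase() = default; };
struct ToyValue { virtual ~ToyValue() = default; };

// Hypothetical common base, as proposed in the comment above.
struct ToyReplicateBase : ToyRecipeBase, ToyValue {};
struct ToyScalableReplicate : ToyReplicateBase {};
struct ToyReplicate : ToyReplicateBase {};

// Each registration function accepts a pointer to only one of the bases.
struct ToyBuilder {
  ToyRecipeBase *LastRecipe = nullptr;
  ToyValue *LastValue = nullptr;
  void setRecipe(ToyRecipeBase *R) { LastRecipe = R; }
  void addVPValue(ToyValue *V) { LastValue = V; }
};

// One common block: the ToyReplicateBase pointer converts to both bases.
// With a plain ToyRecipeBase pointer the addVPValue call would not compile,
// which is why the patch keeps two separate blocks.
ToyReplicateBase *buildRecipe(ToyBuilder &B, bool IsScalable, bool IsUniform) {
  ToyReplicateBase *Recipe;
  if (IsScalable && !IsUniform)
    Recipe = new ToyScalableReplicate();
  else
    Recipe = new ToyReplicate();
  B.setRecipe(Recipe);
  B.addVPValue(Recipe);
  return Recipe;
}
```

The trade-off raised in the reply still stands: this removes the duplicated block at the cost of one more class in the hierarchy.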
8962	Ah you're right! I added an assert in the for loop below, but it's not quite the same thing. I'll add one here too.
llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
3	Possibly so? I thought I might have tried that when I first started the patch and hit issues, but I can try again.
137	This is needed for one of the loops above that contains metadata: tail call void @llvm.experimental.noalias.scope.decl(metadata !4). This line explicitly tests one of the code paths in the new scalarizeInstruction() function.

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

In D105199#2857606, @fhahn wrote:

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

Yes. If you look at the new scalarizeInstruction(), it also broadcasts a value into a new instruction if the result is non-void and *all* operands are loop invariant. This is a bit different from the original scalarizeInstruction function, which simply creates a scalar instruction per lane, generating the same value each time. The extractvalue case in one of the tests exercises this.

Addressed review comments. Removed unnecessary VFRange assert, added new one for IsPredicated and tidied up the test.

david-arm marked 7 inline comments as done. (Jul 5 2021, 5:13 AM)

david-arm added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll
3	After some discussion downstream we think it's better to keep such tests in the AArch64 directory because without a target it leads to problems related to not knowing about vector widths, scalable properties of the target, etc.

Harbormaster completed remote builds in B112431: Diff 356484. (Jul 5 2021, 5:56 AM)

kmclaughlin added inline comments. (Jul 7 2021, 9:39 AM)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3079	I'm a bit confused by the message for this assert, as I thought vectors are not considered to be aggregate types?
3133	This switch statement currently only has one case, was this added so that we can support more instructions in the future? If so, would it be better to instead just try a `dyn_cast<GetElementPtrInst>` here and then add the switch again later when we need to cover other instruction types?

Removed isAggregateTy assert for new scalarizeInstruction as I don't think we need to worry about this. I also updated the assert message for the original scalarizeInstruction function.
Specialised the new scalarizeInstruction function for GEPs, since that's the only case we currently deal with. It's more readable and efficient this way.

Harbormaster completed remote builds in B112949: Diff 357176. (Jul 8 2021, 2:41 AM)

david-arm marked 2 inline comments as done. (Jul 8 2021, 2:42 AM)

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3079	That's a good point! I've removed the assert anyway as I don't think it's necessary, and I've also corrected the message for the assert in the original scalarizeInstruction function

In D105199#2857611, @david-arm wrote:

In D105199#2857606, @fhahn wrote:

IIUC from the description, this recipe effectively either widens the instruction or just computes the first lane if it is uniform. Does it handle any other case?

Yes. If you look at the new scalarizeInstruction(), it also broadcasts a value into a new instruction if the result is non-void and *all* operands are loop invariant. This is a bit different from the original scalarizeInstruction function, which simply creates a scalar instruction per lane, generating the same value each time. The extractvalue case in one of the tests exercises this.

Ok thanks. So effectively this handles 3 different cases: the instruction could be widened; only the first lane needs to be computed; and calls to llvm.experimental.noalias.scope.decl as a special case, where generating the call once in the loop is also sufficient. None of these should really be specific to scalable vectors, but for scalable vectors it crashes, right?

I'm not sure if we actually need a new recipe for those cases and we might be able to adjust the recipes in the plan to similar effect. Let sketch an alternative and share it later today or tomorrow.

In D105199#2864080, @fhahn wrote:

... Let sketch an alternative and share it later today or tomorrow.

This should be Let me ... sketch

In D105199#2864081, @fhahn wrote:

In D105199#2864080, @fhahn wrote:

... Let sketch an alternative and share it later today or tomorrow.

This should be Let me ... sketch

Hi @fhahn, if you want I'm also happy to sketch out an alternative if you can give me some pointers? I wrote the code this way as it wasn't obvious what other way I'd do this - at least for the GEP it's due to the decision to treat it as being "scalarised after vectorisation" very early on in the process.

It should be possible to fix/adjust the recipes in the plan after initial generation, instead of anticipating those cases up front.

This should be easier in this case and I think the existing recipes should cover the functionality required. For example, the widening part for GEPs can be done in D105784. Note that this may also be beneficial for fixed-width vectors, pending a cost check. We should be able to do this not only for GEPs, but for any widenable instruction. For the other cases, we only need to compute lane 0, or nothing in the intrinsic case. We can also adjust the recipes in a similar way.
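The idea of adjusting recipes after the plan has been built can be pictured as a rewrite pass over a list of recipes. The sketch below is a toy model only: `ToyRecipe`, `ToyRecipeKind` and `widenVectorUsedGEPs` are hypothetical names, the flags are assumed to be precomputed, and the cost check mentioned above is deliberately left out. It is not the implementation in D105784.

```cpp
#include <cassert>
#include <vector>

enum class ToyRecipeKind { Replicate, WidenGEP, Other };

// Hypothetical, flattened view of a recipe in an already-built plan.
struct ToyRecipe {
  ToyRecipeKind Kind;
  bool IsGEP;        // the underlying instruction is a GEP
  bool UsedAsVector; // at least one user needs the whole vector value
};

// Post-processing pass: a replicate recipe for a GEP whose result is needed
// as a vector is turned into a widened GEP. Returns how many were rewritten.
unsigned widenVectorUsedGEPs(std::vector<ToyRecipe> &Plan) {
  unsigned NumChanged = 0;
  for (ToyRecipe &R : Plan) {
    if (R.Kind == ToyRecipeKind::Replicate && R.IsGEP && R.UsedAsVector) {
      R.Kind = ToyRecipeKind::WidenGEP;
      ++NumChanged;
    }
  }
  return NumChanged;
}
```

Because the pass only looks at how the result is used, nothing in it depends on whether the vector is scalable, which matches the point that the rewrite could also help fixed-width vectors.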

david-arm mentioned this in D105784: [VPlan] Use vector version of GEP if result is used as vector. (Jul 12 2021, 3:17 AM)

sdesmalen mentioned this in D106164: [LV] Don't assume isScalarAfterVectorization if one of the uses needs widening. (Jul 16 2021, 10:16 AM)

Hi @david-arm, the case for the first test (phi_multiple_use) seems like an existing bug in collectLoopScalars where the LV incorrectly assumes that a value is scalar after vectorization even when it's also needed as a widened value.
The code already seemed to guard against similar cases, but some values still slipped through the cracks, as your test points out!

I've created a patch to address that in D106164.
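The rule implied by this comment (a value must not be treated as scalar after vectorization if any of its uses needs the widened value) can be stated as a one-line predicate. This is a toy sketch under that reading; `ToyUse` and `toyIsScalarAfterVectorization` are hypothetical and do not reflect how collectLoopScalars is actually structured.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of one use of a value inside the loop.
struct ToyUse {
  bool NeedsWidenedValue; // this user consumes the full vector value
};

// A value can stay scalar after vectorization only if no use needs it widened.
bool toyIsScalarAfterVectorization(const std::vector<ToyUse> &Uses) {
  return std::none_of(Uses.begin(), Uses.end(),
                      [](const ToyUse &U) { return U.NeedsWidenedValue; });
}
```

Under this reading, a value with one scalar use and one widened use is not scalar after vectorization, which is the mixed-use situation the phi_multiple_use test is described as hitting.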

Matt added a subscriber: Matt. (Jul 16 2021, 11:38 AM)

fhahn mentioned this in rG156b431c6658: [LV] Add test with ptr induction used as scalar and vector. (Jul 19 2021, 4:16 AM)

sdesmalen mentioned this in rG981e9dce5482: [LV] Don't assume isScalarAfterVectorization if one of the uses needs widening. (Jul 26 2021, 8:02 AM)

This is being implemented differently in a series of other patches currently being worked on.

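Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	115 lines
llvm/lib/Transforms/Vectorize/VPlan.h	35 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp	16 lines
llvm/lib/Transforms/Vectorize/VPlanValue.h	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll	139 lines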

Diff 357176

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[512 lines not shown; context: public:]
   /// Generates a sequence of scalar instances for each lane between \p MinLane
   /// and \p MaxLane, times each part between \p MinPart and \p MaxPart,
   /// inclusive. Uses the VPValue operands from \p Operands instead of \p
   /// Instr's operands.
   void scalarizeInstruction(Instruction *Instr, VPValue *Def, VPUser &Operands,
                             const VPIteration &Instance, bool IfPredicateInstr,
                             VPTransformState &State);

+  /// A helper function to scalarize a single Instruction in the innermost
+  /// loop, which is currently only used for scalable vectors.
+  /// Generates a whole vector equivalent for a given \p Part, which may
+  /// involve a simple broadcast in the case of uniform instructions. Uses the
+  /// VPValue operands from \p Operands instead of \p Instr's operands.
+  void scalarizeInstruction(Instruction *Instr, VPValue *Def, VPUser &Operands,
+                            unsigned Part, VPTransformState &State);
+
   /// Widen an integer or floating-point induction variable \p IV. If \p Trunc
   /// is provided, the integer induction variable will first be truncated to
   /// the corresponding type.
   void widenIntOrFpInduction(PHINode *IV, Value *Start, TruncInst *Trunc,
                              VPValue *Def, VPValue *CastDef,
                              VPTransformState &State);

   /// Construct the vector value of a scalarized value \p V one lane at a time.
[2,484 lines not shown; context: InnerLoopVectorizer::vectorizeMemoryInstruction]
   }
 }

 void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, VPValue *Def,
                                                VPUser &User,
                                                const VPIteration &Instance,
                                                bool IfPredicateInstr,
                                                VPTransformState &State) {
-  assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
+  assert(!Instr->getType()->isAggregateType() &&
+         "Can't handle aggregate types");

   // llvm.experimental.noalias.scope.decl intrinsics must only be duplicated for
   // the first lane and part.
   if (isa<NoAliasScopeDeclInst>(Instr))
     if (!Instance.isFirstIteration())
       return;

   setDebugLocFromInst(Instr);
[24 lines not shown; context: InnerLoopVectorizer::scalarizeInstruction]
   Builder.Insert(Cloned);

   State.set(Def, Cloned, Instance);

   // If we just cloned a new assumption, add it the assumption cache.
   if (auto *II = dyn_cast<AssumeInst>(Cloned))
     AC->registerAssumption(II);

   // End if-block.
   if (IfPredicateInstr)
     PredicatedInstructions.push_back(Cloned);
 }

+void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, VPValue *Def,
+                                               VPUser &User, unsigned Part,
+                                               VPTransformState &State) {
+  // llvm.experimental.noalias.scope.decl intrinsics must only be duplicated
+  // for the first part.
+  if (isa<NoAliasScopeDeclInst>(Instr) && Part != 0)
+    return;
+
+  setDebugLocFromInst(Instr);
+
+  State.Builder.SetInsertPoint(Builder.GetInsertBlock(),
+                               Builder.GetInsertPoint());
+
+  // Does this instruction return a value ?
+  bool IsVoidRetTy = Instr->getType()->isVoidTy();
+
+  Value *NewPart = nullptr;
+
+  if (OrigLoop->hasLoopInvariantOperands(Instr)) {
+    // This instruction does not have any operands that vary in the loop.
+
+    // First we clone the scalar instruction for the vector.body and copy
+    // the metadata across.
+    Instruction *Cloned = Instr->clone();
+    addNewMetadata(Cloned, Instr);
+
+    // Place the cloned scalar in the new loop.
+    Builder.Insert(Cloned);
+
+    if (!IsVoidRetTy) {
+      // Since the instruction returns a scalar type we should broadcast that
+      // value across all lanes of the vector.
+      Cloned->setName(Instr->getName() + ".cloned");
+      NewPart = State.Builder.CreateVectorSplat(State.VF, (Value *)Cloned);
+      addMetadata(NewPart, Instr);
+    } else
+      NewPart = Cloned;
+  } else if (auto *GEP = dyn_cast<GetElementPtrInst>(Instr)) {
+    Value *Ptr = nullptr;
+    SmallVector<Value *, 2> Ops;
+
+    // Create a new set of operands for the vector instruction. If the operand
+    // is invariant or uniform in the loop we leave it as a scalar, otherwise
+    // we use the full vector equivalent.
+    for (unsigned OpI = 0, E = User.getNumOperands(); OpI != E; ++OpI) {
+      auto *Operand = dyn_cast<Instruction>(Instr->getOperand(OpI));
+      Value *NewVal = nullptr;
+      if (!Operand || !OrigLoop->contains(Operand) ||
+          Cost->isUniformAfterVectorization(Operand, State.VF)) {
+        VPIteration InputInstance(Part, 0);
+        InputInstance.Lane = VPLane::getFirstLane();
+        NewVal = State.get(User.getOperand(OpI), InputInstance);
+      } else
+        NewVal = State.get(User.getOperand(OpI), Part);
+      if (OpI)
+        Ops.push_back(NewVal);
+      else
+        Ptr = NewVal;
+    }
+
+    auto *NewGEP =
+        GEP->isInBounds()
+            ? Builder.CreateInBoundsGEP(GEP->getSourceElementType(), Ptr, Ops)
+            : Builder.CreateGEP(GEP->getSourceElementType(), Ptr, Ops);
+    NewPart = NewGEP;
+
+    addMetadata(NewPart, Instr);
+  } else
+    llvm_unreachable(
+        "Don't know to scalarize this instruction for scalable vectors!");
+
+  State.set(Def, NewPart, Part);
+}

 PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
                                                       Value *End, Value *Step,
                                                       Instruction *DL) {
   BasicBlock *Header = L->getHeader();
   BasicBlock *Latch = L->getLoopLatch();
   // As we're just creating this loop, it's possible no latch exists
   // yet. If so, use the header as this will be a single block loop.
   if (!Latch)
[5,792 lines not shown; context: VPRecipeBuilder::handleReplication]
     VPlanPtr &Plan) {
   bool IsUniform = LoopVectorizationPlanner::getDecisionAndClampRange(
       [&](ElementCount VF) { return CM.isUniformAfterVectorization(I, VF); },
       Range);

   bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
       [&](ElementCount VF) { return CM.isPredicatedInst(I); }, Range);

-  auto *Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
-                                       IsUniform, IsPredicated);
-  setRecipe(I, Recipe);
-  Plan->addVPValue(I, Recipe);
+  bool IsScalable = Range.Start.isScalable();
+  VPRecipeBase *Recipe;
+  if (IsScalable && !IsUniform) {
+    assert(!IsPredicated &&
+           "Don't expect to replicate predicated scalable instructions");
+    auto *R =
+        new VPScalableReplicateRecipe(I, Plan->mapToVPValues(I->operands()));
+    setRecipe(I, R);
+    Plan->addVPValue(I, R);
+    Recipe = R;
+  } else {
+    auto *R = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
+                                    IsUniform, IsPredicated);
+    setRecipe(I, R);
+    Plan->addVPValue(I, R);
+    Recipe = R;
+  }

   // Find if I uses a predicated instruction. If so, it will use its scalar
   // value. Avoid hoisting the insert-element which packs the scalar value into
   // a vector value, as that happens iff all users use the vector value.
   for (VPValue *Op : Recipe->operands()) {
     auto *PredR = dyn_cast_or_null<VPPredInstPHIRecipe>(Op->getDef());
     if (!PredR)
       continue;
+    assert(!IsScalable &&
+           "Don't expect to replicate predicated scalable instructions");
     auto *RepR =
         cast_or_null<VPReplicateRecipe>(PredR->getOperand(0)->getDef());
     assert(RepR->isPredicated() &&
            "expected Replicate recipe to be predicated");
     RepR->setAlsoPack(false);
   }

   // Finalize the recipe for Instr, first if it is not predicated.
[678 lines not shown; context: else {]
       NextInChain = State.Builder.CreateBinOp(
           (Instruction::BinaryOps)getUnderlyingInstr()->getOpcode(), NewRed,
           PrevInChain);
     }
     State.set(this, NextInChain, Part);
   }
 }

+void VPScalableReplicateRecipe::execute(VPTransformState &State) {
+  assert(State.VF.isScalable() && "Only expect scalable vectors");
+  for (unsigned Part = 0; Part < State.UF; ++Part)
+    State.ILV->scalarizeInstruction(getUnderlyingInstr(), this, *this, Part,
+                                    State);
+}
+
 void VPReplicateRecipe::execute(VPTransformState &State) {
   if (State.Instance) { // Generate a single instance.
     assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
     State.ILV->scalarizeInstruction(getUnderlyingInstr(), this, *this,
                                     *State.Instance, IsPredicated, State);
     // Insert scalar instance packing it into a vector.
     if (AlsoPack && State.VF.isVector()) {
       // If we're constructing lane 0, initialize to start from poison.
[795 lines not shown]

llvm/lib/Transforms/Vectorize/VPlan.h

[749 lines not shown; context: inline bool VPUser::classof(const VPDef *Def) {]
   return Def->getVPDefID() == VPRecipeBase::VPInstructionSC ||
          Def->getVPDefID() == VPRecipeBase::VPWidenSC ||
          Def->getVPDefID() == VPRecipeBase::VPWidenCallSC ||
          Def->getVPDefID() == VPRecipeBase::VPWidenSelectSC ||
          Def->getVPDefID() == VPRecipeBase::VPWidenGEPSC ||
          Def->getVPDefID() == VPRecipeBase::VPBlendSC ||
          Def->getVPDefID() == VPRecipeBase::VPInterleaveSC ||
          Def->getVPDefID() == VPRecipeBase::VPReplicateSC ||
+         Def->getVPDefID() == VPRecipeBase::VPScalableReplicateSC ||
          Def->getVPDefID() == VPRecipeBase::VPReductionSC ||
          Def->getVPDefID() == VPRecipeBase::VPBranchOnMaskSC ||
          Def->getVPDefID() == VPRecipeBase::VPWidenMemoryInstructionSC;
 }

 /// This is a concrete Recipe that models a single VPlan-level instruction.
 /// While as any Recipe it may generate a sequence of IR instructions when
 /// executed, these instructions would always form a single-def expression as
[568 lines not shown; context: #endif]

   bool isUniform() const { return IsUniform; }

   bool isPacked() const { return AlsoPack; }

   bool isPredicated() const { return IsPredicated; }
 };

+/// VPScalableReplicateRecipe replicates a given instruction producing a whole
+/// vector value whose lanes are copies of the original scalar type, one per
+/// lane. If the instruction is known to be uniform only one copy, per lane
+/// zero, will be generated.
+class VPScalableReplicateRecipe : public VPRecipeBase, public VPValue {
+public:
+  template <typename IterT>
+  VPScalableReplicateRecipe(Instruction *I, iterator_range<IterT> Operands)
+      : VPRecipeBase(VPScalableReplicateSC, Operands),
+        VPValue(VPVScalableReplicateSC, I, this) {}
+
+  ~VPScalableReplicateRecipe() override = default;
+
+  /// Method to support type inquiry through isa, cast, and dyn_cast.
+  static inline bool classof(const VPDef *D) {
+    return D->getVPDefID() == VPRecipeBase::VPScalableReplicateSC;
+  }
+
+  static inline bool classof(const VPValue *V) {
+    return V->getVPValueID() == VPValue::VPVScalableReplicateSC;
+  }
+
+  /// Generate replicas of the desired Ingredient. Replicas will be generated
+  /// for all parts and lanes unless a specific part and lane are specified in
+  /// the \p State.
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A recipe for generating conditional branches on the bits of a mask.
 class VPBranchOnMaskRecipe : public VPRecipeBase {
 public:
   VPBranchOnMaskRecipe(VPValue *BlockInMask)
       : VPRecipeBase(VPBranchOnMaskSC, {}) {
     if (BlockInMask) // nullptr means all-one mask.
       addOperand(BlockInMask);
   }
[1,096 lines not shown]

llvm/lib/Transforms/Vectorize/VPlan.cpp

[...]
case VPWidenSelectSC: {
[...]
assert((!I || !I->mayHaveSideEffects()) &&
"underlying instruction has side-effects");
return false;
}
case VPReplicateSC: {
auto *R = cast<VPReplicateRecipe>(this);
return R->getUnderlyingInstr()->mayHaveSideEffects();
}
		case VPScalableReplicateSC: {
		auto *R = cast<VPScalableReplicateRecipe>(this);
		return R->getUnderlyingInstr()->mayHaveSideEffects();
		}
default:
return true;
}
}

void VPRecipeBase::insertBefore(VPRecipeBase *InsertPos) {
assert(!Parent && "Recipe already in some VPBasicBlock");
assert(InsertPos->getParent() &&
[...]
void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
[...]
}
O << Instruction::getOpcodeName(getUnderlyingInstr()->getOpcode()) << " ";
printOperands(O, SlotTracker);

if (AlsoPack)
O << " (S->V)";
}

		void VPScalableReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
		VPSlotTracker &SlotTracker) const {
		O << Indent << "SCALABLE REPLICATE ";

		if (!getUnderlyingInstr()->getType()->isVoidTy()) {
		printAsOperand(O, SlotTracker);
		O << " = ";
		}
		O << Instruction::getOpcodeName(getUnderlyingInstr()->getOpcode()) << " ";
		printOperands(O, SlotTracker);
		}

void VPPredInstPHIRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
O << Indent << "PHI-PREDICATED-INSTRUCTION ";
printAsOperand(O, SlotTracker);
O << " = ";
printOperands(O, SlotTracker);
}


llvm/lib/Transforms/Vectorize/VPlanValue.h

[...]
public:
[...]
enum {
VPValueSC,
VPVBlendSC,
VPVInstructionSC,
VPVMemoryInstructionSC,
VPVPredInstPHI,
VPVReductionSC,
VPVReplicateSC,
		VPVScalableReplicateSC,
VPVWidenSC,
VPVWidenCallSC,
VPVWidenGEPSC,
VPVWidenIntOrFpIndcutionSC,
VPVWidenPHISC,
VPVWidenSelectSC,
};

[...]
public:
[...]
using VPRecipeTy = enum {
VPBlendSC,
VPBranchOnMaskSC,
VPInstructionSC,
VPInterleaveSC,
VPPredInstPHISC,
VPReductionSC,
VPReplicateSC,
		VPScalableReplicateSC,
VPWidenCallSC,
VPWidenCanonicalIVSC,
VPWidenGEPSC,
VPWidenIntOrFpInductionSC,
VPWidenMemoryInstructionSC,
VPWidenPHISC,
VPWidenSC,
VPWidenSelectSC

llvm/test/Transforms/LoopVectorize/AArch64/sve-vpreplicate.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -S | FileCheck %s

				CarolineConcatto: Is it possible to use that for all architectures? Or only for AArch64 for now?
				david-arm: Possibly so? I thought I might have tried that when I first started the patch and hit issues, but I can try again.
				david-arm: After some discussion downstream we think it's better to keep such tests in the AArch64 directory, because without a target it leads to problems related to not knowing about vector widths, scalable properties of the target, etc.
				target triple = "aarch64-unknown-linux-gnu"

				; In the test below the PHI instruction:
				; %0 = phi i8* [ %incdec.ptr190, %loop.body ], [ %src, %entry ]
				; has multiple uses, i.e.
				; 1. As a uniform address for the load, and
				; 2. Non-uniform use by the getelementptr + store, which leads to replication.

				define void @phi_multiple_use(i8** noalias %curptr, i8* noalias %src, i8* noalias %cond.i, i64 %N) #0 {
				; CHECK-LABEL: @phi_multiple_use(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ {{.*}}, %vector.body ]
				; CHECK-NEXT: {{.*}} = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8*, i8** %curptr, i64 [[TMP1]]
				; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[INDEX1]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP3:%.*]] = add <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 0, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[TMP3]]
				; CHECK-NEXT: [[NEXT_GEP6:%.*]] = getelementptr i8, i8* %src, <vscale x 2 x i64> [[TMP4]]
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, <vscale x 2 x i8*> [[NEXT_GEP6]], i64 1
				; CHECK: store <vscale x 2 x i8*> [[TMP5]], <vscale x 2 x i8*>*
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <vscale x 2 x i8*> [[NEXT_GEP6]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i8, i8* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast i8* [[TMP7]] to <vscale x 2 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i8>, <vscale x 2 x i8>* [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 2 x i8> [[WIDE_LOAD]],
				; CHECK: store <vscale x 2 x i8> [[TMP9]], <vscale x 2 x i8>*

				entry:
				br label %loop.body

				CarolineConcatto: nit: s/while.body189/loop.body/ s/while.end192.loopexit/exit/
				loop.body: ; preds = %loop.body, %entry
				%index = phi i64 [ 0, %entry ], [ %index.next, %loop.body ]
				%curchar = phi i8** [ %curchar.next, %loop.body ], [ %curptr, %entry ]
				%0 = phi i8* [ %incdec.ptr190, %loop.body ], [ %src, %entry ]
				%incdec.ptr190 = getelementptr inbounds i8, i8* %0, i64 1
				%curchar.next = getelementptr inbounds i8*, i8** %curchar, i64 1
				store i8* %incdec.ptr190, i8** %curchar, align 8
				%1 = load i8, i8* %0, align 1
				%2 = add i8 %1, 1
				store i8 %2, i8* %0, align 1
				%index.next = add nuw i64 %index, 1
				%3 = icmp ne i64 %index.next, %N
				br i1 %3, label %loop.body, label %exit, !llvm.loop !0

				exit: ; preds = %loop.body
				ret void
				}

				define void @replicate_noalias_decl(i8** noalias %curptr, i8* noalias %src, i8* noalias %cond.i, i64 %N) #0 {
				; CHECK-LABEL: @replicate_noalias_decl(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ {{.*}}, %vector.body ]
				; CHECK-NEXT: {{.*}} = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8*, i8** %curptr, i64 [[TMP1]]
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, i8* %src, i64 [[TMP2]]
				; CHECK-NEXT: tail call void @llvm.experimental.noalias.scope.decl
				; CHECK-NOT: tail call void @llvm.experimental.noalias.scope.decl
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i8, i8* [[TMP3]], i32 0
				; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[TMP4]] to <vscale x 2 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i8>, <vscale x 2 x i8>* [[TMP5]]
				; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 2 x i8> [[WIDE_LOAD]],
				; CHECK: store <vscale x 2 x i8> [[TMP6]], <vscale x 2 x i8>*

				entry:
				br label %loop.body

				loop.body: ; preds = %loop.body, %entry
				%index = phi i64 [ 0, %entry ], [ %index.next, %loop.body ]
				%curchar = phi i8** [ %curchar.next, %loop.body ], [ %curptr, %entry ]
				%0 = phi i8* [ %incdec.ptr190, %loop.body ], [ %src, %entry ]
				%curchar.next = getelementptr inbounds i8*, i8** %curchar, i64 1
				tail call void @llvm.experimental.noalias.scope.decl(metadata !4)
				%1 = load i8, i8* %0, align 1
				%2 = add i8 %1, 1
				store i8 %2, i8* %0, align 1
				%incdec.ptr190 = getelementptr inbounds i8, i8* %0, i64 1
				%index.next = add nuw i64 %index, 1
				%3 = icmp ne i64 %index.next, %N
				br i1 %3, label %loop.body, label %exit, !llvm.loop !0

				exit: ; preds = %loop.body
				ret void
				}

				define void @replicate_extractvalue(i64* %dst, {i64, i64} %sv) #0 {
				; CHECK-LABEL: replicate_extractvalue(
				; CHECK: vector.body: ; preds = %vector.body, %vector.ph
				; CHECK-NEXT: [[INDEX1:%.*]] = phi i32 [ 0, %vector.ph ], [ {{.*}}, %vector.body ]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX1]], 0
				; CHECK-NEXT: [[EXTRACT1:%.*]] = extractvalue { i64, i64 } %sv, 0
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT1]], i32 0
				; CHECK-NEXT: [[SPLAT1:%.*]] = shufflevector <vscale x 2 x i64> [[TMP2]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[EXTRACT2:%.*]] = extractvalue { i64, i64 } %sv, 1
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT2]], i32 0
				; CHECK-NEXT: [[SPLAT2:%.*]] = shufflevector <vscale x 2 x i64> [[TMP3]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i64, i64* %dst, i32 [[TMP1]]
				; CHECK-NEXT: [[STOREVAL:%.*]] = add <vscale x 2 x i64> [[SPLAT1]], [[SPLAT2]]
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i64, i64* [[GEP1]], i32 0
				; CHECK-NEXT: [[STOREPTR:%.*]] = bitcast i64* [[GEP2]] to <vscale x 2 x i64>*
				; CHECK-NEXT: store <vscale x 2 x i64> [[STOREVAL]], <vscale x 2 x i64>* [[STOREPTR]], align 4

				entry:
				br label %loop.body

				loop.body:
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.body ]
				%a = extractvalue { i64, i64 } %sv, 0
				%b = extractvalue { i64, i64 } %sv, 1
				%addr = getelementptr i64, i64* %dst, i32 %iv
				%add = add i64 %a, %b
				store i64 %add, i64* %addr
				%iv.next = add nsw i32 %iv, 1
				%cond = icmp ne i32 %iv.next, 0
				br i1 %cond, label %loop.body, label %exit, !llvm.loop !0

				exit:
				ret void
				}

				declare void @llvm.experimental.noalias.scope.decl(metadata)

				attributes #0 = {"target-features"="+sve"}

				!0 = distinct !{!0, !1, !2, !3}
				!1 = !{!"llvm.loop.interleave.count", i32 1}
				!2 = !{!"llvm.loop.vectorize.width", i32 2}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!4 = !{ !5 }
				!5 = distinct !{ !5, !6 }
				CarolineConcatto: Why do you need 4,5,6 and 7?
				david-arm: This is needed for one of the loops above that contains metadata: tail call void @llvm.experimental.noalias.scope.decl(metadata !4). This line is explicitly testing one of the code paths in the new scalarizeInstruction() function.
				!6 = distinct !{ !7 }
				!7 = distinct !{ !7, !6 }
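
Note on the recipe-ID plumbing in this diff: the new VPScalableReplicateSC / VPVScalableReplicateSC enumerators, the classof overloads, and the extra switch case all follow the usual LLVM-style type-inquiry pattern. The sketch below is a minimal standalone illustration of that pattern, not code from the patch. RecipeBase, ReplicateRecipe, ScalableReplicateRecipe, isaRecipe, castRecipe and describe are all hypothetical names invented for this example, and the helpers are simplified stand-ins for llvm::isa<> and llvm::cast<>.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for VPRecipeBase and its recipe IDs (not LLVM's classes).
struct RecipeBase {
  enum RecipeKind { ReplicateKind, ScalableReplicateKind, WidenKind };
  explicit RecipeBase(RecipeKind K) : Kind(K) {}
  virtual ~RecipeBase() = default;
  RecipeKind getKind() const { return Kind; }

private:
  const RecipeKind Kind;
};

struct ReplicateRecipe : RecipeBase {
  ReplicateRecipe() : RecipeBase(ReplicateKind) {}
  // Type inquiry hook: true only for objects tagged with this class's kind.
  static bool classof(const RecipeBase *R) {
    return R->getKind() == ReplicateKind;
  }
};

struct ScalableReplicateRecipe : RecipeBase {
  ScalableReplicateRecipe() : RecipeBase(ScalableReplicateKind) {}
  static bool classof(const RecipeBase *R) {
    return R->getKind() == ScalableReplicateKind;
  }
};

// Simplified analogue of llvm::isa<>: defers to the target class's classof.
template <typename To> bool isaRecipe(const RecipeBase *R) {
  return To::classof(R);
}

// Simplified analogue of llvm::cast<>: asserts the kind before downcasting.
template <typename To> const To *castRecipe(const RecipeBase *R) {
  assert(isaRecipe<To>(R) && "cast to incompatible recipe type");
  return static_cast<const To *>(R);
}

// A switch over the kind needs a case for every new enumerator; a kind
// without its own case silently falls through to the default branch.
std::string describe(const RecipeBase *R) {
  switch (R->getKind()) {
  case RecipeBase::ReplicateKind:
    return "replicate";
  case RecipeBase::ScalableReplicateKind:
    return "scalable-replicate";
  default:
    return "other";
  }
}
```

In this sketch, adding a new recipe class means touching three places (the enum, a classof, and each switch over the kind), which is the same shape as the VPlan.h, VPlanValue.h and VPlan.cpp hunks above.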