This is an archive of the discontinued LLVM Phabricator instance.

Differential D84684

[VPlan] Use VPValue def for VPInterleaveRecipe.
Abandoned · Public

Authored by fhahn on Jul 27 2020, 10:42 AM.


Details

Reviewers

Ayal
gilr
rengolin

Summary

This patch turns VPInterleaveRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

VPInterleaveRecipe produces multiple result values (one for each group
member). This patch introduces a very simple VPMultiUse class.

VPInterleaveRecipe is a VPValue itself, to track all uses of any result
value. For each member, it also has an individual VPMultiUse which is
used for each member result.

This is mostly a straightforward initial approach to modeling operations
with multiple results, and I am sure it can be much improved.
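
As a rough illustration of the idea in this summary, the sketch below models a recipe that owns one result value per non-gap member of an interleave group. It is standalone C++ rather than LLVM code: ToyResult and ToyInterleaveRecipe are hypothetical names, not the VPMultiUse class or the VPInterleaveRecipe from the patch. Only the getResult accessor name and the gap-skipping counter mirror what the diff further down shows.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-member result value; not the patch's class.
struct ToyResult {
  std::string MemberName; // name of the original load this result stands for
};

// Hypothetical recipe that owns one result per non-gap member of a group.
class ToyInterleaveRecipe {
  std::vector<std::unique_ptr<ToyResult>> Results;

public:
  // An empty string in Members marks a gap in the interleave group.
  explicit ToyInterleaveRecipe(const std::vector<std::string> &Members) {
    for (const std::string &M : Members)
      if (!M.empty())
        Results.push_back(std::make_unique<ToyResult>(ToyResult{M}));
  }

  std::size_t getNumResults() const { return Results.size(); }

  // J counts only non-gap members, like the J counter used in the diff.
  ToyResult *getResult(std::size_t J) const {
    assert(J < Results.size() && "result index out of range");
    return Results[J].get();
  }
};
```

Because the results are held through std::unique_ptr, they are released together with the recipe, which is one simple way to avoid leaking per-member values.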

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Jul 27 2020, 10:42 AM

Herald added a project: Restricted Project. Jul 27 2020, 10:42 AM

Herald added subscribers: vkmr, psnobl, rogfer01 and 2 others.

Harbormaster failed remote builds in B65872: Diff 280973! Jul 27 2020, 10:43 AM

fhahn added child revisions: D84683: [VPlan] Use VPValue def for VPWidenGEPRecipe., D84682: [VPlan] Use VPValue def for VPWidenSelectRecipe., D84681: [VPlan] Use VPValue def for VPWidenCall., D84680: [VPlan] Use VPValue def for VPMemoryInstructionRecipe., D84679: [VPlan] Disconnect VPValue and VPUser. Jul 27 2020, 10:43 AM

fhahn mentioned this in D74695: [VPlan] Replace VPWidenRecipe with VPInstruction (WIP). Jul 27 2020, 10:47 AM

bmahjour added a subscriber: bmahjour. Jul 27 2020, 2:07 PM

Split off multi-value parts to D87752

Harbormaster completed remote builds in B71853: Diff 292160. Sep 16 2020, 2:59 AM

Would this now be based upon D88380 instead?

I was thinking about interleaving a little. Trying some experiments to do with "de-interleaving" code. (Much like the VPlanSLP stuff I think).

The LLVM-IR equivalent of this would be an extract from a struct? Would it be worth representing this in a similar way? There could be a VPInterleaveRecipe which was a VPValue, with multiple "VPExtractElementRecipe"s or something similar. Those would have 1 operand, the VPInterleaveRecipe, and act as the "real" VPValues. They could have a 0 cost and possibly hold the index of the element they are getting out of the interleaving group. Maybe there would be a validate method to make sure they remain the only users of a VPInterleaveRecipe.

It is essentially equivalent to these other ways of handling it as a new type of VPValue, but isn't as tied into the lower levels of the class. What do you think? There might be something about it that wouldn't work, but it might be a simple way to handle these multiple-result nodes.
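
To make the extract-style proposal above a little more concrete, here is a minimal standalone sketch. Every name in it (ToyUser, ToyInterleave, ToyExtract, onlyUsedByExtracts) is hypothetical and is not part of VPlan. It only shows the shape of the idea: the group is a single value, each projection has that value as its sole operand and stores a member index, and a validation hook checks that nothing else uses the group.

```cpp
#include <cassert>
#include <vector>

// All names here are hypothetical; this is not VPlan's class hierarchy.
struct ToyUser {
  enum Kind { ExtractKind, OtherKind };
  Kind K;
  explicit ToyUser(Kind K) : K(K) {}
};

// The group value records its users, loosely like a def-use list.
struct ToyInterleave {
  std::vector<const ToyUser *> Users;

  // Validation hook: only extract users may refer to the group value.
  bool onlyUsedByExtracts() const {
    for (const ToyUser *U : Users)
      if (U->K != ToyUser::ExtractKind)
        return false;
    return true;
  }
};

// Projection with a single operand (the group) and the member index it reads.
struct ToyExtract : ToyUser {
  ToyInterleave *Group;
  unsigned MemberIndex;

  ToyExtract(ToyInterleave &G, unsigned Index)
      : ToyUser(ExtractKind), Group(&G), MemberIndex(Index) {
    G.Users.push_back(this); // register as a user of the group value
  }
};
```

In this shape the member index is stored directly on the projection, so no lookup through the producing recipe is needed to find out which group member a value corresponds to.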

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7328	Should this be surrounded by some sort of "if it's a load IG" check? Equally, do we need to be adding the operands for a store IG group?
llvm/lib/Transforms/Vectorize/VPlan.h
1065	This looks left over from something.

Thanks for taking a look. Addressed comments.

In D84684#2308806, @dmgreen wrote:

Would this now be based upon D88380 instead?

Yep, let me update the links in Phab.

I was thinking about interleaving a little. Trying some experiments to do with "de-interleaving" code. (Much like the VPlanSLP stuff I think).

The LLVM-IR equivalent of this would be an extract from a struct? Would it be worth representing this in a similar way? There could be a VPInterleaveRecipe which was a VPValue, with multiple "VPExtractElementRecipe"s or something similar. Those would have 1 operand, the VPInterleaveRecipe, and act as the "real" VPValues. They could have a 0 cost and possibly hold the index of the element they are getting out of the interleaving group. Maybe there would be a validate method to make sure they remain the only users of a VPInterleaveRecipe.

It is essentially equivalent to these other ways of handling it as a new type of VPValue, but isn't as tied into the lower levels of the class. What do you think? There might be something about it that wouldn't work, but it might be a simple way to handle these multiple-result nodes.

Agreed, we could also model it so that VPInterleaveRecipe creates a single vector with all loaded values and then use extract instructions for the different parts of the interleave group. This keeps the modeling in the VPValue classes & co a bit simpler. But it has the cost that we need to add additional instructions/recipes to the plan, which need to be considered by other analyses and transformations. Another problem is that we would have to materialize a vector with all loaded values, which is completely artificial.

I think we have to make the following trade-off: either we add additional complexity to VPValue & co to model recipes that produce multiple VPValues, or we add additional instructions/recipes to the plans we generate.

Personally I think the additional complexity in VPValue & co is worth having simpler plans. Also, there might be additional complex recipes that produce multiple values in the future, which would also benefit from dedicated multi-value support.

Harbormaster completed remote builds in B73999: Diff 296186. Oct 5 2020, 7:58 AM

fhahn added a parent revision: D88380: [VPlan] Extend VPValue to also model sub- & 'virtual' values. Oct 5 2020, 8:41 AM

fhahn edited child revisions, added: D88379: [VPlan] Make VPRecipeBase a VPValue (WIP)., D88378: [VPlan] Make VPUser a subclass of VPValue again (WIP).; removed: D84680: [VPlan] Use VPValue def for VPMemoryInstructionRecipe., D84681: [VPlan] Use VPValue def for VPWidenCall., D84682: [VPlan] Use VPValue def for VPWidenSelectRecipe., D84683: [VPlan] Use VPValue def for VPWidenGEPRecipe.

fhahn added inline comments. Oct 5 2020, 8:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7328	Right, we only need to add VPValues for load groups, updated. I should probably also add a check that we do not add VPValues to a plan with underlying instructions/values of type void.
llvm/lib/Transforms/Vectorize/VPlan.h
1065	Indeed, that should be gone now, thanks!
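
The load-only rule from the inline reply on LoopVectorize.cpp above can be pictured with a small standalone sketch. ToyMember and countResultValues are hypothetical names, and the two flags are a simplification assumed for illustration: a store produces no value (its type is void), so only present load members contribute a result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one slot of an interleave group.
struct ToyMember {
  bool Present; // false marks a gap in the group
  bool IsLoad;  // false means a store, which defines no value
};

// Count the result values a group would define: loads only, gaps skipped.
inline unsigned countResultValues(const std::vector<ToyMember> &Group) {
  unsigned NumResults = 0;
  for (const ToyMember &M : Group)
    if (M.Present && M.IsLoad)
      ++NumResults;
  return NumResults;
}
```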

Agreed, we could also model it so that VPInterleaveRecipe creates a single vector with all loaded values and then use extract instructions for the different parts of the interleave group. This keeps the modeling in the VPValue classes & co a bit simpler. But it has the cost that we need to add additional instructions/recipes to the plan, which need to be considered by other analyses and transformations. Another problem is that we would have to materialize a vector with all loaded values, which is completely artificial.

I was considering that the VPInterleaveRecipe and the VPExtracts would work in tandem. You would never get the "all loaded values" vector; it would just create the interleaving group as it does at the moment, with the VPExtracts representing the values for each item. They would still be special values: VPExtracts would be the only type of recipe that could use a VPInterleave, and the VPExtract would not really contain anything in its execute method.

You could just consider the VPExtract as a new type of VPUser, not a VPRecipe directly. It's really a convenient way of using the operands to join up the two elements, not having to rely upon UnderlyingOrProducerTy. I guess that's a lot like the VPMultiValue concept. Using a new type of VPUser removes the complexity from VPValue at least!

I think we have to make the following trade-off: either we add additional complexity to VPValue & co to model recipes that produce multiple VPValues, or we add additional instructions/recipes to the plans we generate.

Personally I think the additional complexity in VPValue & co is worth having simpler plans. Also, there might be additional complex recipes that produce multiple values in the future, which would also benefit from dedicated multi-value support.

It feels like we have one type system (VPValueSC/VPInstructionSC/VPWidenSC/etc), and we are adding another system on top of it, for different types of VPValue via UnderlyingOrProducerTy being virtual or subvalues or concrete.

Is there a nice way at the moment of detecting _which_ output you are seeing from a subvalue VPValue? As in, which member index it would be from an interleaved group.

llvm/lib/Transforms/Vectorize/VPlan.cpp
966 ↗	(On Diff #296186)	Is this used at the moment? Should we be blindly replacing all the different uses with the same value?
llvm/lib/Transforms/Vectorize/VPlan.h
1023–1024	-> VPRecipeBase::VPInterleaveSC
1028	Does anything delete these Defs?

In D84684#2314359, @dmgreen wrote:

Agreed, we could also model it so that VPInterleaveRecipe creates a single vector with all loaded values and then use extract instructions for the different parts of the interleave group. This keeps the modeling in the VPValue classes & co a bit simpler. But it has the cost that we need to add additional instructions/recipes to the plan, which need to be considered by other analyses and transformations. Another problem is that we would have to materialize a vector with all loaded values, which is completely artificial.

I was considering that the VPInterleaveRecipe and the VPExtracts would work in tandem. You would never get the "all loaded values" vector; it would just create the interleaving group as it does at the moment, with the VPExtracts representing the values for each item. They would still be special values: VPExtracts would be the only type of recipe that could use a VPInterleave, and the VPExtract would not really contain anything in its execute method.

You could just consider the VPExtract as a new type of VPUser, not a VPRecipe directly. It's really a convenient way of using the operands to join up the two elements, not having to rely upon UnderlyingOrProducerTy. I guess that's a lot like the VPMultiValue concept. Using a new type of VPUser removes the complexity from VPValue at least!

IIUC, conceptually the suggested VPExtract and the 'sub-value' are effectively the same thing, just slightly different in terms of where exactly they are defined.

I think most of the trade-offs are still similar, with a VPExtract version probably ending up slightly simpler and the sub-value one being a bit more complex but also a bit more explicit in the modeling and more flexible in the future. I think as-is, the 'sub-value' approach is relatively lightweight and does not add too much complexity, and the difference from the VPExtract alternative should be relatively small. But if people generally prefer the VPExtract approach I am happy to update the patches.

I think we have to make the following trade-off: either we add additional complexity to VPValue & co to model recipes that produce multiple VPValues, or we add additional instructions/recipes to the plans we generate.

Personally I think the additional complexity in VPValue & co is worth having simpler plans. Also, there might be additional complex recipes that produce multiple values in the future, which would also benefit from dedicated multi-value support.

It feels like we have one type system (VPValueSC/VPInstructionSC/VPWidenSC/etc), and we are adding another system on top of it, for different types of VPValue via UnderlyingOrProducerTy being virtual or subvalues or concrete.

That definitely can be improved. We should probably just rely on the VPValueID and use that to extract the right value from a union instead of a sum type.

Is there a nice way at the moment of detecting _which_ output you are seeing from a subvalue VPValue? As in, which member index it would be from an interleaved group.

Currently there's no nice way, but it can easily be done by getting the defining value and then checking which def corresponds to the sub-value. It should be possible to add a convenient interface once there's a need for it.
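
The lookup described in this reply (go to the defining value, then check which def corresponds to the sub-value) could look roughly like the standalone sketch below. ToyDef, ToySubValue and getResultIndex are hypothetical names and not the patch's interface. Note that the position found this way is a result index; since the diff numbers results with a counter that skips gaps, turning it into a group member index would need an extra walk over the group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyDef; // hypothetical multi-result def

// Hypothetical sub-value that remembers which def produced it.
struct ToySubValue {
  const ToyDef *Producer = nullptr;
};

struct ToyDef {
  std::vector<ToySubValue> Defs; // one entry per result

  explicit ToyDef(std::size_t NumResults) : Defs(NumResults) {
    for (ToySubValue &SV : Defs)
      SV.Producer = this;
  }

  // Copying would leave Producer pointing at the old object, so forbid it.
  ToyDef(const ToyDef &) = delete;
  ToyDef &operator=(const ToyDef &) = delete;
};

// Linear scan over the producer's defs; returns -1 if SV has no producer
// or is not one of its producer's defs.
inline int getResultIndex(const ToySubValue &SV) {
  if (!SV.Producer)
    return -1;
  const std::vector<ToySubValue> &Defs = SV.Producer->Defs;
  for (std::size_t I = 0; I < Defs.size(); ++I)
    if (&Defs[I] == &SV)
      return static_cast<int>(I);
  return -1;
}
```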

fhahn mentioned this in D88380: [VPlan] Extend VPValue to also model sub- & 'virtual' values. Oct 6 2020, 8:09 AM

Delete sub-value defs in VPInterleaveRecipe, handle virtual values in dropAllReferences, assert that replaceAllUses is not called with virtual VPValues.

Harbormaster completed remote builds in B74168: Diff 296516. Oct 6 2020, 12:31 PM

Use VPRecipeBase::VPInterleaveSC.

llvm/lib/Transforms/Vectorize/VPlan.cpp
966 ↗	(On Diff #296186)	No it's not used (also not in follow-ups). I added an assert.
llvm/lib/Transforms/Vectorize/VPlan.h
1028	They should be freed in the destructor. Done now.

Harbormaster completed remote builds in B74171: Diff 296519. Oct 6 2020, 12:33 PM

Superseded by VPDef approach.

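Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	76 lines
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h	10 lines
llvm/lib/Transforms/Vectorize/VPlan.h	30 lines
llvm/lib/Transforms/Vectorize/VPlanValue.h	3 lines
llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll	2 lines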

Diff 292160

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


// ... lines elided; enclosing scope: public:
/// Construct the vector value of a scalarized value \p V one lane at a time.		/// Construct the vector value of a scalarized value \p V one lane at a time.
void packScalarIntoVectorValue(Value *V, const VPIteration &Instance);		void packScalarIntoVectorValue(Value *V, const VPIteration &Instance);

/// Try to vectorize interleaved access group \p Group with the base address		/// Try to vectorize interleaved access group \p Group with the base address
/// given in \p Addr, optionally masking the vector operations if \p		/// given in \p Addr, optionally masking the vector operations if \p
/// BlockInMask is non-null. Use \p State to translate given VPValues to IR		/// BlockInMask is non-null. Use \p State to translate given VPValues to IR
/// values in the vectorized loop.		/// values in the vectorized loop.
void vectorizeInterleaveGroup(const InterleaveGroup<Instruction> *Group,		void vectorizeInterleaveGroup(const InterleaveGroup<Instruction> *Group,
		ArrayRef<VPMultiValue *> VPDefs,
VPTransformState &State, VPValue *Addr,		VPTransformState &State, VPValue *Addr,
VPValue *BlockInMask = nullptr);		VPValue *BlockInMask = nullptr);

/// Vectorize Load and Store instructions with the base address given in \p		/// Vectorize Load and Store instructions with the base address given in \p
/// Addr, optionally masking the vector operations if \p BlockInMask is		/// Addr, optionally masking the vector operations if \p BlockInMask is
/// non-null. Use \p State to translate given VPValues to IR values in the		/// non-null. Use \p State to translate given VPValues to IR values in the
/// vectorized loop.		/// vectorized loop.
void vectorizeMemoryInstruction(Instruction *Instr, VPTransformState &State,		void vectorizeMemoryInstruction(Instruction *Instr, VPTransformState &State,
// ... lines elided
// }		// }
// To:		// To:
// %R_G.vec = shuffle %R.vec, %G.vec, <0, 1, 2, ..., 7>		// %R_G.vec = shuffle %R.vec, %G.vec, <0, 1, 2, ..., 7>
// %B_U.vec = shuffle %B.vec, undef, <0, 1, 2, 3, u, u, u, u>		// %B_U.vec = shuffle %B.vec, undef, <0, 1, 2, 3, u, u, u, u>
// %interleaved.vec = shuffle %R_G.vec, %B_U.vec,		// %interleaved.vec = shuffle %R_G.vec, %B_U.vec,
// <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> ; Interleave R,G,B elements		// <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> ; Interleave R,G,B elements
// store <12 x i32> %interleaved.vec ; Write 4 tuples of R,G,B		// store <12 x i32> %interleaved.vec ; Write 4 tuples of R,G,B
void InnerLoopVectorizer::vectorizeInterleaveGroup(		void InnerLoopVectorizer::vectorizeInterleaveGroup(
const InterleaveGroup<Instruction> *Group, VPTransformState &State,		const InterleaveGroup<Instruction> *Group, ArrayRef<VPMultiValue *> VPDefs,
VPValue *Addr, VPValue *BlockInMask) {		VPTransformState &State, VPValue *Addr, VPValue *BlockInMask) {
Instruction *Instr = Group->getInsertPos();		Instruction *Instr = Group->getInsertPos();
const DataLayout &DL = Instr->getModule()->getDataLayout();		const DataLayout &DL = Instr->getModule()->getDataLayout();

// Prepare for the vector type of the interleaved load/store.		// Prepare for the vector type of the interleaved load/store.
Type *ScalarTy = getMemInstValueType(Instr);		Type *ScalarTy = getMemInstValueType(Instr);
unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
assert(!VF.isScalable() && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto *VecTy = VectorType::get(ScalarTy, VF * InterleaveFactor);		auto *VecTy = VectorType::get(ScalarTy, VF * InterleaveFactor);
// ... lines elided; enclosing scope: for (unsigned Part = 0; Part < UF; Part++) {
NewLoad = Builder.CreateAlignedLoad(VecTy, AddrParts[Part],		NewLoad = Builder.CreateAlignedLoad(VecTy, AddrParts[Part],
Group->getAlign(), "wide.vec");		Group->getAlign(), "wide.vec");
Group->addMetadata(NewLoad);		Group->addMetadata(NewLoad);
NewLoads.push_back(NewLoad);		NewLoads.push_back(NewLoad);
}		}

// For each member in the group, shuffle out the appropriate data from the		// For each member in the group, shuffle out the appropriate data from the
// wide loads.		// wide loads.
		unsigned J = 0;
for (unsigned I = 0; I < InterleaveFactor; ++I) {		for (unsigned I = 0; I < InterleaveFactor; ++I) {
Instruction *Member = Group->getMember(I);		Instruction *Member = Group->getMember(I);

// Skip the gaps in the group.		// Skip the gaps in the group.
if (!Member)		if (!Member)
continue;		continue;

assert(!VF.isScalable() && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto StrideMask =		auto StrideMask =
createStrideMask(I, InterleaveFactor, VF.getKnownMinValue());		createStrideMask(I, InterleaveFactor, VF.getKnownMinValue());
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *StridedVec = Builder.CreateShuffleVector(		Value *StridedVec = Builder.CreateShuffleVector(
NewLoads[Part], UndefVec, StrideMask, "strided.vec");		NewLoads[Part], UndefVec, StrideMask, "strided.vec");

// If this member has different type, cast the result type.		// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {		if (Member->getType() != ScalarTy) {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
VectorType *OtherVTy = VectorType::get(Member->getType(), VF);		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);		StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);
}		}

if (Group->isReverse())		if (Group->isReverse())
StridedVec = reverseVector(StridedVec);		StridedVec = reverseVector(StridedVec);

VectorLoopValueMap.setVectorValue(Member, Part, StridedVec);		State.set(VPDefs[J], Member, StridedVec, Part);
}		}
		++J;
}		}
return;		return;
}		}

// The sub vector type for current instruction.		// The sub vector type for current instruction.
assert(!VF.isScalable() && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
auto *SubVT = VectorType::get(ScalarTy, VF);		auto *SubVT = VectorType::get(ScalarTy, VF);

// ... lines elided; enclosing scope: for (auto *Predecessor : predecessors(BB)) {
}		}

BlockMask = Builder.createOr(BlockMask, EdgeMask);		BlockMask = Builder.createOr(BlockMask, EdgeMask);
}		}

return BlockMaskCache[BB] = BlockMask;		return BlockMaskCache[BB] = BlockMask;
}		}

VPWidenMemoryInstructionRecipe *		VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
VPlanPtr &Plan) {		VPlanPtr &Plan) {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&		assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
"Must be called with either a load or store");		"Must be called with either a load or store");

auto willWiden = [&](ElementCount VF) -> bool {		auto willWiden = [&](ElementCount VF) -> bool {
assert(!VF.isScalable() && "unexpected scalable ElementCount");		assert(!VF.isScalable() && "unexpected scalable ElementCount");
if (VF.isScalar())		if (VF.isScalar())
return false;		return false;
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
// ... lines elided
if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range))		if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range))
return nullptr;		return nullptr;

VPValue *Mask = nullptr;		VPValue *Mask = nullptr;
if (Legal->isMaskRequired(I))		if (Legal->isMaskRequired(I))
Mask = createBlockInMask(I->getParent(), Plan);		Mask = createBlockInMask(I->getParent(), Plan);

VPValue *Addr = Plan->getOrAddVPValue(getLoadStorePointerOperand(I));		VPValue *Addr = Plan->getOrAddVPValue(getLoadStorePointerOperand(I));
		auto II = InsertPtToGroup.find(I);
		if (II != InsertPtToGroup.end()) {
		auto *IG = II->second;
		auto *InterleaveG = new VPInterleaveRecipe(IG, Addr, Mask);
		unsigned j = 0;
		for (unsigned i = 0; i < IG->getFactor(); i++)
		if (Instruction *Member = IG->getMember(i)) {
		Plan->addVPValue(Member, InterleaveG->getResult(j));
		j++;
		}
		return InterleaveG;
		}

if (LoadInst *Load = dyn_cast<LoadInst>(I)) {		if (LoadInst *Load = dyn_cast<LoadInst>(I)) {
auto WidenLoad = new VPWidenMemoryInstructionRecipe(Load, Addr, Mask);		auto WidenLoad = new VPWidenMemoryInstructionRecipe(Load, Addr, Mask);
Plan->addVPValue(Load, WidenLoad);		Plan->addVPValue(Load, WidenLoad);
return WidenLoad;		return WidenLoad;
}		}

StoreInst *Store = cast<StoreInst>(I);		StoreInst *Store = cast<StoreInst>(I);
VPValue *StoredValue = Plan->getOrAddVPValue(Store->getValueOperand());		VPValue *StoredValue = Plan->getOrAddVPValue(Store->getValueOperand());
// ... lines elided; enclosing scope: VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
const DenseMap<Instruction *, Instruction *> &SinkAfter) {		const DenseMap<Instruction *, Instruction *> &SinkAfter) {

// Hold a mapping from predicated instructions to their recipes, in order to		// Hold a mapping from predicated instructions to their recipes, in order to
// fix their AlsoPack behavior if a user is determined to replicate and use a		// fix their AlsoPack behavior if a user is determined to replicate and use a
// scalar instead of vector value.		// scalar instead of vector value.
DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;		DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;

SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;		SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
		SmallPtrSet<Instruction *, 8> DeadInterleaveGroupMembers;

VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, Legal, CM, PSE, Builder);		VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, Legal, CM, PSE, Builder);

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Pre-construction: record ingredients whose recipes we'll need to further		// Pre-construction: record ingredients whose recipes we'll need to further
// process after constructing the initial VPlan.		// process after constructing the initial VPlan.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

// ... lines elided; enclosing scope: for (InterleaveGroup<Instruction> *IG : IAI.getInterleaveGroups()) {
auto applyIG = [IG, this](ElementCount VF) -> bool {		auto applyIG = [IG, this](ElementCount VF) -> bool {
return (VF.isVector() && // Query is illegal for VF == 1		return (VF.isVector() && // Query is illegal for VF == 1
CM.getWideningDecision(IG->getInsertPos(), VF) ==		CM.getWideningDecision(IG->getInsertPos(), VF) ==
LoopVectorizationCostModel::CM_Interleave);		LoopVectorizationCostModel::CM_Interleave);
};		};
if (!getDecisionAndClampRange(applyIG, Range))		if (!getDecisionAndClampRange(applyIG, Range))
continue;		continue;
InterleaveGroups.insert(IG);		InterleaveGroups.insert(IG);
		RecipeBuilder.recordInterleaveGroup(IG);
for (unsigned i = 0; i < IG->getFactor(); i++)		for (unsigned i = 0; i < IG->getFactor(); i++)
if (Instruction *Member = IG->getMember(i))		if (Instruction *Member = IG->getMember(i)) {
RecipeBuilder.recordRecipeOf(Member);		RecipeBuilder.recordRecipeOf(Member);
		if (Member != IG->getInsertPos())
		DeadInterleaveGroupMembers.insert(Member);
		}
		};

		auto skipDeadInterleaveMembers =
		[&DeadInterleaveGroupMembers](Instruction *I) {
		BasicBlock *BB = I->getParent();
		for (auto &I : make_range(I->getIterator(), BB->end()))
		if (!DeadInterleaveGroupMembers.contains(&I))
		return &I;
		llvm_unreachable("Need to find a valid insert point");
};		};
		// Mark instructions we'll need to sink later and their targets as
		// ingredients whose recipe we'll need to record.
		for (auto &Entry : SinkAfter) {
		RecipeBuilder.recordRecipeOf(skipDeadInterleaveMembers(Entry.first));
		RecipeBuilder.recordRecipeOf(skipDeadInterleaveMembers(Entry.second));
		}

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Build initial VPlan: Scan the body of the loop in a topological order to		// Build initial VPlan: Scan the body of the loop in a topological order to
// visit each basic block after having visited its predecessor basic blocks.		// visit each basic block after having visited its predecessor basic blocks.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

// Create a dummy pre-entry VPBasicBlock to start building the VPlan.		// Create a dummy pre-entry VPBasicBlock to start building the VPlan.
auto Plan = std::make_unique<VPlan>();		auto Plan = std::make_unique<VPlan>();
// ... lines elided; enclosing scope: for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {

// Introduce each ingredient into VPlan.		// Introduce each ingredient into VPlan.
// TODO: Model and preserve debug instrinsics in VPlan.		// TODO: Model and preserve debug instrinsics in VPlan.
for (Instruction &I : BB->instructionsWithoutDebug()) {		for (Instruction &I : BB->instructionsWithoutDebug()) {
Instruction *Instr = &I;		Instruction *Instr = &I;

// First filter out irrelevant instructions, to ensure no recipes are		// First filter out irrelevant instructions, to ensure no recipes are
// built for them.		// built for them.
if (isa<BranchInst>(Instr) || DeadInstructions.count(Instr))		if (isa<BranchInst>(Instr) || DeadInstructions.count(Instr) ||
		DeadInterleaveGroupMembers.contains(Instr))
continue;		continue;

if (auto Recipe =		if (auto Recipe =
RecipeBuilder.tryToCreateWidenRecipe(Instr, Range, Plan)) {		RecipeBuilder.tryToCreateWidenRecipe(Instr, Range, Plan)) {
RecipeBuilder.setRecipe(Instr, Recipe);		RecipeBuilder.setRecipe(Instr, Recipe);
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}
// ... lines elided

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to		// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.		// bring the VPlan to its final state.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

// Apply Sink-After legal constraints.		// Apply Sink-After legal constraints.
for (auto &Entry : SinkAfter) {		for (auto &Entry : SinkAfter) {
VPRecipeBase *Sink = RecipeBuilder.getRecipe(Entry.first);		VPRecipeBase *Sink =
VPRecipeBase *Target = RecipeBuilder.getRecipe(Entry.second);		RecipeBuilder.getRecipe(skipDeadInterleaveMembers(Entry.first));
		VPRecipeBase *Target =
		RecipeBuilder.getRecipe(skipDeadInterleaveMembers(Entry.second));
Sink->moveAfter(Target);		Sink->moveAfter(Target);
}		}

// Interleave memory: for each Interleave Group we marked earlier as relevant
// for this VPlan, replace the Recipes widening its memory instructions with a
// single VPInterleaveRecipe at its insertion point.
for (auto IG : InterleaveGroups) {
auto *Recipe = cast<VPWidenMemoryInstructionRecipe>(
RecipeBuilder.getRecipe(IG->getInsertPos()));
(new VPInterleaveRecipe(IG, Recipe->getAddr(), Recipe->getMask()))
->insertBefore(Recipe);

for (unsigned i = 0; i < IG->getFactor(); ++i)
if (Instruction *Member = IG->getMember(i)) {
RecipeBuilder.getRecipe(Member)->eraseFromParent();
}
}

// Adjust the recipes for any inloop reductions.		// Adjust the recipes for any inloop reductions.
if (Range.Start > 1)		if (Range.Start > 1)
adjustRecipesForInLoopReductions(Plan, RecipeBuilder);		adjustRecipesForInLoopReductions(Plan, RecipeBuilder);

// Finally, if tail is folded by masking, introduce selects between the phi		// Finally, if tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the end of the latch.		// and the live-out instruction of each reduction, at the end of the latch.
if (CM.foldTailByMasking() && !Legal->getReductionVars().empty()) {		if (CM.foldTailByMasking() && !Legal->getReductionVars().empty()) {
Builder.setInsertPoint(VPBB);		Builder.setInsertPoint(VPBB);
// ... lines elided; enclosing scope: for (unsigned In = 0; In < NumIncoming; ++In) {
}		}
}		}
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);		State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);
}		}

void VPInterleaveRecipe::execute(VPTransformState &State) {		void VPInterleaveRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Interleave group being replicated.");		assert(!State.Instance && "Interleave group being replicated.");
State.ILV->vectorizeInterleaveGroup(IG, State, getAddr(), getMask());		State.ILV->vectorizeInterleaveGroup(IG, getDefs(), State, getAddr(),
		getMask());
}		}

void VPReductionRecipe::execute(VPTransformState &State) {		void VPReductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Reduction being replicated.");		assert(!State.Instance && "Reduction being replicated.");
for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
unsigned Kind = RdxDesc->getRecurrenceKind();		unsigned Kind = RdxDesc->getRecurrenceKind();
Value *NewVecOp = State.get(VecOp, Part);		Value *NewVecOp = State.get(VecOp, Part);
Value *NewRed =		Value *NewRed =
[... 630 lines not shown ...]

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h

[... 47 lines not shown ...]	class VPRecipeBuilder {
EdgeMaskCacheTy EdgeMaskCache;		EdgeMaskCacheTy EdgeMaskCache;
BlockMaskCacheTy BlockMaskCache;		BlockMaskCacheTy BlockMaskCache;

// VPlan-VPlan transformations support: Hold a mapping from ingredients to		// VPlan-VPlan transformations support: Hold a mapping from ingredients to
// their recipe. To save on memory, only do so for selected ingredients,		// their recipe. To save on memory, only do so for selected ingredients,
// marked by having a nullptr entry in this map.		// marked by having a nullptr entry in this map.
DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;		DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;

		DenseMap<Instruction *, const InterleaveGroup<Instruction> *> InsertPtToGroup;

/// Check if \p I can be widened at the start of \p Range and possibly		/// Check if \p I can be widened at the start of \p Range and possibly
/// decrease the range such that the returned value holds for the entire \p		/// decrease the range such that the returned value holds for the entire \p
/// Range. The function should not be called for memory instructions or calls.		/// Range. The function should not be called for memory instructions or calls.
bool shouldWiden(Instruction *I, VFRange &Range) const;		bool shouldWiden(Instruction *I, VFRange &Range) const;

/// Check if the load or store instruction \p I should widened for \p		/// Check if the load or store instruction \p I should widened for \p
/// Range.Start and potentially masked. Such instructions are handled by a		/// Range.Start and potentially masked. Such instructions are handled by a
/// recipe that takes an additional VPInstruction for the mask.		/// recipe that takes an additional VPInstruction for the mask.
VPWidenMemoryInstructionRecipe *		VPRecipeBase *tryToWidenMemory(Instruction *I, VFRange &Range,
tryToWidenMemory(Instruction *I, VFRange &Range, VPlanPtr &Plan);		VPlanPtr &Plan);

/// Check if an induction recipe should be constructed for \I. If so build and		/// Check if an induction recipe should be constructed for \I. If so build and
/// return it. If not, return null.		/// return it. If not, return null.
VPWidenIntOrFpInductionRecipe *tryToOptimizeInductionPHI(PHINode *Phi) const;		VPWidenIntOrFpInductionRecipe *tryToOptimizeInductionPHI(PHINode *Phi) const;

/// Optimize the special case where the operand of \p I is a constant integer		/// Optimize the special case where the operand of \p I is a constant integer
/// induction variable.		/// induction variable.
VPWidenIntOrFpInductionRecipe *		VPWidenIntOrFpInductionRecipe *
[... 50 lines not shown ...]	public:
/// Mark given ingredient for recording its recipe once one is created for		/// Mark given ingredient for recording its recipe once one is created for
/// it.		/// it.
void recordRecipeOf(Instruction *I) {		void recordRecipeOf(Instruction *I) {
assert((!Ingredient2Recipe.count(I) || Ingredient2Recipe[I] == nullptr) &&		assert((!Ingredient2Recipe.count(I) || Ingredient2Recipe[I] == nullptr) &&
"Recipe already set for ingredient");		"Recipe already set for ingredient");
Ingredient2Recipe[I] = nullptr;		Ingredient2Recipe[I] = nullptr;
}		}

		void recordInterleaveGroup(const InterleaveGroup<Instruction> *IG) {
		InsertPtToGroup[IG->getInsertPos()] = IG;
		}

/// Return the recipe created for given ingredient.		/// Return the recipe created for given ingredient.
VPRecipeBase *getRecipe(Instruction *I) {		VPRecipeBase *getRecipe(Instruction *I) {
assert(Ingredient2Recipe.count(I) &&		assert(Ingredient2Recipe.count(I) &&
"Recording this ingredients recipe was not requested");		"Recording this ingredients recipe was not requested");
assert(Ingredient2Recipe[I] != nullptr &&		assert(Ingredient2Recipe[I] != nullptr &&
"Ingredient doesn't have a recipe");		"Ingredient doesn't have a recipe");
return Ingredient2Recipe[I];		return Ingredient2Recipe[I];
}		}
[... 20 lines not shown ...]

llvm/lib/Transforms/Vectorize/VPlan.h

[... 1,006 lines not shown ...]	public:

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
};		};

/// VPInterleaveRecipe is a recipe for transforming an interleave group of load		/// VPInterleaveRecipe is a recipe for transforming an interleave group of load
/// or stores into one wide load/store and shuffles.		/// or stores into one wide load/store and shuffles.
class VPInterleaveRecipe : public VPRecipeBase {		class VPInterleaveRecipe : public VPRecipeBase, public VPValue {
const InterleaveGroup<Instruction> *IG;		const InterleaveGroup<Instruction> *IG;
VPUser User;		VPUser User;
		SmallVector<VPMultiValue *, 4> Defs;

public:		public:
VPInterleaveRecipe(const InterleaveGroup<Instruction> *IG, VPValue *Addr,		VPInterleaveRecipe(const InterleaveGroup<Instruction> *IG, VPValue *Addr,
VPValue *Mask)		VPValue *Mask)
: VPRecipeBase(VPInterleaveSC), IG(IG), User({Addr}) {		: VPRecipeBase(VPInterleaveSC), VPValue(VPValue::VPVInterleaveSC), IG(IG),
		User({Addr}) {
		dmgreen: -> VPRecipeBase::VPInterleaveSC
if (Mask)		if (Mask)
User.addOperand(Mask);		User.addOperand(Mask);
		for (unsigned i = 0; i < IG->getNumMembers(); i++)
		Defs.push_back(new VPMultiValue(this));
		dmgreen: Does anything delete these Defs?
		fhahn: They should be freed in the destructor. Done now.
}		}
~VPInterleaveRecipe() override = default;		~VPInterleaveRecipe() override = default;

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPInterleaveSC;		return V->getVPRecipeID() == VPRecipeBase::VPInterleaveSC;
}		}
		static inline bool classof(const VPValue *V) {
		return V->getVPValueID() == VPValue::VPVInterleaveSC;
		}

/// Return the address accessed by this recipe.		/// Return the address accessed by this recipe.
VPValue *getAddr() const {		VPValue *getAddr() const {
return User.getOperand(0); // Address is the 1st, mandatory operand.		return User.getOperand(0); // Address is the 1st, mandatory operand.
}		}

/// Return the mask used by this recipe. Note that a full mask is represented		/// Return the mask used by this recipe. Note that a full mask is represented
/// by a nullptr.		/// by a nullptr.
VPValue *getMask() const {		VPValue *getMask() const {
// Mask is optional and therefore the last, currently 2nd operand.		// Mask is optional and therefore the last, currently 2nd operand.
return User.getNumOperands() == 2 ? User.getOperand(1) : nullptr;		return User.getNumOperands() == 2 ? User.getOperand(1) : nullptr;
}		}

/// Generate the wide load or store, and shuffles.		/// Generate the wide load or store, and shuffles.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;

const InterleaveGroup<Instruction> *getInterleaveGroup() { return IG; }		const InterleaveGroup<Instruction> *getInterleaveGroup() { return IG; }

		VPMultiValue *getResult(unsigned Idx) { return Defs[Idx]; }
		ArrayRef<VPMultiValue *> getDefs() { return Defs; }
};		};
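
On the inline question of who deletes the Defs: as displayed in this revision, the constructor allocates one VPMultiValue per group member with `new`, while the destructor is still defaulted and Defs holds raw pointers. A minimal sketch of one way to tie the lifetime of the results to the recipe follows. It is an assumption-laden illustration, not the patch: `ToyRecipe`, `ToyMultiValue`, `LiveMultiValues` and `liveAfterScope` are hypothetical stand-ins, and it uses `std::unique_ptr` in place of the explicit destructor that fhahn mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the classes in the patch; not the LLVM types.
struct ToyRecipe;

// Counts live ToyMultiValue objects so that a leak would be observable.
static int LiveMultiValues = 0;

struct ToyMultiValue {
  ToyRecipe *Producer; // The recipe that defines this result.

  explicit ToyMultiValue(ToyRecipe *P) : Producer(P) { ++LiveMultiValues; }
  ~ToyMultiValue() { --LiveMultiValues; }

  ToyMultiValue(const ToyMultiValue &) = delete;
  ToyMultiValue &operator=(const ToyMultiValue &) = delete;
};

struct ToyRecipe {
  // Owning pointers: each result is freed when the recipe is destroyed.
  std::vector<std::unique_ptr<ToyMultiValue>> Defs;

  explicit ToyRecipe(std::size_t NumMembers) {
    for (std::size_t I = 0; I < NumMembers; ++I)
      Defs.push_back(std::make_unique<ToyMultiValue>(this));
  }

  ToyMultiValue *getResult(std::size_t Idx) { return Defs[Idx].get(); }
};

// Builds a recipe with NumMembers results, lets it go out of scope, and
// returns how many results are still alive afterwards.
int liveAfterScope(std::size_t NumMembers) {
  {
    ToyRecipe R(NumMembers);
    assert(LiveMultiValues == static_cast<int>(NumMembers));
    if (NumMembers > 0)
      assert(R.getResult(0)->Producer == &R);
  }
  return LiveMultiValues;
}
```

With owning pointers the defaulted destructor is enough. Keeping raw pointers, as the diff does, would instead need a user-written destructor that deletes every entry of Defs.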

		/*template <> struct simplify_type<VPMultiValue> {*/
		dmgreen: This looks left over from something.
		fhahn: Indeed, that should be gone now, thanks!
		// using SimpleType = VPValue *;

		// static SimpleType getSimplifiedValue(VPMultiValue &Val) {
		// return Val.getProducer();
		//}
		//};

		// template <> struct simplify_type<VPMultiValue *> {
		// using SimpleType = VPValue *;

		// static SimpleType getSimplifiedValue(VPMultiValue *&Val) {
		// return Val->getProducer();
		//}
		//};

/// A recipe to represent inloop reduction operations, performing a reduction on		/// A recipe to represent inloop reduction operations, performing a reduction on
/// a vector operand into a scalar value, and adding the result to a chain.		/// a vector operand into a scalar value, and adding the result to a chain.
class VPReductionRecipe : public VPRecipeBase {		class VPReductionRecipe : public VPRecipeBase {
/// The recurrence decriptor for the reduction in question.		/// The recurrence decriptor for the reduction in question.
RecurrenceDescriptor *RdxDesc;		RecurrenceDescriptor *RdxDesc;
/// The original instruction being converted to a reduction.		/// The original instruction being converted to a reduction.
Instruction *I;		Instruction *I;
/// The VPValue of the vector value to be reduced.		/// The VPValue of the vector value to be reduced.
[... 1,000 lines not shown ...]

llvm/lib/Transforms/Vectorize/VPlanValue.h

[... 77 lines not shown ...]	public:
/// type identification.		/// type identification.
enum {		enum {
VPValueSC,		VPValueSC,
VPMultiValueSC,		VPMultiValueSC,
VPInstructionSC,		VPInstructionSC,
VPMemoryInstructionSC,		VPMemoryInstructionSC,
VPVWidenCallSC,		VPVWidenCallSC,
VPVWidenSelectSC,		VPVWidenSelectSC,
VPVWidenGEPSC		VPVWidenGEPSC,
		VPVInterleaveSC
};		};

VPValue(Value *UV = nullptr) : VPValue(VPValueSC, UV) {}		VPValue(Value *UV = nullptr) : VPValue(VPValueSC, UV) {}
VPValue(const VPValue &) = delete;		VPValue(const VPValue &) = delete;
VPValue &operator=(const VPValue &) = delete;		VPValue &operator=(const VPValue &) = delete;
virtual ~VPValue() {}		virtual ~VPValue() {}

/// \return an ID for the concrete type of this object.		/// \return an ID for the concrete type of this object.
[... 155 lines not shown ...]

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

	[... 903 lines not shown ...]
	; CHECK-LABEL: @PR34743(			; CHECK-LABEL: @PR34743(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ %[[VSHUF1:.+]], %vector.body ]			; CHECK: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ %[[VSHUF1:.+]], %vector.body ]
	; CHECK: %wide.vec = load <8 x i16>			; CHECK: %wide.vec = load <8 x i16>
	; CHECK: %[[VSHUF0:.+]] = shufflevector <8 x i16> %wide.vec, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			; CHECK: %[[VSHUF0:.+]] = shufflevector <8 x i16> %wide.vec, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	; CHECK: %[[VSHUF1:.+]] = shufflevector <8 x i16> %wide.vec, <8 x i16> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			; CHECK: %[[VSHUF1:.+]] = shufflevector <8 x i16> %wide.vec, <8 x i16> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; CHECK: %[[VSHUF:.+]] = shufflevector <4 x i16> %vector.recur, <4 x i16> %[[VSHUF1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK: %[[VSHUF:.+]] = shufflevector <4 x i16> %vector.recur, <4 x i16> %[[VSHUF1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK: sext <4 x i16> %[[VSHUF0]] to <4 x i32>			; CHECK: sext <4 x i16> %[[VSHUF0]] to <4 x i32>
	; CHECK: sext <4 x i16> %[[VSHUF]] to <4 x i32>
	; CHECK: sext <4 x i16> %[[VSHUF1]] to <4 x i32>			; CHECK: sext <4 x i16> %[[VSHUF1]] to <4 x i32>
				; CHECK: sext <4 x i16> %[[VSHUF]] to <4 x i32>
	; CHECK: mul nsw <4 x i32>			; CHECK: mul nsw <4 x i32>
	; CHECK: mul nsw <4 x i32>			; CHECK: mul nsw <4 x i32>

	define void @PR34743(i16* %a, i32* %b, i64 %n) {			define void @PR34743(i16* %a, i32* %b, i64 %n) {
	entry:			entry:
	%.pre = load i16, i16* %a			%.pre = load i16, i16* %a
	br label %loop			br label %loop

	[... 26 lines not shown ...]