Details

Reviewers

Ayal
rengolin
dcaballe
hsaito
mkuper
mzolotukhin

Commits

rGb3c6f07dde8c: [VPlan] Move recipe based VPlan generation to separate function.
rL334284: [VPlan] Move recipe based VPlan generation to separate function.

Summary

This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.

Diff Detail

Repository: rL LLVM

Event Timeline

fhahn created this revision. May 29 2018, 7:09 AM

Herald added subscribers: rkruppe, tschuett, bollu. May 29 2018, 7:09 AM

fhahn mentioned this in D46827: [VPlan] Add VPInstruction to VPRecipe transformation. May 29 2018, 7:10 AM

Thanks for the patch, Florian! Ok, now I understand what you mean exactly. Some comments inline.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
352 ↗	(On Diff #148902)	I really like this change! It decouples VPlan construction with VPInstructions from the legacy VPlan construction using recipes. I understand that you will refactor the code in `buildVPRecipes` even further to implement the temporary VPInstruction-to-VPRecipe step in the native path. Maybe we should name this buildVPlanWithRecipes or something like that to clearly state that it's creating a new VPlan?
lib/Transforms/Vectorize/LoopVectorize.cpp
6342 ↗	(On Diff #148902)	Why `BestVF.Width + 1`? Shouldn't it be `BestVF.Width`?
6347 ↗	(On Diff #148902)	The rationale for building VPlans for all the VFs and then invoking CM to choose the best VF (without using those VPlans) was to eventually port CM to work on these VPlans. This hasn't happened yet, but @hsaito is working on this VPlan-based version of the CM. We think that this step is very important for the convergence of both vectorization paths, since it will allow us to move the VPlan construction in the inner loop path to earlier stages of the path. Since this is a small change, my suggestion would be: 1) if this change is needed by your subsequent patches, let's go with it. Hideki could move the VPlan construction again after CM when his code is ready; 2) otherwise, let's keep the original version. Please note that we are mostly sharing a single VPlan for all the VFs, so we shouldn't be using unnecessary memory most of the time. WDYT?
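For readers following the `BestVF.Width + 1` question and the remark about sharing one VPlan across VFs: the header comment in this revision describes a range as running from `Range.Start` up to `Range.End` exclusive, so a range of `{Width, Width + 1}` presumably covers exactly one VF. The sketch below is a toy model of that half-open convention and of the sub-range loop in `buildVPlans`. `ToyPlan`, `buildToyPlan`, `buildToyPlans`, and the `ScalarizeFromVF` threshold are hypothetical names invented for illustration; only the loop shape mirrors the diff.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the planner's VFRange: covers [Start, End), End exclusive.
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Hypothetical stand-in for a VPlan: it only records the VFs it serves.
struct ToyPlan {
  std::vector<unsigned> VFs;
};

// Build one plan for Range. A made-up decision ("scalarize once VF reaches
// ScalarizeFromVF") is evaluated per VF; the range is clamped at the first VF
// whose decision differs from the decision taken at Range.Start.
ToyPlan buildToyPlan(VFRange &Range, unsigned ScalarizeFromVF) {
  const bool DecisionAtStart = Range.Start >= ScalarizeFromVF;
  ToyPlan Plan;
  for (unsigned VF = Range.Start; VF < Range.End; VF *= 2) {
    if ((VF >= ScalarizeFromVF) != DecisionAtStart) {
      Range.End = VF; // Shorten the sub-range; a later plan covers the rest.
      break;
    }
    Plan.VFs.push_back(VF);
  }
  return Plan;
}

// Same loop shape as buildVPlans in the diff: keep building plans until the
// sub-ranges cover every power-of-2 VF in [MinVF, MaxVF].
std::vector<ToyPlan> buildToyPlans(unsigned MinVF, unsigned MaxVF,
                                   unsigned ScalarizeFromVF) {
  assert(MinVF >= 1 && "VF must be positive or the doubling loop never ends");
  std::vector<ToyPlan> Plans;
  for (unsigned VF = MinVF; VF < MaxVF + 1;) {
    VFRange SubRange = {VF, MaxVF + 1};
    Plans.push_back(buildToyPlan(SubRange, ScalarizeFromVF));
    VF = SubRange.End;
  }
  return Plans;
}
```

In this model, `buildToyPlans(1, 8, 4)` yields two plans, one shared by VF 1 and 2 and one shared by VF 4 and 8, while `buildToyPlans(4, 4, 100)` yields a single plan for VF 4 because its sub-range is `{4, 5}`.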

Thanks Diego!

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
352 ↗	(On Diff #148902)	Yep there is some more refactoring I plan on doing. Ideally I think we should try to move as much of the VPlan related implementation out of LoopVectorize.cpp
lib/Transforms/Vectorize/LoopVectorize.cpp
6342 ↗	(On Diff #148902)	Yep, buildVPRecipes does not really need a range, so this could be simplified.
6347 ↗	(On Diff #148902)	I understand, but do you think we will be able to replace the legacy cost model anytime soon? I expect implementing VPlan based inner loop vectorization that does not introduce regressions over the legacy cost model will be a big task. I thought the plan was to develop this cost model in the VPlan native path (with support for inner loops). That will allow us to get an initial cost model in and iterate on that, until we match the performance of the legacy one. At that point, we can retire the legacy inner loop vectorizer. What do you think?

lib/Transforms/Vectorize/LoopVectorize.cpp
6347 ↗	(On Diff #148902)	I think we should be doing both, and that's probably the only way to gain enough credibility with the rest of the community. Certainly not an easy task. From my perspective of CM refactoring, if VPlan is built before CM runs, incremental work of introducing VPlan-ness is easier. If you move it to after CM computing VF/UF, the first thing I need to do, for incremental introduction of VPlan-ness, is essentially undoing your change. First CM refactoring I plan to do is to change "BB->Instruction" traversal into "VPBB -> Recipe -> Instruction" traversal, making sure that the change is NFC. We can then proceed to change "per instruction" modeling to "per Recipe" modeling one by one and/or to "per VPInstruction" modeling, making sure cost modeling methodology and computed cost are comparable or better (we'll probably have to establish some metric there) each step. Along the way, we'll need to reflect many "behind the scenes smarts" into VPlan. All that expects VPlan to be built before CM computation. Does this make sense to you?
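The first refactoring step described above, switching the traversal from "BB -> Instruction" to "VPBB -> Recipe -> Instruction" without changing the result, can be pictured with a toy model. This is only a sketch under assumptions: `Inst`, `Block`, `Recipe`, `VPBlock`, and both cost functions are hypothetical types and helpers, not the cost model's actual interfaces, and each instruction simply carries a precomputed cost.

```cpp
#include <vector>

// Hypothetical instruction with a precomputed cost.
struct Inst {
  unsigned Cost;
};

// Toy "BB": a flat list of instructions.
struct Block {
  std::vector<const Inst *> Insts;
};

// Toy recipe: groups the ingredient instructions it was built from.
struct Recipe {
  std::vector<const Inst *> Insts;
};

// Toy "VPBB": a list of recipes.
struct VPBlock {
  std::vector<Recipe> Recipes;
};

// Legacy-style traversal: BB -> Instruction.
unsigned costByInstruction(const std::vector<Block> &Blocks) {
  unsigned Total = 0;
  for (const Block &BB : Blocks)
    for (const Inst *I : BB.Insts)
      Total += I->Cost;
  return Total;
}

// VPlan-style traversal: VPBB -> Recipe -> Instruction. It still models cost
// per instruction, so the total should match the legacy traversal as long as
// every instruction appears in exactly one recipe.
unsigned costByRecipe(const std::vector<VPBlock> &VPBlocks) {
  unsigned Total = 0;
  for (const VPBlock &VPBB : VPBlocks)
    for (const Recipe &R : VPBB.Recipes)
      for (const Inst *I : R.Insts)
        Total += I->Cost;
  return Total;
}
```

Comparing the two totals on the same loop is one way to check that such a traversal change is NFC before moving on to per-Recipe or per-VPInstruction modeling.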

fhahn added inline comments. May 29 2018, 2:41 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6347 ↗	(On Diff #148902)	> Please note that we are mostly sharing a single VPlan for all the VFs, so we shouldn't be using unnecessary memory most of the time.

Right Diego, thanks!

> I think we should be doing both, and that's probably the only way to gain enough credibility with the rest of the community. Certainly not an easy task. From my perspective of CM refactoring, if VPlan is built before CM runs, incremental work of introducing VPlan-ness is easier. If you move it to after CM computing VF/UF, the first thing I need to do, for incremental introduction of VPlan-ness, is essentially undoing your change. First CM refactoring I plan to do is to change "BB->Instruction" traversal into "VPBB -> Recipe -> Instruction" traversal, making sure that the change is NFC. We can then proceed to change "per instruction" modeling to "per Recipe" modeling one by one and/or to "per VPInstruction" modeling, making sure cost modeling methodology and computed cost are comparable or better (we'll probably have to establish some metric there) each step. Along the way, we'll need to reflect many "behind the scenes smarts" into VPlan. All that expects VPlan to be built before CM computation. Does this make sense to you?

Yep thanks! My understanding from one of Diego's responses at D46827 was that we want to try to keep changes to the VPlan native path, to allow us to iterate quickly without introducing regressions. Anyway, I think it makes sense to move building the VPlans before cost modelling again, as it sounds like the VPlan-based cost modelling is coming soon :) Do you think we should get the rest of this patch in regardless?

hsaito added inline comments. May 29 2018, 3:28 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6347 ↗	(On Diff #148902)	There's a fast path for velocity of implementation that Diego is working on and a gradual path for credibility building that I'm working on. Eventually, we'll have enough code shared between the two paths, and that's when we can declare the transition complete. I'm neutral on the rest of the changes, except that the NeedDef computation is needed only once, and after this change we'll end up doing it more than once. If you think this is a step forward for constructing the HCFG first and then populating Recipes later for the innermost loop vectorization path, I'd say go for it. Otherwise, I'll stay neutral. Decoupling is the opposite of what we are trying to do. Increased sharing is the direction we are heading.
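The point about NeedDef being needed only once can be sketched as follows: collect the branch conditions a single time in the driver and pass the set by reference to each per-sub-range build. The committed diff appears to do this in `buildVPlansWithVPRecipes`. Everything in the sketch is a toy: `ToyBlock`, `ToyPlanner`, the string-valued conditions, and the call counter are hypothetical and exist only to make the hoisting visible.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical basic block: records only what the NeedDef scan looks at.
struct ToyBlock {
  bool IsLatch;
  bool HasConditionalBranch;
  std::string Condition; // Name of the branch condition, if any.
};

struct ToyPlanner {
  std::vector<ToyBlock> Loop;
  unsigned NeedDefComputations = 0; // Counts how often the scan runs.

  // Collect conditions feeding internal conditional branches, skipping the
  // latch, in the same spirit as the NeedDef loop in the diff.
  std::set<std::string> collectNeedDef() {
    ++NeedDefComputations;
    std::set<std::string> NeedDef;
    for (const ToyBlock &BB : Loop) {
      if (BB.IsLatch)
        continue;
      if (BB.HasConditionalBranch)
        NeedDef.insert(BB.Condition);
    }
    return NeedDef;
  }

  // Per-sub-range build: only reads the precomputed set. Returns how many
  // values it would register as needing a definition in the plan.
  unsigned buildPlan(const std::set<std::string> &NeedDef) const {
    return static_cast<unsigned>(NeedDef.size());
  }

  // Driver: the scan is hoisted out of the loop, so it runs once no matter
  // how many sub-ranges are built.
  unsigned buildPlans(unsigned NumSubRanges) {
    const std::set<std::string> NeedDef = collectNeedDef();
    unsigned Registered = 0;
    for (unsigned I = 0; I < NumSubRanges; ++I)
      Registered += buildPlan(NeedDef);
    return Registered;
  }
};
```

Calling `collectNeedDef()` inside `buildPlan` instead would give the same plans but repeat the scan once per sub-range, which is the redundancy the comment warns about.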

dcaballe added inline comments. May 29 2018, 3:35 PM

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
352 ↗	(On Diff #148902)	+1
lib/Transforms/Vectorize/LoopVectorize.cpp
6347 ↗	(On Diff #148902)	> Yep thanks! My understanding from one of Diego's responses at D46827 was that we want to try to keep changes to the Vplan native path, to allow us to iterate quickly without introducing regressions.

Sorry if my comment was confusing. We have to evaluate case by case and make individual decisions. We think the CM refactoring is necessary for convergence: 1) to be able to move the VPlan construction to an earlier stage in the inner loop path, and 2) to be able to decouple and move some of the code in CM outside of CM. We still have to evaluate the actual consequences of this approach, though. However, what I think we shouldn't do in the VPlan native path is to build unnecessary recipes (since we are trying to get rid of them) or move code related to them without doing the proper porting of that code to VPInstructions. Otherwise we would end up having a VPlan native path with the same problems as the inner loop path. Does it make sense?

> Do you think we should get the rest of this patch in regardless?

I think so! I like the other changes!

fhahn added a child revision: D47595: [VPlan] Move recipe construction to VPRecipeBuilder. May 31 2018, 10:07 AM

Thanks for revisiting this, Florian!
No major issues. Just one question below.

Thanks,
Diego

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
357 ↗	(On Diff #149309)	ijf -> if
lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗	(On Diff #149309)	Just wondering if `buildVPlans` happened before the if condition for some reason. Is it possible that we still need the VF=1 VPlan for interleaving? Not sure if there is a test covering this case. Could you please have a look? It should be easy to test using #pragma clang loop interleave.
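A reduction loop along the following lines could serve as a starting point for the interleaving test suggested above. It uses the documented Clang loop hints `vectorize_width(1)` and `interleave_count(2)` to request interleaving without widening, which is the VF=1 case in question. The function name `sumInterleaved` is made up, and whether the vectorizer honors the hints for a given target is not something this sketch establishes; other compilers ignore the pragma.

```cpp
#include <cstddef>

// Hypothetical test kernel: a simple integer reduction. The hints ask Clang
// for a vectorization width of 1 with an interleave count of 2, so any loop
// transformation would rely on a plan for VF=1.
int sumInterleaved(const int *A, std::size_t N) {
  int Sum = 0;
#pragma clang loop vectorize_width(1) interleave_count(2)
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}
```

The hints affect only how the loop is compiled; the function returns the same sum with or without them.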

Thanks for having a look Diego!

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗	(On Diff #149309)	Thanks for spotting this. I think at the moment there are some assertions making sure we only vectorize with VF > 1, but I moved the check to the original position, as for loop aware SLP it might be profitable to vectorize even if the computed VF is 1.

Add missing context.

Thanks, Florian! Again, wait for the pending review to commit this one.

Diego

This revision is now accepted and ready to land. Jun 1 2018, 9:20 AM

Will do, thanks for having a look Diego!

In D47477#1119199, @dcaballe wrote:

Thanks, Florian! Again, wait for the pending review to commit this one.

as I updated D46827, do you think there is anything left that's pending for this patch?

In D47477#1123833, @fhahn wrote:

In D47477#1119199, @dcaballe wrote:

Thanks, Florian! Again, wait for the pending review to commit this one.

as I updated D46827, do you think there is anything left that's pending for this patch?

I don't think so! Please, go ahead and thanks for your patience! :)

Closed by commit rL334284: [VPlan] Move recipe based VPlan generation to separate function. (authored by fhahn). Jun 8 2018, 5:58 AM

This revision was automatically updated to reflect the committed changes.

Diff 150496

llvm/trunk/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

Show First 20 Lines • Show All 339 Lines • ▼ Show 20 Lines	private:
/// Create a replicating region for instruction \p I that requires		/// Create a replicating region for instruction \p I that requires
/// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.		/// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.
VPRegionBlock *createReplicateRegion(Instruction *I, VPRecipeBase *PredRecipe,		VPRegionBlock *createReplicateRegion(Instruction *I, VPRecipeBase *PredRecipe,
VPlanPtr &Plan);		VPlanPtr &Plan);

/// Build a VPlan according to the information gathered by Legal. \return a		/// Build a VPlan according to the information gathered by Legal. \return a
/// VPlan for vectorization factors \p Range.Start and up to \p Range.End		/// VPlan for vectorization factors \p Range.Start and up to \p Range.End
/// exclusive, possibly decreasing \p Range.End.		/// exclusive, possibly decreasing \p Range.End.
VPlanPtr buildVPlan(VFRange &Range,		VPlanPtr buildVPlan(VFRange &Range);
const SmallPtrSetImpl<Value *> &NeedDef);
		/// Build a VPlan using VPRecipes according to the information gather by
		/// Legal. This method is only used for the legacy inner loop vectorizer.
		VPlanPtr
		buildVPlanWithVPRecipes(VFRange &Range, SmallPtrSetImpl<Value *> &NeedDef,
		SmallPtrSetImpl<Instruction *> &DeadInstructions);

		/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
		/// according to the information gathered by Legal when it checked if it is
		/// legal to vectorize the loop. This method creates VPlans using VPRecipes.
		void buildVPlansWithVPRecipes(unsigned MinVF, unsigned MaxVF);
};		};

} // namespace llvm		} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONPLANNER_H		#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONPLANNER_H

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 6,310 Lines • ▼ Show 20 Lines	if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize.
return NoVectorization;		return NoVectorization;

if (UserVF) {		if (UserVF) {
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
buildVPlans(UserVF, UserVF);		buildVPlansWithVPRecipes(UserVF, UserVF);
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {UserVF, 0};		return {UserVF, 0};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();		unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");		assert(MaxVF != 0 && "MaxVF is zero.");

for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
// Collect Uniform and Scalar instructions after vectorization with VF.		// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(VF);		CM.collectUniformsAndScalars(VF);

// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
if (VF > 1)		if (VF > 1)
CM.collectInstsToScalarize(VF);		CM.collectInstsToScalarize(VF);
}		}

buildVPlans(1, MaxVF);		buildVPlansWithVPRecipes(1, MaxVF);
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
if (MaxVF == 1)		if (MaxVF == 1)
return NoVectorization;		return NoVectorization;

// Select the optimal vectorization factor.		// Select the optimal vectorization factor.
return CM.selectVectorizationFactor(MaxVF);		return CM.selectVectorizationFactor(MaxVF);
}		}

▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines
}		}

/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,		/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,
/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range		/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range
/// of VF's starting at a given VF and extending it as much as possible. Each		/// of VF's starting at a given VF and extending it as much as possible. Each
/// vectorization decision can potentially shorten this sub-range during		/// vectorization decision can potentially shorten this sub-range during
/// buildVPlan().		/// buildVPlan().
void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {		void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {

// Collect conditions feeding internal conditional branches; they need to be
// represented in VPlan for it to model masking.
SmallPtrSet<Value *, 1> NeedDef;

auto *Latch = OrigLoop->getLoopLatch();
for (BasicBlock *BB : OrigLoop->blocks()) {
if (BB == Latch)
continue;
BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
if (Branch && Branch->isConditional())
NeedDef.insert(Branch->getCondition());
}

for (unsigned VF = MinVF; VF < MaxVF + 1;) {		for (unsigned VF = MinVF; VF < MaxVF + 1;) {
VFRange SubRange = {VF, MaxVF + 1};		VFRange SubRange = {VF, MaxVF + 1};
VPlans.push_back(buildVPlan(SubRange, NeedDef));		VPlans.push_back(buildVPlan(SubRange));
VF = SubRange.End;		VF = SubRange.End;
}		}
}		}

VPValue *LoopVectorizationPlanner::createEdgeMask(BasicBlock *Src,		VPValue *LoopVectorizationPlanner::createEdgeMask(BasicBlock *Src,
BasicBlock *Dst,		BasicBlock *Dst,
VPlanPtr &Plan) {		VPlanPtr &Plan) {
assert(is_contained(predecessors(Dst), Src) && "Invalid edge");		assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
▲ Show 20 Lines • Show All 338 Lines • ▼ Show 20 Lines	LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
// Note: first set Entry as region entry and then connect successors starting		// Note: first set Entry as region entry and then connect successors starting
// from it in order, to propagate the "parent" of each VPBasicBlock.		// from it in order, to propagate the "parent" of each VPBasicBlock.
VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, Entry);		VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, Entry);
VPBlockUtils::connectBlocks(Pred, Exit);		VPBlockUtils::connectBlocks(Pred, Exit);

return Region;		return Region;
}		}

LoopVectorizationPlanner::VPlanPtr		void LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned MinVF,
LoopVectorizationPlanner::buildVPlan(VFRange &Range,		unsigned MaxVF) {
const SmallPtrSetImpl<Value *> &NeedDef) {		assert(OrigLoop->empty() && "Inner loop expected.");
// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.
if (!OrigLoop->empty()) {
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan
auto Plan = llvm::make_unique<VPlan>();

// Build hierarchical CFG		// Collect conditions feeding internal conditional branches; they need to be
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI);		// represented in VPlan for it to model masking.
HCFGBuilder.buildHierarchicalCFG(*Plan.get());		SmallPtrSet<Value *, 1> NeedDef;

return Plan;		auto *Latch = OrigLoop->getLoopLatch();
		for (BasicBlock *BB : OrigLoop->blocks()) {
		if (BB == Latch)
		continue;
		BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
		if (Branch && Branch->isConditional())
		NeedDef.insert(Branch->getCondition());
}		}

assert(OrigLoop->empty() && "Inner loop expected.");
EdgeMaskCache.clear();
BlockMaskCache.clear();
DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();
DenseMap<Instruction *, Instruction *> SinkAfterInverse;

// Collect instructions from the original loop that will become trivially dead		// Collect instructions from the original loop that will become trivially dead
// in the vectorized loop. We don't need to vectorize these instructions. For		// in the vectorized loop. We don't need to vectorize these instructions. For
// example, original induction update instructions can become dead because we		// example, original induction update instructions can become dead because we
// separately emit induction "steps" when generating code for the new loop.		// separately emit induction "steps" when generating code for the new loop.
// Similarly, we create a new latch condition when setting up the structure		// Similarly, we create a new latch condition when setting up the structure
// of the new loop, so the old one can become dead.		// of the new loop, so the old one can become dead.
SmallPtrSet<Instruction *, 4> DeadInstructions;		SmallPtrSet<Instruction *, 4> DeadInstructions;
collectTriviallyDeadInstructions(DeadInstructions);		collectTriviallyDeadInstructions(DeadInstructions);

		for (unsigned VF = MinVF; VF < MaxVF + 1;) {
		VFRange SubRange = {VF, MaxVF + 1};
		VPlans.push_back(
		buildVPlanWithVPRecipes(SubRange, NeedDef, DeadInstructions));
		VF = SubRange.End;
		}
		}

		LoopVectorizationPlanner::VPlanPtr
		LoopVectorizationPlanner::buildVPlanWithVPRecipes(
		VFRange &Range, SmallPtrSetImpl<Value *> &NeedDef,
		SmallPtrSetImpl<Instruction *> &DeadInstructions) {
// Hold a mapping from predicated instructions to their recipes, in order to		// Hold a mapping from predicated instructions to their recipes, in order to
// fix their AlsoPack behavior if a user is determined to replicate and use a		// fix their AlsoPack behavior if a user is determined to replicate and use a
// scalar instead of vector value.		// scalar instead of vector value.
DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;		DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;

		EdgeMaskCache.clear();
		BlockMaskCache.clear();
		DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();
		DenseMap<Instruction *, Instruction *> SinkAfterInverse;

// Create a dummy pre-entry VPBasicBlock to start building the VPlan.		// Create a dummy pre-entry VPBasicBlock to start building the VPlan.
VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");		VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");
auto Plan = llvm::make_unique<VPlan>(VPBB);		auto Plan = llvm::make_unique<VPlan>(VPBB);

// Represent values that will have defs inside VPlan.		// Represent values that will have defs inside VPlan.
for (Value *V : NeedDef)		for (Value *V : NeedDef)
Plan->addVPValue(V);		Plan->addVPValue(V);

▲ Show 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	LoopVectorizationPlanner::buildVPlanWithVPRecipes(
}		}
RSO << "},UF>=1";		RSO << "},UF>=1";
RSO.flush();		RSO.flush();
Plan->setName(PlanName);		Plan->setName(PlanName);

return Plan;		return Plan;
}		}

		LoopVectorizationPlanner::VPlanPtr
		LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
		// Outer loop handling: They may require CFG and instruction level
		// transformations before even evaluating whether vectorization is profitable.
		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
		// the vectorization pipeline.
		assert(!OrigLoop->empty());
		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

		// Create new empty VPlan
		auto Plan = llvm::make_unique<VPlan>();

		// Build hierarchical CFG
		VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI);
		HCFGBuilder.buildHierarchicalCFG(*Plan.get());

		return Plan;
		}

Value* LoopVectorizationPlanner::VPCallbackILV::		Value* LoopVectorizationPlanner::VPCallbackILV::
getOrCreateVectorValues(Value *V, unsigned Part) {		getOrCreateVectorValues(Value *V, unsigned Part) {
return ILV.getOrCreateVectorValue(V, Part);		return ILV.getOrCreateVectorValue(V, Part);
}		}

void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent) const {		void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent) const {
O << " +\n"		O << " +\n"
<< Indent << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at ";		<< Indent << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at ";
▲ Show 20 Lines • Show All 591 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Move recipe based VPlan generation to separate function.
ClosedPublic