This is an archive of the discontinued LLVM Phabricator instance.

Differential D50823

[VPlan] Introduce VPCmpInst sub-class in the instruction-level representation
Abandoned · Public

Authored by dcaballe on Aug 15 2018, 5:12 PM.

Details

Reviewers

Ayal
fhahn
rengolin
hsaito
mkuper
hfinkel
rkruppe

Summary

This patch introduces VPCmpInst, a sub-class of VPInstruction used to model details of comparison instructions in VPlan, such as the comparison's predicate. At this point, we don't see the need to distinguish between integer and floating-point comparisons at the VPlan representation level.

VPCmpInst is needed in D50480 to properly model a new compare VPInstruction (i.e., a compare that is not part of the input IR) generated during the vectorization process.
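The following self-contained C++ sketch illustrates the summary and does not depend on LLVM. `VPValue`, `VPInstruction`, the `Opcode` and `Predicate` enums, and `isIntPredicate`/`isFPPredicate` are simplified stand-ins (assumptions), not LLVM's real types. The sketch shows one compare class that covers both integer and floating-point predicates, with the opcode derived from the predicate.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <vector>

// Simplified stand-ins (assumptions) for llvm::Instruction opcodes and
// llvm::CmpInst::Predicate; they are not LLVM's real enums.
enum Opcode : unsigned { ICmp = 1, FCmp = 2 };
enum Predicate { FCMP_OEQ, FCMP_OLT, ICMP_EQ, ICMP_NE, ICMP_ULE };

bool isFPPredicate(Predicate P) { return P == FCMP_OEQ || P == FCMP_OLT; }
bool isIntPredicate(Predicate P) {
  return P == ICMP_EQ || P == ICMP_NE || P == ICMP_ULE;
}

struct VPValue {};

// Mock VPInstruction: just an opcode and an operand list.
class VPInstruction : public VPValue {
  unsigned Op;
  std::vector<VPValue *> Operands;

public:
  VPInstruction(unsigned Op, std::initializer_list<VPValue *> Ops)
      : Op(Op), Operands(Ops) {}
  unsigned getOpcode() const { return Op; }
  VPValue *getOperand(unsigned I) const { return Operands[I]; }
};

// One class for integer and FP compares: the opcode is derived from the
// predicate, and the predicate is stored alongside the operands.
class VPCmpInst : public VPInstruction {
  Predicate Pred;

  // Static on purpose: it runs in the base-class mem-initializer, before the
  // VPInstruction base has been constructed.
  static unsigned inferOpcodeFromPredicate(Predicate P) {
    if (isIntPredicate(P))
      return ICmp;
    if (isFPPredicate(P))
      return FCmp;
    std::abort(); // stands in for llvm_unreachable
  }

public:
  VPCmpInst(VPValue *LHS, VPValue *RHS, Predicate P)
      : VPInstruction(inferOpcodeFromPredicate(P), {LHS, RHS}), Pred(P) {
    assert(LHS && RHS && "VPCmpInst's operands can't be null!");
  }
  Predicate getPredicate() const { return Pred; }
};
```

This sketch deliberately differs from the diff in one respect: the helper is `static`. The C++ standard makes it undefined behavior to call a non-static member function from a ctor-initializer before all base-class mem-initializers have completed, and `inferOpcodeFromPredicate` is called from exactly that position.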

Event Timeline

dcaballe created this revision. Aug 15 2018, 5:12 PM

Herald added subscribers: llvm-commits, rogfer01, rkruppe and 2 others. Aug 15 2018, 5:12 PM

dcaballe mentioned this in D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size. Aug 15 2018, 5:18 PM

The implementation looks good to me. The interface chosen here (directly mirroring CmpInst from the Value hierarchy in the VPValue hierarchy) also seems like the right direction to me. Besides avoiding the problematic concept of "underlying Instructions" altogether, it also gives a convenient place to put any helper functionality that the vectorizer code might want when generating and manipulating such comparisons.

That said, this is currently the only subclass of its kind and it would be weird if it remained that way. In other words, I would expect that VPInstruction would gain more such subclasses (e.g., VPBinaryOperator, VPSelectInst, etc.) as the vectorizer starts to create and manipulate these instructions. Does this make sense to y'all?
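The following hedged sketch makes the "more subclasses" idea concrete. It shows how a hypothetical `VPBinaryOperator` could sit next to `VPCmpInst` using LLVM-style RTTI, where each subclass provides only a `classof` keyed on the opcode. The opcode numbering and the small `isa`/`dyn_cast` templates are illustrative stand-ins for the helpers in LLVM's `llvm/Support/Casting.h`, not their implementation.

```cpp
#include <cassert>

// Illustrative opcode numbering (an assumption): binary operators form a
// contiguous range, similar to llvm::Instruction's BinaryOps range.
enum Opcode : unsigned { Add, Sub, And, Or, BinaryOpsEnd, ICmp, FCmp };

class VPInstruction {
  unsigned Op;

public:
  explicit VPInstruction(unsigned Op) : Op(Op) {}
  unsigned getOpcode() const { return Op; }
};

// Subclass from this patch, reduced to its classof().
struct VPCmpInst : VPInstruction {
  using VPInstruction::VPInstruction;
  static bool classof(const VPInstruction *I) {
    return I->getOpcode() == ICmp || I->getOpcode() == FCmp;
  }
};

// Hypothetical future subclass; it only needs its own classof().
struct VPBinaryOperator : VPInstruction {
  using VPInstruction::VPInstruction;
  static bool classof(const VPInstruction *I) {
    return I->getOpcode() < BinaryOpsEnd;
  }
};

// Tiny stand-ins for llvm::isa / llvm::dyn_cast, which likewise dispatch to
// To::classof().
template <typename To> bool isa(const VPInstruction *I) {
  return To::classof(I);
}
template <typename To> To *dyn_cast(VPInstruction *I) {
  return isa<To>(I) ? static_cast<To *>(I) : nullptr;
}
```

`classof` inspects only the opcode, so this scheme is sound only if every compare opcode is materialized as a real `VPCmpInst` object. A plain `VPInstruction` constructed with `ICmp` would pass `isa<VPCmpInst>`, and the `static_cast` in `dyn_cast` would then be invalid.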

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
Line 157: This duplicates the assertion in the other overload that's called in the next line, right? I don't mind that, since it's good to be a bit more defensive; I just want to make sure I didn't miss anything and that this is intentional.

In D50823#1202831, @rkruppe wrote:

That said, this is currently the only subclass of its kind and it would be weird if it remained that way. In other words, I would expect that VPInstruction would gain more such subclasses (e.g., VPBinaryOperator, VPSelectInst, etc.) as the vectorizer starts to create and manipulate these instructions. Does this make sense to y'all?

Our general intention is to make VPInstructions as easy to use as Instructions for the many LLVM developers who aren't necessarily familiar with the vectorizer, and also to reduce duplicated development and maintenance effort. If that requires more subclassing, we'll consider it, but very carefully and only on an as-needed basis. At this point, giving VPInstruction all the functionality of Instruction is not an objective. We are starting by implementing just enough to satisfy the vectorizer's needs while minimizing unnecessary divergence (i.e., whatever is implemented can be used in a very similar manner).

Sorry that I'm not directly answering your question. However, I hope this helps in weighing the two alternatives: the new-opcode approach in D50480 versus the subclassing approach in this patch. I think subclassing here helps us avoid unnecessary divergence.
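The two alternatives can be contrasted in a small sketch. The opcode values are arbitrary, and `VPICmpULE`, `OpcodeStyleInst` and `SubclassStyleCmp` are hypothetical names. The sketch is meant only to show how generic queries look under each approach.

```cpp
#include <cassert>

// Arbitrary opcode values for illustration; real code would use
// llvm::Instruction::ICmp and Instruction::OtherOpsEnd.
constexpr unsigned ICmp = 1;
constexpr unsigned OtherOpsEnd = 10;
enum Predicate { ICMP_EQ, ICMP_ULT, ICMP_ULE };

// Approach 1 (new opcode): a VPlan-only opcode encodes "icmp ule".
constexpr unsigned VPICmpULE = OtherOpsEnd + 1; // hypothetical name
struct OpcodeStyleInst {
  unsigned Opcode;
};

// Generic queries must remember every extra compare opcode.
bool isIntCompare(const OpcodeStyleInst &I) {
  return I.Opcode == ICmp || I.Opcode == VPICmpULE;
}
bool isULE(const OpcodeStyleInst &I) { return I.Opcode == VPICmpULE; }

// Approach 2 (subclass): keep the IR opcode and store the predicate, so the
// queries read like they would on llvm::CmpInst.
struct SubclassStyleCmp {
  unsigned Opcode;
  Predicate Pred;
};

bool isIntCompare(const SubclassStyleCmp &I) { return I.Opcode == ICmp; }
bool isULE(const SubclassStyleCmp &I) {
  return I.Opcode == ICmp && I.Pred == ICMP_ULE;
}
```

With the extended opcode, every generic query such as "is this an integer compare?" must know about each VPlan-only opcode. The predicate-field approach keeps queries shaped like those on `llvm::CmpInst`.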

In D50823#1202955, @hsaito wrote:

Our general intention is to make VPInstructions as easy to use as Instructions for the many LLVM developers who aren't necessarily familiar with the vectorizer, and also to reduce duplicated development and maintenance effort. If that requires more subclassing, we'll consider it, but very carefully and only on an as-needed basis. At this point, giving VPInstruction all the functionality of Instruction is not an objective. We are starting by implementing just enough to satisfy the vectorizer's needs while minimizing unnecessary divergence (i.e., whatever is implemented can be used in a very similar manner).

Thanks, it doesn't answer my question in the exact terms I used, but it hits all the notes I was curious about. I agree with the goals (accessibility to developers, minimizing unnecessary divergence) and also with getting there incrementally and on demand.

dcaballe added inline comments. Aug 17 2018, 9:24 AM

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
Line 157: Thanks, Robin. IIRC, the initial implementation didn't invoke the other overload. I agree with you: it's better to be defensive, in case this overload's implementation ends up not invoking the other one again.

In case you missed it, there is some discussion in D50480 regarding this code. Your feedback would be appreciated.

Thanks!
Diego

Jumping from D50480:

In D50480#1214790, @hsaito wrote:

In D50480#1213673, @Ayal wrote:

This patch aims to model a rather special early-exit condition that restricts the execution of the entire loop body to certain iterations, rather than model general compare instructions. If preferred, an "EarlyExit" extended opcode can be introduced instead of the controversial ICmpULE. This should be easy to revisit in the future if needed.

This patch is fine as is, or rather much better with ICmpULE than EarlyExit.

This patch focuses on modeling an early-exit compare and then generating it, without making strategic design decisions to support future VPlan-to-VPlan transformations, the interfaces they may need, potential templatization, or other long-term, high-level VPlan concerns. These should be explained and discussed separately, along with the pros and cons of alternative solutions for supporting the desired interfaces and holding their storage, including subclassing VPInstructions, using detached Instructions, or other possibilities.

Sure. I agree.

[Full disclosure] I have a big mental barrier to accepting your "early-exit" terminology here, since I associate that term with "break out of the loop", but that's just a terminology difference and has nothing to do with the substance of this patch. [End of full disclosure]

Regarding "using detached Instructions": I am fully against that, because it would forever prevent moving VPlan/VPInstructions into Analysis. The IR Verifier will trigger if there is a detached IR Instruction at the end of an Analysis pass. I already had a hallway chat with @lattner about the possibility of using IR Instructions and the IR CFG in detached mode (which would also require many utilities to be usable in detached mode), and he was totally pessimistic about it. That was two years ago, at the 2016 Developer Conference, but nothing has really changed since then in that regard. That was the end of my hope of using detached IR Instructions instead of introducing VPInstructions. Detached Instructions under the hood of VPInstructions are not very useful if we can't keep them between the vectorization Analysis pass and the vectorization Transformation pass.
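As we read D50480, the "early-exit" compare discussed here is the tail-folding mask. In a vector iteration whose first induction value is `IV`, lane `L` stays active only if `IV + L <= BTC`, where BTC is the backedge-taken count (trip count minus one). The sketch below assumes 8-bit unsigned induction arithmetic and a vectorization factor of 4 purely to make wraparound visible. One plausible motivation for an unsigned less-or-equal against BTC, rather than a less-than against the trip count, is that the trip count may not fit in the induction type. `maskULE` and `maskULT` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int VF = 4; // assumed vectorization factor

// "icmp ule (iv + lane), BTC": stays correct even when the trip count
// (BTC + 1) does not fit in the induction variable's 8-bit type.
std::array<bool, VF> maskULE(uint8_t IV, uint8_t BTC) {
  std::array<bool, VF> M{};
  for (int L = 0; L < VF; ++L)
    M[L] = static_cast<uint8_t>(IV + L) <= BTC; // models 8-bit wraparound
  return M;
}

// Alternative "icmp ult (iv + lane), TC": breaks when TC wraps to 0.
std::array<bool, VF> maskULT(uint8_t IV, uint8_t TC) {
  std::array<bool, VF> M{};
  for (int L = 0; L < VF; ++L)
    M[L] = static_cast<uint8_t>(IV + L) < TC;
  return M;
}
```

With a trip count of 10 and `IV == 8`, `maskULE(8, 9)` yields {true, true, false, false}. With a trip count of 256, `BTC == 255` still fits in 8 bits, but the trip count wraps to 0, so the ULT form masks off every lane.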

In D50823#1214823, @hsaito wrote:

Regarding "using detached Instructions": I am fully against that, because it would forever prevent moving VPlan/VPInstructions into Analysis. The IR Verifier will trigger if there is a detached IR Instruction at the end of an Analysis pass. I already had a hallway chat with @lattner about the possibility of using IR Instructions and the IR CFG in detached mode (which would also require many utilities to be usable in detached mode), and he was totally pessimistic about it. That was two years ago, at the 2016 Developer Conference, but nothing has really changed since then in that regard. That was the end of my hope of using detached IR Instructions instead of introducing VPInstructions. Detached Instructions under the hood of VPInstructions are not very useful if we can't keep them between the vectorization Analysis pass and the vectorization Transformation pass.

(Just for the record, the detached ICmpInst used in https://reviews.llvm.org/D50480?id=161564 passes verification when verifyModule() is called right after LVP.plan(). If the Undefs it uses are a concern, its operands could be nullified after construction.)

Not needed for now.

Herald added a project: Restricted Project. Aug 21 2019, 12:46 PM

Herald added a subscriber: psnobl.

Ayal mentioned this in D149079: [VPlan] Record IR flags on VPWidenRecipe directly (NFC). May 7 2023, 8:10 AM

Revision Contents

Path                                                  Size
lib/Transforms/Vectorize/LoopVectorizationPlanner.h   21 lines
lib/Transforms/Vectorize/VPlan.h                      44 lines
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp         12 lines
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp       3 lines

Diff 160945

lib/Transforms/Vectorize/LoopVectorizationPlanner.h

@@ 133 lines hidden @@ public:
   VPValue *createAnd(VPValue *LHS, VPValue *RHS) {
     return createInstruction(Instruction::BinaryOps::And, {LHS, RHS});
   }
 
   VPValue *createOr(VPValue *LHS, VPValue *RHS) {
     return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS});
   }
 
+  /// Create a VPCmpInst with \p LeftOp and \p RightOp as operands and \p Pred
+  /// as predicate.
+  VPCmpInst *createCmpInst(VPValue *LeftOp, VPValue *RightOp,
+                           CmpInst::Predicate Pred) {
+    assert(LeftOp && RightOp && "VPCmpInst's operands can't be null!");
+    VPCmpInst *Instr = new VPCmpInst(LeftOp, RightOp, Pred);
+    if (BB)
+      BB->insert(Instr, InsertPt);
+    return Instr;
+  }
+
+  /// Create a VPCmpInst with \p LeftOp and \p RightOp as operands and \p CI's
+  /// predicate as predicate. \p CI is also set as underlying Instruction.
+  VPCmpInst *createCmpInst(VPValue *LeftOp, VPValue *RightOp, CmpInst *CI) {
+    assert(CI && "CI can't be null!");
+    assert(LeftOp && RightOp && "VPCmpInst's operands can't be null!");
+    VPCmpInst *VPCI = createCmpInst(LeftOp, RightOp, CI->getPredicate());
+    VPCI->setUnderlyingValue(CI);
+    return VPCI;
+  }
+
   //===--------------------------------------------------------------------===//
   // RAII helpers.
   //===--------------------------------------------------------------------===//
 
   /// RAII object that stores the current insertion point and restores it when
   /// the object is destroyed.
   class InsertPointGuard {
     VPBuilder &Builder;
@@ 133 lines hidden @@
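The two `createCmpInst` overloads above can be modeled with mock types. The model shows the delegation between the overloads and the behavior of inserting only when a block is set. This is a simplified sketch:

- `VPBasicBlock` appends at the end instead of honoring an insertion point.
- Detached instructions are kept in a hypothetical `Detached` list so that nothing leaks.
- `IRCmp` stands in for `llvm::CmpInst`.

```cpp
#include <cassert>
#include <list>

enum Predicate { ICMP_EQ, ICMP_NE };

struct VPValue {};
// Stand-in for llvm::CmpInst: only the predicate matters here.
struct IRCmp {
  Predicate Pred;
};

struct VPCmpInst : VPValue {
  VPValue *LHS, *RHS;
  Predicate Pred;
  const IRCmp *Underlying = nullptr; // plays the role of setUnderlyingValue()
  VPCmpInst(VPValue *L, VPValue *R, Predicate P) : LHS(L), RHS(R), Pred(P) {}
};

// Mock block; std::list keeps element addresses stable.
struct VPBasicBlock {
  std::list<VPCmpInst> Insts;
};

class VPBuilder {
  VPBasicBlock *BB = nullptr; // no block set => instruction stays detached

public:
  std::list<VPCmpInst> Detached; // hypothetical owner for detached results
  void setInsertPoint(VPBasicBlock *B) { BB = B; }

  VPCmpInst *createCmpInst(VPValue *L, VPValue *R, Predicate P) {
    assert(L && R && "VPCmpInst's operands can't be null!");
    std::list<VPCmpInst> &Dest = BB ? BB->Insts : Detached;
    Dest.emplace_back(L, R, P); // appends; the real builder uses InsertPt
    return &Dest.back();
  }

  // Delegating overload: the predicate comes from the IR compare, which is
  // also recorded as the underlying value.
  VPCmpInst *createCmpInst(VPValue *L, VPValue *R, const IRCmp *CI) {
    assert(CI && "CI can't be null!");
    assert(L && R && "VPCmpInst's operands can't be null!"); // defensive
    VPCmpInst *New = createCmpInst(L, R, CI->Pred);
    New->Underlying = CI;
    return New;
  }
};
```

The second overload repeats the operand assertion even though the first one already checks it. As discussed in the review, this duplication is intentionally defensive.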

lib/Transforms/Vectorize/VPlan.h

@@ 639 lines hidden @@ public:
 
   /// Print the Recipe.
   void print(raw_ostream &O, const Twine &Indent) const override;
 
   /// Print the VPInstruction.
   void print(raw_ostream &O) const;
 };
 
+/// Concrete class for comparison VPInstruction. Represents both integer and
+/// floating point comparisons.
+class VPCmpInst : public VPInstruction {
+public:
+  typedef CmpInst::Predicate Predicate;
+
+  /// Create VPCmpInst with operands \p LHS and \p RHS and predicate \p Pred.
+  VPCmpInst(VPValue *LHS, VPValue *RHS, Predicate Pred)
+      : VPInstruction(inferOpcodeFromPredicate(Pred),
+                      ArrayRef<VPValue *>({LHS, RHS})),
+        Pred(Pred) {
+    assert(LHS && RHS && "VPCmpInst's operands can't be null!");
+  }
+
+  /// Return the predicate for this comparison VPInstruction.
+  Predicate getPredicate() const { return Pred; }
+
+  /// Methods for supporting type inquiry through isa, cast, and dyn_cast:
+  static bool classof(const VPInstruction *VPI) {
+    return VPI->getOpcode() == Instruction::ICmp ||
+           VPI->getOpcode() == Instruction::FCmp;
+  }
+  static bool classof(const VPValue *V) {
+    return isa<VPInstruction>(V) && classof(cast<VPInstruction>(V));
+  }
+  static bool classof(const VPRecipeBase *V) {
+    return isa<VPInstruction>(V) && classof(cast<VPInstruction>(V));
+  }
+
+private:
+  // Predicate of the comparison.
+  Predicate Pred;
+
+  // Return the opcode that corresponds to predicate \p Pred.
+  unsigned inferOpcodeFromPredicate(Predicate Pred) {
+    // Infer Opcode from Pred.
+    if (CmpInst::isIntPredicate(Pred))
+      return Instruction::ICmp;
+    if (CmpInst::isFPPredicate(Pred))
+      return Instruction::FCmp;
+    llvm_unreachable("Integer/Float predicate expected!");
+  }
+};
+
 /// VPWidenRecipe is a recipe for producing a copy of vector type for each
 /// Instruction in its ingredients independently, in order. This recipe covers
 /// most of the traditional vectorization cases where each ingredient transforms
 /// into a vectorized version of itself.
 class VPWidenRecipe : public VPRecipeBase {
 private:
   /// Hold the ingredients by pointing to their original BasicBlock location.
   BasicBlock::iterator Begin;
@@ 779 lines hidden @@

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp

@@ 216 lines hidden @@ if (auto *Phi = dyn_cast<PHINode>(Inst)) {
       PhisToFix.push_back(Phi);
     } else {
       // Translate LLVM-IR operands into VPValue operands and set them in the
       // new VPInstruction.
       SmallVector<VPValue *, 4> VPOperands;
       for (Value *Op : Inst->operands())
         VPOperands.push_back(getOrCreateVPOperand(Op));
 
+      if (auto *CI = dyn_cast<CmpInst>(Inst)) {
+        assert(VPOperands.size() == 2 && "Expected 2 operands in CmpInst.");
+        NewVPInst = VPIRBuilder.createCmpInst(VPOperands[0], VPOperands[1], CI);
+      } else
       // Build VPInstruction for any arbitraty Instruction without specific
       // representation in VPlan.
       NewVPInst = cast<VPInstruction>(
           VPIRBuilder.createNaryOp(Inst->getOpcode(), VPOperands, Inst));
     }
 
     IRDef2VPValue[Inst] = NewVPInst;
   }
 }
 
 // Main interface to build the plain CFG.
 VPRegionBlock *PlainCFGBuilder::buildPlainCFG() {
@@ 115 lines hidden @@
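The change to `VPlanHCFGBuilder.cpp` is a dispatch on the kind of IR instruction. A minimal sketch with a hypothetical mini-IR (`IRInst`, `VPNode`, `VPCmpNode`, with operands as integer indices) follows the same shape. Compares become a dedicated node that keeps the predicate, and everything else becomes a generic n-ary node.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mini-IR: opcode, optional predicate, operand indices.
enum IROpcode : unsigned { Add, ICmp, FCmp };
enum Predicate { NoPred, ICMP_NE, FCMP_OLT };

struct IRInst {
  IROpcode Op;
  Predicate Pred;
  std::vector<int> Operands;
};

// Generic n-ary VPlan-side node.
struct VPNode {
  unsigned Opcode;
  std::vector<int> Operands;
  VPNode(unsigned Op, std::vector<int> Ops)
      : Opcode(Op), Operands(std::move(Ops)) {}
  virtual ~VPNode() = default;
};

// Compare node that also keeps the predicate.
struct VPCmpNode : VPNode {
  Predicate Pred;
  VPCmpNode(unsigned Op, std::vector<int> Ops, Predicate P)
      : VPNode(Op, std::move(Ops)), Pred(P) {}
};

// Same shape as the patch: compares get the dedicated node, everything else
// gets the generic one.
std::unique_ptr<VPNode> translate(const IRInst &I) {
  if (I.Op == ICmp || I.Op == FCmp) {
    assert(I.Operands.size() == 2 && "Expected 2 operands in CmpInst.");
    return std::make_unique<VPCmpNode>(I.Op, I.Operands, I.Pred);
  }
  return std::make_unique<VPNode>(I.Op, I.Operands);
}
```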

unittests/Transforms/Vectorize/VPlanHCFGTest.cpp

@@ 76 lines hidden @@ TEST_F(VPlanHCFGTest, testBuildHCFGInnerLoop) {
   EXPECT_EQ(Add, Store->getOperand(0));
   EXPECT_EQ(Idx, Store->getOperand(1));
 
   VPInstruction *IndvarAdd = dyn_cast<VPInstruction>(&*Iter++);
   EXPECT_EQ(Instruction::Add, IndvarAdd->getOpcode());
   EXPECT_EQ(2u, IndvarAdd->getNumOperands());
   EXPECT_EQ(Phi, IndvarAdd->getOperand(0));
 
-  VPInstruction *ICmp = dyn_cast<VPInstruction>(&*Iter++);
+  VPCmpInst *ICmp = dyn_cast<VPCmpInst>(&*Iter++);
   EXPECT_EQ(Instruction::ICmp, ICmp->getOpcode());
   EXPECT_EQ(2u, ICmp->getNumOperands());
+  EXPECT_EQ(CmpInst::ICMP_NE, ICmp->getPredicate());
   EXPECT_EQ(IndvarAdd, ICmp->getOperand(0));
   EXPECT_EQ(VecBB->getCondBit(), ICmp);
 
   LoopVectorizationLegality::InductionList Inductions;
   SmallPtrSet<Instruction *, 1> DeadInstructions;
   VPlanHCFGTransforms::VPInstructionsToVPRecipes(Plan, &Inductions,
                                                  DeadInstructions);
 }
@@ 63 lines hidden @@