This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve cost estimation/emission of externally used extractelements.
ClosedPublic

Authored by ABataev on May 21 2021, 10:15 AM.

Details

Reviewers

RKSimon
spatel
vdmitrie
anton-afanasyev
dtemirbulatov

Commits

rG8c48d77cdfe5: [SLP]Improve cost estimation/emission of externally used extractelements.

Summary

There is no need to recalculate the cost of extractelements. Instead, we
should not compensate the cost of every extractelement; we first need to
check whether the instruction is actually going to be removed by
vectorization. Also, there is no need to generate a new extractelement
instruction; we may just regenerate the original one. This may improve the
final vectorization.
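
The emission half of the summary can be illustrated with a small standalone model. This is only a sketch, not the SLPVectorizer code: `ToyExtract`, `ToyExternalUse` and `emitExtractForExternalUse` are hypothetical names that stand in for LLVM's instruction and builder types. It shows the decision the summary describes: when the externally used scalar was itself an extractelement, re-emit it with its original vector and index rather than extracting a lane from the newly vectorized value.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an extractelement instruction: the name of the
// source vector plus a constant lane index. This is not an LLVM type.
struct ToyExtract {
  std::string Vector;
  unsigned Index;
};

// Hypothetical stand-in for an externally used scalar of a vectorized bundle.
struct ToyExternalUse {
  bool IsExtract;      // true if the scalar was itself an extractelement
  ToyExtract Original; // meaningful only when IsExtract is true
  unsigned Lane;       // lane of the scalar inside the vectorized value
};

// Decide which extract to emit for an external use of a vectorized scalar.
ToyExtract emitExtractForExternalUse(const ToyExternalUse &EU,
                                     const std::string &VectorizedValue) {
  assert(!VectorizedValue.empty() && "Can't find vectorizable value");
  if (EU.IsExtract)
    return EU.Original;              // "reuse" the original extract
  return {VectorizedValue, EU.Lane}; // extract from the new vector
}
```

Under this model, an external use of an extract of lane 0 of `%x` yields an extract from `%x` again, even though the bundle was vectorized into a shuffle of `%x`. That is the kind of change visible in the updated CHECK lines of the tests in this revision.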

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. May 21 2021, 10:15 AM

Herald added a subscriber: hiraditya. May 21 2021, 10:15 AM

ABataev requested review of this revision. May 21 2021, 10:15 AM

Herald added a project: Restricted Project. May 21 2021, 10:15 AM

RKSimon edited the summary of this revision. May 21 2021, 10:36 AM

RKSimon added inline comments. May 21 2021, 10:38 AM

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll
80 ↗	(On Diff #347071)	this looks like a NFC regeneration?

ABataev added inline comments. May 21 2021, 10:40 AM

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll
80 ↗	(On Diff #347071)	Not quite, the order of extractelements has changed

Harbormaster completed remote builds in B105666: Diff 347071. May 21 2021, 10:47 AM

ABataev added inline comments. May 21 2021, 11:00 AM

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll
80 ↗	(On Diff #347071)	I will investigate why it happens.

Rebase + rework.

ABataev retitled this revision from "[SLP]Fix cost/emission of externally used extractelements." to "[SLP]Improve cost estimation/emission of externally used extractelements." May 21 2021, 12:58 PM

ABataev edited the summary of this revision.

Reworked the patch; I had to change the compensation cost for vectorizable extractelements. Also, the check for vectorized users was not quite correct: we also need to check whether the instruction is going to be vectorized as a reduction value. Otherwise, we could be too optimistic in non-reduction cases (when extractelements are the seeds and their user is not actually in the vectorization graph).
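
The user check described in this comment can be modeled outside LLVM. The following is a minimal sketch under stated assumptions: `ToyInst` and the string-based sets are hypothetical stand-ins for `Instruction`, `ScalarToTreeEntry` and `VectorizedVals`, and `Users.size() == 1` only approximates `hasOneUse()`. It follows the shape of the updated `areAllUsersVectorized` in the diff: a single use is trusted only when the instruction itself is among the values being vectorized (for example, as a reduction value); otherwise every user must be in the vectorization tree.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: its name and its users' names.
struct ToyInst {
  std::string Name;
  std::vector<std::string> Users;
};

// Returns true if the instruction can be treated as having only vectorized
// users. InTree models the scalars that have a tree entry; VectorizedVals
// models the values passed in by the caller (e.g. the reduced values).
bool areAllUsersVectorized(const ToyInst &I,
                           const std::set<std::string> &InTree,
                           const std::vector<std::string> &VectorizedVals) {
  assert(!I.Name.empty() && "Instruction must be named in this model");
  // A single use alone is no longer enough: the instruction must also be
  // one of the values known to be vectorized.
  bool SingleUseAndVectorized =
      I.Users.size() == 1 &&
      std::find(VectorizedVals.begin(), VectorizedVals.end(), I.Name) !=
          VectorizedVals.end();
  return SingleUseAndVectorized ||
         std::all_of(I.Users.begin(), I.Users.end(),
                     [&InTree](const std::string &U) {
                       return InTree.count(U) > 0;
                     });
}
```

In this model, an extract whose only user lies outside the tree is treated as removable only when it is passed in `VectorizedVals`. In the non-reduction case the list is empty, so the cost of that extract is not compensated.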

Harbormaster completed remote builds in B105695: Diff 347106. May 21 2021, 1:44 PM

LGTM

This revision is now accepted and ready to land. Jun 3 2021, 4:26 AM

Closed by commit rG8c48d77cdfe5: [SLP]Improve cost estimation/emission of externally used extractelements. (authored by ABataev). Jun 3 2021, 10:28 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG8c48d77cdfe5: [SLP]Improve cost estimation/emission of externally used extractelements..

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	56 lines
llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll	6 lines
llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll	4 lines

Diff 349602

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[629 lines not shown]	public:
Value *vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues);		Value *vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues);

/// \returns the cost incurred by unwanted spills and fills, caused by		/// \returns the cost incurred by unwanted spills and fills, caused by
/// holding live values over call sites.		/// holding live values over call sites.
InstructionCost getSpillCost() const;		InstructionCost getSpillCost() const;

/// \returns the vectorization cost of the subtree that starts at \p VL.		/// \returns the vectorization cost of the subtree that starts at \p VL.
/// A negative number means that this is profitable.		/// A negative number means that this is profitable.
InstructionCost getTreeCost();		InstructionCost getTreeCost(ArrayRef<Value *> VectorizedVals = None);

/// Construct a vectorizable tree that starts at \p Roots, ignoring users for		/// Construct a vectorizable tree that starts at \p Roots, ignoring users for
/// the purpose of scheduling and extraction in the \p UserIgnoreLst.		/// the purpose of scheduling and extraction in the \p UserIgnoreLst.
void buildTree(ArrayRef<Value *> Roots,		void buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst = None);		ArrayRef<Value *> UserIgnoreLst = None);

/// Construct a vectorizable tree that starts at \p Roots, ignoring users for		/// Construct a vectorizable tree that starts at \p Roots, ignoring users for
/// the purpose of scheduling and extraction in the \p UserIgnoreLst taking		/// the purpose of scheduling and extraction in the \p UserIgnoreLst taking
[897 lines not shown]	#endif

/// Marks values operands for later deletion by replacing them with Undefs.		/// Marks values operands for later deletion by replacing them with Undefs.
void eraseInstructions(ArrayRef<Value *> AV);		void eraseInstructions(ArrayRef<Value *> AV);

~BoUpSLP();		~BoUpSLP();

private:		private:
/// Checks if all users of \p I are the part of the vectorization tree.		/// Checks if all users of \p I are the part of the vectorization tree.
bool areAllUsersVectorized(Instruction *I) const;		bool areAllUsersVectorized(Instruction *I,
		ArrayRef<Value *> VectorizedVals) const;

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
InstructionCost getEntryCost(const TreeEntry *E);		InstructionCost getEntryCost(const TreeEntry *E,
		ArrayRef<Value *> VectorizedVals);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,		void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,
const EdgeInfo &EI);		const EdgeInfo &EI);

/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can		/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can
/// be vectorized to use the original vector (or aggregate "bitcast" to a		/// be vectorized to use the original vector (or aggregate "bitcast" to a
/// vector) and sets \p CurrentOrder to the identity permutation; otherwise		/// vector) and sets \p CurrentOrder to the identity permutation; otherwise
[1,936 lines not shown]	bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
if (I < E) {		if (I < E) {
CurrentOrder.clear();		CurrentOrder.clear();
return false;		return false;
}		}

return ShouldKeepOrder;		return ShouldKeepOrder;
}		}

bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {		bool BoUpSLP::areAllUsersVectorized(Instruction *I,
return I->hasOneUse() || llvm::all_of(I->users(), [this](User *U) {		ArrayRef<Value *> VectorizedVals) const {
		return (I->hasOneUse() && is_contained(VectorizedVals, I)) ||
		llvm::all_of(I->users(), [this](User *U) {
return ScalarToTreeEntry.count(U) > 0;		return ScalarToTreeEntry.count(U) > 0;
});		});
}		}

static std::pair<InstructionCost, InstructionCost>		static std::pair<InstructionCost, InstructionCost>
getVectorCallCosts(CallInst *CI, FixedVectorType *VecTy,		getVectorCallCosts(CallInst *CI, FixedVectorType *VecTy,
TargetTransformInfo *TTI, TargetLibraryInfo *TLI) {		TargetTransformInfo *TTI, TargetLibraryInfo *TLI) {
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
[74 lines not shown]	for (auto *V : VL) {
// cost to extract the a vector with EltsPerVector elements.		// cost to extract the a vector with EltsPerVector elements.
Cost += TTI.getShuffleCost(		Cost += TTI.getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc,		TargetTransformInfo::SK_PermuteSingleSrc,
FixedVectorType::get(VecTy->getElementType(), EltsPerVector));		FixedVectorType::get(VecTy->getElementType(), EltsPerVector));
}		}
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E) {		InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
		ArrayRef<Value *> VectorizedVals) {
ArrayRef<Value*> VL = E->Scalars;		ArrayRef<Value*> VL = E->Scalars;

Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))		else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))
ScalarTy = CI->getOperand(0)->getType();		ScalarTy = CI->getOperand(0)->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))		else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))
[12 lines not shown]	InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
InstructionCost ReuseShuffleCost = 0;		InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost =		ReuseShuffleCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy,
E->ReuseShuffleIndices);		E->ReuseShuffleIndices);
}		}
// FIXME: it tries to fix a problem with MSVC buildbots.		// FIXME: it tries to fix a problem with MSVC buildbots.
TargetTransformInfo &TTIRef = *TTI;		TargetTransformInfo &TTIRef = *TTI;
auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL,		auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,
VecTy](InstructionCost &Cost, bool IsGather) {		VectorizedVals](InstructionCost &Cost,
		bool IsGather) {
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
for (auto *V : VL) {		for (auto *V : VL) {
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
if (IsGather && (!areAllUsersVectorized(cast<Instruction>(V)) ||		if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
ScalarToTreeEntry.count(V)))		(IsGather && ScalarToTreeEntry.count(V)))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
unsigned Idx = *getExtractIndex(EE);		unsigned Idx = *getExtractIndex(EE);
if (TTIRef.getNumberOfParts(VecTy) !=		if (TTIRef.getNumberOfParts(VecTy) !=
TTIRef.getNumberOfParts(EE->getVectorOperandType())) {		TTIRef.getNumberOfParts(EE->getVectorOperandType())) {
auto It =		auto It =
ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;		ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;
It->getSecond() = std::min<int>(It->second, Idx);		It->getSecond() = std::min<int>(It->second, Idx);
[737 lines not shown]	for (Instruction *Inst : OrderedScalars) {
}		}

PrevInst = Inst;		PrevInst = Inst;
}		}

return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getTreeCost() {		InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "		LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
<< VectorizableTree.size() << ".\n");		<< VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0]->Scalars.size();		unsigned BundleWidth = VectorizableTree[0]->Scalars.size();

for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {		for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
TreeEntry &TE = *VectorizableTree[I].get();		TreeEntry &TE = *VectorizableTree[I].get();

InstructionCost C = getEntryCost(&TE);		InstructionCost C = getEntryCost(&TE, VectorizedVals);
Cost += C;		Cost += C;
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for bundle that starts with " << *TE.Scalars[0]		<< " for bundle that starts with " << *TE.Scalars[0]
<< ".\n"		<< ".\n"
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
}		}

SmallPtrSet<Value *, 16> ExtractCostCalculated;		SmallPtrSet<Value *, 16> ExtractCostCalculated;
[13 lines not shown]	for (ExternalUser &EU : ExternalUses) {
// removed as well).		// removed as well).
if (EphValues.count(EU.User))		if (EphValues.count(EU.User))
continue;		continue;

// No extract cost for vector "scalar"		// No extract cost for vector "scalar"
if (isa<FixedVectorType>(EU.Scalar->getType()))		if (isa<FixedVectorType>(EU.Scalar->getType()))
continue;		continue;

		// Already counted the cost for external uses when tried to adjust the cost
		// for extractelements, no need to add it again.
		if (isa<ExtractElementInst>(EU.Scalar))
		continue;

// If found user is an insertelement, do not calculate extract cost but try		// If found user is an insertelement, do not calculate extract cost but try
// to detect it as a final shuffled/identity match.		// to detect it as a final shuffled/identity match.
if (EU.User && isa<InsertElementInst>(EU.User)) {		if (EU.User && isa<InsertElementInst>(EU.User)) {
if (auto *FTy = dyn_cast<FixedVectorType>(EU.User->getType())) {		if (auto *FTy = dyn_cast<FixedVectorType>(EU.User->getType())) {
Optional<int> InsertIdx = getInsertIndex(EU.User, 0);		Optional<int> InsertIdx = getInsertIndex(EU.User, 0);
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
Value *VU = EU.User;		Value *VU = EU.User;
[1,121 lines not shown]	assert(E->State != TreeEntry::NeedToGather &&
"Extracting from a gather list");		"Extracting from a gather list");

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
auto ExtractAndExtendIfNeeded = [&](Value *Vec) {		auto ExtractAndExtendIfNeeded = [&](Value *Vec) {
if (Scalar->getType() != Vec->getType()) {		if (Scalar->getType() != Vec->getType()) {
Value *Ex = Builder.CreateExtractElement(Vec, Lane);		Value *Ex;
		// "Reuse" the existing extract to improve final codegen.
		if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {
		Ex = Builder.CreateExtractElement(ES->getOperand(0),
		ES->getOperand(1));
		} else {
		Ex = Builder.CreateExtractElement(Vec, Lane);
		}
// If necessary, sign-extend or zero-extend ScalarRoot		// If necessary, sign-extend or zero-extend ScalarRoot
// to the larger type.		// to the larger type.
if (!MinBWs.count(ScalarRoot))		if (!MinBWs.count(ScalarRoot))
return Ex;		return Ex;
if (MinBWs[ScalarRoot].second)		if (MinBWs[ScalarRoot].second)
return Builder.CreateSExt(Ex, Scalar->getType());		return Builder.CreateSExt(Ex, Scalar->getType());
return Builder.CreateZExt(Ex, Scalar->getType());		return Builder.CreateZExt(Ex, Scalar->getType());
} else {		}
assert(isa<FixedVectorType>(Scalar->getType()) &&		assert(isa<FixedVectorType>(Scalar->getType()) &&
isa<InsertElementInst>(Scalar) &&		isa<InsertElementInst>(Scalar) &&
"In-tree scalar of vector type is not insertelement?");		"In-tree scalar of vector type is not insertelement?");
return Vec;		return Vec;
}
};		};
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
// ExternallyUsedValues.		// ExternallyUsedValues.
if (!User) {		if (!User) {
assert(ExternallyUsedValues.count(Scalar) &&		assert(ExternallyUsedValues.count(Scalar) &&
"Scalar with nullptr as an external user must be registered in "		"Scalar with nullptr as an external user must be registered in "
"ExternallyUsedValues map");		"ExternallyUsedValues map");
[2,055 lines not shown]	while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {
if (V.isTreeTinyAndNotFullyVectorizable())		if (V.isTreeTinyAndNotFullyVectorizable())
break;		break;
if (V.isLoadCombineReductionCandidate(RdxKind))		if (V.isLoadCombineReductionCandidate(RdxKind))
break;		break;

V.computeMinimumValueSizes();		V.computeMinimumValueSizes();

// Estimate cost.		// Estimate cost.
InstructionCost TreeCost = V.getTreeCost();		InstructionCost TreeCost =
		V.getTreeCost(makeArrayRef(&ReducedVals[i], ReduxWidth));
InstructionCost ReductionCost =		InstructionCost ReductionCost =
getReductionCost(TTI, ReducedVals[i], ReduxWidth);		getReductionCost(TTI, ReducedVals[i], ReduxWidth);
InstructionCost Cost = TreeCost + ReductionCost;		InstructionCost Cost = TreeCost + ReductionCost;
if (!Cost.isValid()) {		if (!Cost.isValid()) {
LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");		LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");
return false;		return false;
}		}
if (Cost >= -SLPCostThreshold) {		if (Cost >= -SLPCostThreshold) {
[781 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s

	define void @f1(<2 x i16> %x, i16* %a) {			define void @f1(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f1(			; CHECK-LABEL: @f1(
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[X:%.*]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[X:%.*]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i16> [[X]], i32 0
	; CHECK-NEXT: store i16 [[TMP1]], i16* [[A:%.*]], align 2			; CHECK-NEXT: store i16 [[TMP1]], i16* [[A:%.*]], align 2
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP2]], align 2			; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP2]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t2 = extractelement <2 x i16> %x, i32 0			%t2 = extractelement <2 x i16> %x, i32 0
	%t3 = extractelement <2 x i16> %x, i32 1			%t3 = extractelement <2 x i16> %x, i32 1
	%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	[15 lines not shown]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i16> [[XX]], i32 0
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	[30 lines not shown]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i16> [[XX]], i32 1
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	[25 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	[137 lines not shown]
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[V_1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[V_1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: call void @use(double [[TMP3]])			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: call void @use(double [[TMP4]])			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1
	[546 lines not shown]