This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Fix alternate cmp operands analysis.
Abandoned (Public)

Authored by ABataev on Aug 25 2022, 12:22 PM.


Details

Reviewers

RKSimon
vdmitrie

Summary

The result of areCompatibleCmpOps was not negated for alternate cmp
operations, which may lead to non-optimal operand ordering.
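
To illustrate the missed negation, here is a minimal, self-contained sketch of the operand-ordering decision. It is a toy model, not the SLPVectorizer code: Pred, Kind, Cmp, compatibleOps and orderOperands are hypothetical names, compatibleOps is a much simplified stand-in for areCompatibleCmpOps (it only compares operand kinds position by position), and the check that skips the main and alternate instructions themselves is left out. The NegateAltCheck flag switches between the old condition and the one proposed in this revision.

```cpp
#include <utility>

// Toy predicates and operand kinds (hypothetical, for illustration only).
enum class Pred { SLT, SGT };
enum class Kind { Constant, Variable };

struct Cmp {
  Pred P;
  Kind LHS;
  Kind RHS;
};

// Predicate obtained by swapping the two operands of a comparison.
Pred swapped(Pred P) { return P == Pred::SLT ? Pred::SGT : Pred::SLT; }

// Simplified stand-in for areCompatibleCmpOps: operands are "compatible"
// when their kinds match the base comparison position by position.
bool compatibleOps(const Cmp &Base, Kind L, Kind R) {
  return Base.LHS == L && Base.RHS == R;
}

// Decide the operand order of lane C relative to the main comparison.
// NegateAltCheck == true models the patched condition, false the old one.
std::pair<Kind, Kind> orderOperands(const Cmp &Main, const Cmp &Alt,
                                    const Cmp &C, bool NegateAltCheck) {
  Kind L = C.LHS, R = C.RHS;
  if (Main.P == swapped(Alt.P)) {
    bool Compat = compatibleOps(Main, L, R);
    bool MainMismatch = C.P == Main.P && !Compat;
    bool AltMismatch = C.P == Alt.P && (NegateAltCheck ? !Compat : Compat);
    if (MainMismatch || AltMismatch)
      std::swap(L, R);
  } else if (C.P != Main.P && C.P != Alt.P) {
    std::swap(L, R);
  }
  return {L, R};
}
```

With main `x < 0` and alternate `x > 0`, the lane `0 > x` keeps its constant on the left under the old condition. With the negation applied, the constant moves to the right, so the lane lines up with the main comparison.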

Metric: SLP.NumVectorInstructions

 Program                                                                                       SLP.NumVectorInstructions
                                                                                               results                   results0 diff
                                test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5109.00                   5107.00 -0.0%
                                test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5514.00                   5509.00 -0.1%
                        test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5505.00                   5500.00 -0.1%
                      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27293.00                  27193.00 -0.4%
                         test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   252.00                    251.00 -0.4%
                        test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   252.00                    251.00 -0.4%
                     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   400.00                    396.00 -1.0%

447.dealII - extra <4x> vectorization
453.povray - same
511.povray_r - same, better shuffles
526.blender_r - more than 20 new <4x> vectorizations, fewer shuffles
541.leela_r, 641.leela_s, miniFE - pretty much the same, better shuffles

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Aug 25 2022, 12:22 PM

Herald added a project: Restricted Project. Aug 25 2022, 12:22 PM

Herald added subscribers: vporpo, hiraditya.

ABataev requested review of this revision. Aug 25 2022, 12:22 PM

Harbormaster completed remote builds in B183452: Diff 455687. Aug 25 2022, 1:21 PM

RKSimon added inline comments. Aug 30 2022, 8:48 AM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
231–237	The "<0,1,2,3,1,0,2,3>" swizzle in the upper half is a regression - I noticed it on a couple of other test changes as well.

Herald added a subscriber: pcwang-thead. Aug 30 2022, 8:48 AM

Address comments

ABataev edited the summary of this revision. Sep 1 2022, 12:54 PM

Herald added a subscriber: dmgreen. Sep 1 2022, 12:54 PM

ABataev edited the summary of this revision. Sep 1 2022, 12:55 PM

Harbormaster completed remote builds in B184658: Diff 457366. Sep 1 2022, 1:46 PM

vdmitrie added inline comments. Sep 7 2022, 9:38 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3669	Is it possible to split off the reordering improvement into a separate patch?
5883	The main issue with the existing code here, and what makes me nervous, is that the main/alternate logic is not consistent across SLP. getSameOpcode has a clear enough approach, but its logic does not match the one here [or in other similar code]. Unfortunately this patch does not fix that, which is a maintainability issue too. For isAlternateInstruction, for example, there are a couple of checkpoints: it shall answer true if I == AltOp and false when I == MainOp. But it is very difficult to figure out from this code whether it will work this way. Without a clear description it is pretty difficult to reverse engineer the intent. I put a draft patch here https://reviews.llvm.org/D133430 as a possible approach to the problem. It works very similarly to this patch (not taking into account the reordering improvement) but has a consistent approach to main/alternate selection. There were a couple of minor differences when an instruction could fit both the main and alternate flows.
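
The two checkpoints named in the comment above can be written down as executable assertions. What follows is a toy model only, assuming simplified types: Pred, Kind, Cmp, compatibleOps and isAlternate are hypothetical names, and compatibleOps is a crude stand-in for areCompatibleCmpOps. The body of isAlternate mirrors the shape of the right-hand (patched) side of the isAlternateInstruction hunk in this revision; it is not the code from D133430.

```cpp
// Toy predicates and operand kinds (hypothetical, for illustration only).
enum class Pred { SLT, SGT, EQ, NE };
enum class Kind { Constant, Variable };

struct Cmp {
  Pred P;
  Kind LHS;
  Kind RHS;
};

// Predicate obtained by swapping the two operands of a comparison.
Pred swapped(Pred P) {
  switch (P) {
  case Pred::SLT:
    return Pred::SGT;
  case Pred::SGT:
    return Pred::SLT;
  default:
    return P; // EQ and NE are symmetric
  }
}

// Simplified stand-in for areCompatibleCmpOps: operand kinds must match
// the base comparison position by position.
bool compatibleOps(const Cmp &Base, const Cmp &C) {
  return Base.LHS == C.LHS && Base.RHS == C.RHS;
}

// Toy version of the patched check: is I handled by the alternate flow?
bool isAlternate(const Cmp *I, const Cmp *Main, const Cmp *Alt) {
  if (Main->P == swapped(Alt->P))
    return I == Alt ||
           (I != Main && I->P == Alt->P && compatibleOps(*Main, *I)) ||
           (I != Main && I->P == Main->P && !compatibleOps(*Main, *I));
  return I->P == Alt->P || I->P == swapped(Alt->P);
}
```

In this toy model both checkpoints hold by construction: the alternate instruction itself is always classified as alternate, and the main instruction never is, in the swapped-predicate branch as well as in the general one. Whether the real code keeps that property once getSameOpcode is involved is exactly the concern raised in the comment.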

ABataev added inline comments. Sep 7 2022, 10:43 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3669	Sure

Abandoned in favor of D133430

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	105 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll	37 lines
llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll	17 lines
llvm/test/Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll	11 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll	40 lines

Diff 457366

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


... 3,659 lines not shown ...	if (clusterSortPtrAccesses(Ptrs, ScalarTy, DL, SE, Order))
return Order;		return Order;
return None;		return None;
}		}

Optional<BoUpSLP::OrdersType> BoUpSLP::getReorderingData(const TreeEntry &TE,		Optional<BoUpSLP::OrdersType> BoUpSLP::getReorderingData(const TreeEntry &TE,
bool TopToBottom) {		bool TopToBottom) {
// No need to reorder if need to shuffle reuses, still need to shuffle the		// No need to reorder if need to shuffle reuses, still need to shuffle the
// node.		// node.
if (!TE.ReuseShuffleIndices.empty())		if (!TE.ReuseShuffleIndices.empty()) {
		// Check if reuse shuffle indices can be improved by reordering.
		// For this, check that reuse mask is "clustered", i.e. each scalar values
		// is used once in each submask of size <number_of_scalars>.
		// Example: 4 scalar values.
		// ReuseShuffleIndices mask: 0, 1, 2, 3, 3, 2, 0, 1 - clustered.
		// 0, 1, 2, 3, 3, 3, 1, 0 - not clustered, because
		// element 3 is used twice in the second submask.
		unsigned Sz = TE.Scalars.size();
		unsigned VF = TE.getVectorFactor();
		auto &&IsClusteredReuse = [Sz, VF](const TreeEntry &TE) {
		for (unsigned K = 0; K < VF; K += Sz) {
		ArrayRef<int> SubMask =
		makeArrayRef(TE.ReuseShuffleIndices).slice(K, Sz);
		if (all_of(SubMask, [](int Idx) { return Idx == UndefMaskElem; }))
		continue;
		SmallBitVector UsedIndices(Sz);
		for (int Idx : SubMask) {
		if (Idx == UndefMaskElem)
		return false;
		UsedIndices.set(Idx);
		}
		if (!UsedIndices.all())
		return false;
		}
		return true;
		};
		if (!IsClusteredReuse(TE))
return None;		return None;
		// Try build correct order for extractelement instructions.
		SmallVector<int> ReusedMask(TE.ReuseShuffleIndices.begin(),
		TE.ReuseShuffleIndices.end());
		if (TE.getOpcode() == Instruction::ExtractElement && !TE.isAltShuffle() &&
		all_of(TE.Scalars, [Sz](Value *V) {
		Optional<unsigned> Idx = getExtractIndex(cast<Instruction>(V));
		return Idx && *Idx < Sz;
		})) {
		SmallVector<int> ReorderMask(Sz, UndefMaskElem);
		if (TE.ReorderIndices.empty())
		std::iota(ReorderMask.begin(), ReorderMask.end(), 0);
		else
		inversePermutation(TE.ReorderIndices, ReorderMask);
		for (unsigned I = 0; I < VF; ++I) {
		int &Idx = ReusedMask[I];
		if (Idx == UndefMaskElem)
		continue;
		Value *V = TE.Scalars[ReorderMask[Idx]];
		Optional<unsigned> EI = getExtractIndex(cast<Instruction>(V));
		Idx = std::distance(ReorderMask.begin(), find(ReorderMask, *EI));
		}
		}
		// Build the order of the VF size, need to reorder reuses shuffles, they are
		// always of VF size.
		OrdersType ResOrder(VF);
		std::iota(ResOrder.begin(), ResOrder.end(), 0);
		auto *It = ResOrder.begin();
		for (unsigned K = 0; K < VF; K += Sz) {
		OrdersType CurrentOrder(TE.ReorderIndices);
		SmallVector<int> SubMask(makeArrayRef(ReusedMask).slice(K, Sz));
		if (SubMask.front() == UndefMaskElem)
		std::iota(SubMask.begin(), SubMask.end(), 0);
		reorderOrder(CurrentOrder, SubMask);
		transform(CurrentOrder, It, [K](unsigned Pos) { return Pos + K; });
		std::advance(It, Sz);
		}
		if (all_of(enumerate(ResOrder),
		[](const auto &Data) { return Data.index() == Data.value(); }))
		return {}; // Use identity order.
		return ResOrder;
		}
if (TE.State == TreeEntry::Vectorize &&		if (TE.State == TreeEntry::Vectorize &&
(isa<LoadInst, ExtractElementInst, ExtractValueInst>(TE.getMainOp()) ||		(isa<LoadInst, ExtractElementInst, ExtractValueInst>(TE.getMainOp()) ||
(TopToBottom && isa<StoreInst, InsertElementInst>(TE.getMainOp()))) &&		(TopToBottom && isa<StoreInst, InsertElementInst>(TE.getMainOp()))) &&
!TE.isAltShuffle())		!TE.isAltShuffle())
return TE.ReorderIndices;		return TE.ReorderIndices;
if (TE.State == TreeEntry::NeedToGather) {		if (TE.State == TreeEntry::NeedToGather) {
// TODO: add analysis of other gather nodes with extractelement		// TODO: add analysis of other gather nodes with extractelement
// instructions and other values/instructions, not only undefs.		// instructions and other values/instructions, not only undefs.
... 100 lines not shown ...	if (Optional<OrdersType> CurrentOrder =
if (all_of(UserTE->UserTreeIndices, [](const EdgeInfo &EI) {		if (all_of(UserTE->UserTreeIndices, [](const EdgeInfo &EI) {
return EI.UserTE->State == TreeEntry::Vectorize &&		return EI.UserTE->State == TreeEntry::Vectorize &&
EI.UserTE->isAltShuffle() && EI.UserTE->Idx != 0;		EI.UserTE->isAltShuffle() && EI.UserTE->Idx != 0;
}))		}))
return;		return;
UserTE = UserTE->UserTreeIndices.back().UserTE;		UserTE = UserTE->UserTreeIndices.back().UserTE;
++Cnt;		++Cnt;
}		}
VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());		VFToOrderedEntries[TE->getVectorFactor()].insert(TE.get());
if (TE->State != TreeEntry::Vectorize)		if (TE->State != TreeEntry::Vectorize || !TE->ReuseShuffleIndices.empty())
GathersToOrders.try_emplace(TE.get(), *CurrentOrder);		GathersToOrders.try_emplace(TE.get(), *CurrentOrder);
}		}
});		});

// Reorder the graph nodes according to their vectorization factor.		// Reorder the graph nodes according to their vectorization factor.
for (unsigned VF = VectorizableTree.front()->Scalars.size(); VF > 1;		for (unsigned VF = VectorizableTree.front()->Scalars.size(); VF > 1;
VF /= 2) {		VF /= 2) {
auto It = VFToOrderedEntries.find(VF);		auto It = VFToOrderedEntries.find(VF);
if (It == VFToOrderedEntries.end())		if (It == VFToOrderedEntries.end())
continue;		continue;
// Try to find the most profitable order. We just are looking for the most		// Try to find the most profitable order. We just are looking for the most
// used order and reorder scalar elements in the nodes according to this		// used order and reorder scalar elements in the nodes according to this
// mostly used order.		// mostly used order.
ArrayRef<TreeEntry *> OrderedEntries = It->second.getArrayRef();		ArrayRef<TreeEntry *> OrderedEntries = It->second.getArrayRef();
// All operands are reordered and used only in this node - propagate the		// All operands are reordered and used only in this node - propagate the
// most used order to the user node.		// most used order to the user node.
MapVector<OrdersType, unsigned,		MapVector<OrdersType, unsigned,
DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo>>		DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo>>
OrdersUses;		OrdersUses;
SmallPtrSet<const TreeEntry *, 4> VisitedOps;		SmallPtrSet<const TreeEntry *, 4> VisitedOps;
for (const TreeEntry *OpTE : OrderedEntries) {		for (const TreeEntry *OpTE : OrderedEntries) {
// No need to reorder this nodes, still need to extend and to use shuffle,		// No need to reorder this nodes, still need to extend and to use shuffle,
// just need to merge reordering shuffle and the reuse shuffle.		// just need to merge reordering shuffle and the reuse shuffle.
if (!OpTE->ReuseShuffleIndices.empty())		if (!OpTE->ReuseShuffleIndices.empty() && !GathersToOrders.count(OpTE))
continue;		continue;
// Count number of orders uses.		// Count number of orders uses.
const auto &Order = [OpTE, &GathersToOrders,		const auto &Order = [OpTE, &GathersToOrders,
&AltShufflesToOrders]() -> const OrdersType & {		&AltShufflesToOrders]() -> const OrdersType & {
if (OpTE->State == TreeEntry::NeedToGather) {		if (OpTE->State == TreeEntry::NeedToGather ||
		!OpTE->ReuseShuffleIndices.empty()) {
auto It = GathersToOrders.find(OpTE);		auto It = GathersToOrders.find(OpTE);
if (It != GathersToOrders.end())		if (It != GathersToOrders.end())
return It->second;		return It->second;
}		}
if (OpTE->isAltShuffle()) {		if (OpTE->isAltShuffle()) {
auto It = AltShufflesToOrders.find(OpTE);		auto It = AltShufflesToOrders.find(OpTE);
if (It != AltShufflesToOrders.end())		if (It != AltShufflesToOrders.end())
return It->second;		return It->second;
}		}
return OpTE->ReorderIndices;		return OpTE->ReorderIndices;
}();		}();
// First consider the order of the external scalar users.		// First consider the order of the external scalar users.
auto It = ExternalUserReorderMap.find(OpTE);		auto It = ExternalUserReorderMap.find(OpTE);
if (It != ExternalUserReorderMap.end()) {		if (It != ExternalUserReorderMap.end()) {
const auto &ExternalUserReorderIndices = It->second;		const auto &ExternalUserReorderIndices = It->second;
for (const OrdersType &ExtOrder : ExternalUserReorderIndices)		for (const OrdersType &ExtOrder : ExternalUserReorderIndices)
++OrdersUses.insert(std::make_pair(ExtOrder, 0)).first->second;		// If the OpTE vector factor != number of scalars (ExtOrder size) -
		// use natural order, it is an attempt to reorder node with reused
		// scalars but with external uses.
		++OrdersUses
		.insert(std::make_pair(
		OpTE->getVectorFactor() == ExtOrder.size() ? ExtOrder
		: OrdersType(),
		0))
		.first->second;
// No other useful reorder data in this entry.		// No other useful reorder data in this entry.
if (Order.empty())		if (Order.empty())
continue;		continue;
}		}
// Stores actually store the mask, not the order, need to invert.		// Stores actually store the mask, not the order, need to invert.
if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&		if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
OpTE->getOpcode() == Instruction::Store && !Order.empty()) {		OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
SmallVector<int> Mask;		SmallVector<int> Mask;
... 143 lines not shown ...	void BoUpSLP::reorderBottomToTop(bool IgnoreReorder) {
for_each(VectorizableTree, [this, &OrderedEntries, &GathersToOrders,		for_each(VectorizableTree, [this, &OrderedEntries, &GathersToOrders,
&NonVectorized](		&NonVectorized](
const std::unique_ptr<TreeEntry> &TE) {		const std::unique_ptr<TreeEntry> &TE) {
if (TE->State != TreeEntry::Vectorize)		if (TE->State != TreeEntry::Vectorize)
NonVectorized.push_back(TE.get());		NonVectorized.push_back(TE.get());
if (Optional<OrdersType> CurrentOrder =		if (Optional<OrdersType> CurrentOrder =
getReorderingData(*TE, /*TopToBottom=*/false)) {		getReorderingData(*TE, /*TopToBottom=*/false)) {
OrderedEntries.insert(TE.get());		OrderedEntries.insert(TE.get());
if (TE->State != TreeEntry::Vectorize)		if (TE->State != TreeEntry::Vectorize || !TE->ReuseShuffleIndices.empty())
GathersToOrders.try_emplace(TE.get(), *CurrentOrder);		GathersToOrders.try_emplace(TE.get(), *CurrentOrder);
}		}
});		});

// 1. Propagate order to the graph nodes, which use only reordered nodes.		// 1. Propagate order to the graph nodes, which use only reordered nodes.
// I.e., if the node has operands, that are reordered, try to make at least		// I.e., if the node has operands, that are reordered, try to make at least
// one operand order in the natural order and reorder others + reorder the		// one operand order in the natural order and reorder others + reorder the
// user node itself.		// user node itself.
... 55 lines not shown ...	for (auto &Data : UsersVec) {
// the same node my be considered several times, though might be not		// the same node my be considered several times, though might be not
// profitable.		// profitable.
SmallPtrSet<const TreeEntry *, 4> VisitedOps;		SmallPtrSet<const TreeEntry *, 4> VisitedOps;
SmallPtrSet<const TreeEntry *, 4> VisitedUsers;		SmallPtrSet<const TreeEntry *, 4> VisitedUsers;
for (const auto &Op : Data.second) {		for (const auto &Op : Data.second) {
TreeEntry *OpTE = Op.second;		TreeEntry *OpTE = Op.second;
if (!VisitedOps.insert(OpTE).second)		if (!VisitedOps.insert(OpTE).second)
continue;		continue;
if (!OpTE->ReuseShuffleIndices.empty())		if (!OpTE->ReuseShuffleIndices.empty() && !GathersToOrders.count(OpTE))
continue;		continue;
const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {		const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
if (OpTE->State == TreeEntry::NeedToGather)		if (OpTE->State == TreeEntry::NeedToGather ||
		!OpTE->ReuseShuffleIndices.empty())
return GathersToOrders.find(OpTE)->second;		return GathersToOrders.find(OpTE)->second;
return OpTE->ReorderIndices;		return OpTE->ReorderIndices;
}();		}();
unsigned NumOps = count_if(		unsigned NumOps = count_if(
Data.second, [OpTE](const std::pair<unsigned, TreeEntry *> &P) {		Data.second, [OpTE](const std::pair<unsigned, TreeEntry *> &P) {
return P.second == OpTE;		return P.second == OpTE;
});		});
// Stores actually store the mask, not the order, need to invert.		// Stores actually store the mask, not the order, need to invert.
... 1,417 lines not shown ...	case Instruction::ShuffleVector: {
Value *LHS = Cmp->getOperand(0);		Value *LHS = Cmp->getOperand(0);
Value *RHS = Cmp->getOperand(1);		Value *RHS = Cmp->getOperand(1);
CmpInst::Predicate CurrentPred = Cmp->getPredicate();		CmpInst::Predicate CurrentPred = Cmp->getPredicate();
if (P0 == AltP0Swapped) {		if (P0 == AltP0Swapped) {
if (CI != Cmp && S.AltOp != Cmp &&		if (CI != Cmp && S.AltOp != Cmp &&
((P0 == CurrentPred &&		((P0 == CurrentPred &&
!areCompatibleCmpOps(BaseOp0, BaseOp1, LHS, RHS)) ||		!areCompatibleCmpOps(BaseOp0, BaseOp1, LHS, RHS)) ||
(AltP0 == CurrentPred &&		(AltP0 == CurrentPred &&
areCompatibleCmpOps(BaseOp0, BaseOp1, LHS, RHS))))		!areCompatibleCmpOps(BaseOp0, BaseOp1, LHS, RHS))))
std::swap(LHS, RHS);		std::swap(LHS, RHS);
} else if (P0 != CurrentPred && AltP0 != CurrentPred) {		} else if (P0 != CurrentPred && AltP0 != CurrentPred) {
std::swap(LHS, RHS);		std::swap(LHS, RHS);
}		}
Left.push_back(LHS);		Left.push_back(LHS);
Right.push_back(RHS);		Right.push_back(RHS);
}		}
}		}
... 286 lines not shown ...	static bool isAlternateInstruction(const Instruction *I,
if (auto *CI0 = dyn_cast<CmpInst>(MainOp)) {		if (auto *CI0 = dyn_cast<CmpInst>(MainOp)) {
auto *AltCI0 = cast<CmpInst>(AltOp);		auto *AltCI0 = cast<CmpInst>(AltOp);
auto *CI = cast<CmpInst>(I);		auto *CI = cast<CmpInst>(I);
CmpInst::Predicate P0 = CI0->getPredicate();		CmpInst::Predicate P0 = CI0->getPredicate();
CmpInst::Predicate AltP0 = AltCI0->getPredicate();		CmpInst::Predicate AltP0 = AltCI0->getPredicate();
assert(P0 != AltP0 && "Expected different main/alternate predicates.");		assert(P0 != AltP0 && "Expected different main/alternate predicates.");
CmpInst::Predicate AltP0Swapped = CmpInst::getSwappedPredicate(AltP0);		CmpInst::Predicate AltP0Swapped = CmpInst::getSwappedPredicate(AltP0);
CmpInst::Predicate CurrentPred = CI->getPredicate();		CmpInst::Predicate CurrentPred = CI->getPredicate();
if (P0 == AltP0Swapped)		if (P0 == AltP0Swapped)
return I == AltCI0 ||		return I == AltCI0 ||
(I != MainOp &&		(I != CI0 && AltP0 == CurrentPred &&
		areCompatibleCmpOps(CI0->getOperand(0), CI0->getOperand(1),
		CI->getOperand(0), CI->getOperand(1))) ||
		(I != CI0 && P0 == CurrentPred &&
!areCompatibleCmpOps(CI0->getOperand(0), CI0->getOperand(1),		!areCompatibleCmpOps(CI0->getOperand(0), CI0->getOperand(1),
CI->getOperand(0), CI->getOperand(1)));		CI->getOperand(0), CI->getOperand(1)));
return AltP0 == CurrentPred || AltP0Swapped == CurrentPred;		return AltP0 == CurrentPred || AltP0Swapped == CurrentPred;
}		}
return I->getOpcode() == AltOp->getOpcode();		return I->getOpcode() == AltOp->getOpcode();
}		}

TTI::OperandValueInfo BoUpSLP::getOperandInfo(ArrayRef<Value *> VL,		TTI::OperandValueInfo BoUpSLP::getOperandInfo(ArrayRef<Value *> VL,
... 6,733 lines not shown ...
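
The "clustered" reuse-mask condition described in the comments of the getReorderingData hunk above can be checked in isolation. The following is a minimal standalone sketch, assuming plain std::vector in place of the LLVM containers: isClusteredReuse and UndefElem are hypothetical names, UndefElem stands in for UndefMaskElem and is assumed to be -1, and the size and range guards are additions that the patch does not have.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for LLVM's UndefMaskElem; assumed to be -1 here.
constexpr int UndefElem = -1;

// Returns true if every submask of size Sz is either entirely undefined or
// uses each index in [0, Sz) exactly once.
bool isClusteredReuse(const std::vector<int> &Mask, std::size_t Sz) {
  // Added guard (not in the patch): the mask must split evenly into submasks.
  if (Sz == 0 || Mask.size() % Sz != 0)
    return false;
  for (std::size_t K = 0; K < Mask.size(); K += Sz) {
    bool AllUndef = true;
    for (std::size_t I = K; I < K + Sz; ++I)
      AllUndef = AllUndef && Mask[I] == UndefElem;
    if (AllUndef)
      continue; // a fully undefined submask is accepted as is
    std::vector<bool> Used(Sz, false);
    for (std::size_t I = K; I < K + Sz; ++I) {
      int Idx = Mask[I];
      // A partially undefined submask is rejected; the range check is an
      // added guard.
      if (Idx < 0 || static_cast<std::size_t>(Idx) >= Sz)
        return false;
      Used[Idx] = true;
    }
    for (bool U : Used)
      if (!U)
        return false; // some index is missing, so another one is repeated
  }
  return true;
}
```

For the two masks from the code comment with 4 scalars, 0, 1, 2, 3, 3, 2, 0, 1 is accepted, while 0, 1, 2, 3, 3, 3, 1, 0 is rejected because element 3 occurs twice in the second submask.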

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

	... 80 lines not shown ...
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4sf(<4 x float> %t) {			define float @test_merge_anyof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4sf(			; CHECK-LABEL: @test_merge_anyof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T:%.*]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[T:%.*]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer			; CHECK-NEXT: [[TMP0:%.*]] = fcmp ogt <8 x float> [[SHUFFLE]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>
	; CHECK-NEXT: [[TMP1:%.*]] = fcmp ogt <4 x float> [[T_FR]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP1:%.*]] = fcmp olt <8 x float> [[SHUFFLE]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>
	; CHECK-NEXT: [[TMP2:%.*]] = or <4 x i1> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i1> [[TMP0]], <8 x i1> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4			; CHECK-NEXT: [[TMP3:%.*]] = freeze <8 x i1> [[TMP2]]
	; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP3]], 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i1> [[TMP3]] to i8
	; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T_FR]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[T_FR]], [[SHIFT]]			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP4]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[SHIFT]], [[T]]
				; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP5]], i64 0
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[ADD]], float 0.000000e+00			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[ADD]], float 0.000000e+00
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false
	... 289 lines not shown ...
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4si(<4 x i32> %t) {			define float @test_merge_anyof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4si(			; CHECK-LABEL: @test_merge_anyof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[T:%.*]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add <4 x i32> [[T_FR]], <i32 -256, i32 -256, i32 -256, i32 -256>			; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt <8 x i32> [[SHUFFLE]], <i32 255, i32 255, i32 255, i32 255, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <4 x i32> [[TMP0]], <i32 -255, i32 -255, i32 -255, i32 -255>			; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], <i32 255, i32 255, i32 255, i32 255, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x i1> [[TMP1]] to i4			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i1> [[TMP0]], <8 x i1> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP2]], 0			; CHECK-NEXT: [[TMP3:%.*]] = freeze <8 x i1> [[TMP2]]
	; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T_FR]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i1> [[TMP3]] to i8
	; CHECK-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[T_FR]], [[SHIFT]]			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP4]], 0
	; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP3]], i64 0			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T]]
				; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP5]], i64 0
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[CONV]], float 0.000000e+00			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[CONV]], float 0.000000e+00
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false
	... 206 lines not shown ...

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -S | FileCheck %s

	define i16 @test(i16 %call37) {			define i16 @test(i16 %call37) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2			; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0>, i16 [[CALL37:%.*]], i32 4			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 poison>, i16 [[CALL37:%.*]], i32 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i16> <i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 0>, i16 [[CALL37]], i32 3			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 4, i32 3, i32 5>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i16> [[TMP2]], i16 [[CALL37]], i32 6			; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i16> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <8 x i16> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <8 x i16> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <8 x i16> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 5, i32 6, i32 15>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i1> [[TMP4]], <8 x i1> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 12, i32 5, i32 14, i32 7>			; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i1> [[TMP4]] to <8 x i16>
	; CHECK-NEXT: [[TMP7:%.*]] = zext <8 x i1> [[TMP6]] to <8 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP5]])
	; CHECK-NEXT: [[TMP8:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP7]])			; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP6]], 0
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP8]], 0
	; CHECK-NEXT: ret i16 [[OP_RDX]]			; CHECK-NEXT: ret i16 [[OP_RDX]]
	;			;
	entry:			entry:
	%call = load i16, i16* undef, align 2			%call = load i16, i16* undef, align 2
	%0 = icmp slt i16 %call, 0			%0 = icmp slt i16 %call, 0
	%cond = zext i1 %0 to i16			%cond = zext i1 %0 to i16
	%1 = add i16 %cond, 0			%1 = add i16 %cond, 0
	%2 = icmp slt i16 0, 0			%2 = icmp slt i16 0, 0
	... 22 lines not shown ...

llvm/test/Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll

	... 43 lines not shown ...
	define { <2 x float>, <2 x float> } @test1(i32 %conv.i32.i.i.i) {			define { <2 x float>, <2 x float> } @test1(i32 %conv.i32.i.i.i) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV_I32_I_I_I1:%.*]] = fptosi float 0.000000e+00 to i32			; CHECK-NEXT: [[CONV_I32_I_I_I1:%.*]] = fptosi float 0.000000e+00 to i32
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, i32 [[CONV_I32_I_I_I:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, i32 [[CONV_I32_I_I_I:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[CONV_I32_I_I_I1]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[CONV_I32_I_I_I1]], i32 2
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i1> [[TMP2]], <4 x i1> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i1> [[TMP2]], <4 x i1> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x float> zeroinitializer, <4 x float> zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x float> zeroinitializer, <4 x float> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP7]], <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP7]], <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP9]], <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP9]], <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } zeroinitializer, <2 x float> [[TMP8]], 0			; CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } zeroinitializer, <2 x float> [[TMP8]], 0
	; CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[TMP10]], 1			; CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[TMP10]], 1
	[26 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s \| FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s \| FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP11]], 0
				; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP12]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]
	;			;
	entry:			entry:
	%0 = load float, float* undef, align 4			%0 = load float, float* undef, align 4
	%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0			%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0
	%1 = load float, float* %x, align 16			%1 = load float, float* %x, align 16
	%y = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 1			%y = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 1
	%2 = load float, float* %y, align 4			%2 = load float, float* %y, align 4
	[25 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

[222 lines not shown]
}		}

; TODO: This is better than all-scalar and still safe,		; TODO: This is better than all-scalar and still safe,
; but we want this to be 2 reductions with glue		; but we want this to be 2 reductions with glue
; logic...or a wide reduction?		; logic...or a wide reduction?

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; CHECK-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i32> [[SHUFFLE]], <i32 17, i32 17, i32 17, i32 17, i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i1> [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], <i32 17, i32 17, i32 17, i32 17, i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP3]])		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP1]]		; CHECK-NEXT: [[TMP4:%.*]] = freeze <8 x i1> [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP4]])
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP4]], i1 [[TMP6]], i1 false		; CHECK-NEXT: ret i1 [[TMP5]]
		RKSimon: The "<0,1,2,3,1,0,2,3>" swizzle in the upper half is a regression - I noticed it on a couple of other test changes as well.
; CHECK-NEXT: ret i1 [[OP_RDX]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
[9 lines not shown]
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(		; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i32> [[SHUFFLE]], <i32 17, i32 17, i32 17, i32 17, i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: call void @use1(i1 [[TMP2]])		; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], <i32 17, i32 17, i32 17, i32 17, i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i1> [[TMP3]], i32 6
; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])		; CHECK-NEXT: call void @use1(i1 [[TMP4]])
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP1]]		; CHECK-NEXT: [[TMP5:%.*]] = freeze <8 x i1> [[TMP3]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])		; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP5]])
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP5]], i1 [[TMP7]], i1 false		; CHECK-NEXT: ret i1 [[TMP6]]
; CHECK-NEXT: ret i1 [[OP_RDX]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
[58 lines not shown]
; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0
; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1		; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2		; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3		; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2		; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3		; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X1]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X3]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X3]], i32 3
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 1, i32 0, i32 2, i32 3>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[Y1]], i32 5		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[Y1]], i32 5
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y2]], i32 6		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y2]], i32 6
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y3]], i32 7		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y3]], i32 7
; CHECK-NEXT: [[TMP9:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], [[TMP8]]
; CHECK-NEXT: [[TMP10:%.*]] = freeze <8 x i1> [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = freeze <8 x i1> [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP10]])		; CHECK-NEXT: [[TMP11:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP10]])
; CHECK-NEXT: ret i1 [[TMP11]]		; CHECK-NEXT: ret i1 [[TMP11]]
[207 lines not shown]