This is an archive of the discontinued LLVM Phabricator instance.

Differential D144689

[SLP]Improve handling gathers/buildvectors with undefs.
ClosedPublic

Authored by ABataev on Feb 23 2023, 5:45 PM.

Details

Reviewers

RKSimon
vdmitrie

Commits

rGf1c8b72c13f1: [SLP]Improve handling gathers/buildvectors with undefs.

Summary

If a buildvector/gather node has just one non-undef scalar, we try to
make it the very first element, which is profitable in most cases.
Do a preliminary estimation of whether this is more profitable during
graph rotation, and do the same for all elements, including extractelements.
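
A minimal sketch of the reordering idea in the summary, not the patch itself. It models a gather node as a vector of optional integers (`std::nullopt` stands for an undef lane), finds the single defined lane, and builds an order that moves it to position 0 together with the inverse permutation mask. The names `Lane`, `soleDefinedLane`, `orderWithLaneFirst` and `inverseOrder` are hypothetical, and the way the remaining lanes are numbered (unused positions in increasing order) is an assumption about how the placeholder entries are filled in.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a gather node lane: std::nullopt models undef.
using Lane = std::optional<int>;

// Returns the index of the only defined lane, or std::nullopt if the node
// does not have exactly one defined lane.
std::optional<std::size_t> soleDefinedLane(const std::vector<Lane> &Scalars) {
  std::optional<std::size_t> Found;
  for (std::size_t I = 0; I < Scalars.size(); ++I) {
    if (!Scalars[I])
      continue;
    if (Found)
      return std::nullopt; // More than one defined lane.
    Found = I;
  }
  return Found;
}

// Builds an order that sends lane Idx to position 0. The other lanes take the
// remaining positions in increasing order (an assumption of this sketch).
std::vector<unsigned> orderWithLaneFirst(std::size_t Size, std::size_t Idx) {
  assert(Idx < Size && "lane index out of range");
  std::vector<unsigned> Order(Size);
  unsigned Next = 1;
  for (std::size_t I = 0; I < Size; ++I)
    Order[I] = (I == Idx) ? 0 : Next++;
  return Order;
}

// Inverse permutation of Order: Mask[Order[I]] = I.
std::vector<int> inverseOrder(const std::vector<unsigned> &Order) {
  std::vector<int> Mask(Order.size(), -1);
  for (std::size_t I = 0; I < Order.size(); ++I)
    Mask[Order[I]] = static_cast<int>(I);
  return Mask;
}
```

For a four-lane node whose only defined scalar sits in lane 2, this yields the order `{1, 2, 0, 3}` and the mask `{2, 0, 1, 3}`, so position 0 of the reordered vector reads from the original lane 2.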

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,030 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
	60,050 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

ABataev created this revision. Feb 23 2023, 5:45 PM

Herald added a project: Restricted Project. Feb 23 2023, 5:45 PM

Herald added subscribers: vporpo, hiraditya.

ABataev requested review of this revision. Feb 23 2023, 5:45 PM

Herald added a project: Restricted Project. Feb 23 2023, 5:45 PM

Herald added a subscriber: pcwang-thead.

vdmitrie added inline comments. Feb 23 2023, 6:48 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1449	This needs a bit of explanation (a comment).
4188	Could you please clarify the difference between returning empty container vs std::nullopt? The description comment for getReorderingData method does not mention this distinction.
4192–4206	Just for the sake of readability, could you rearrange the code to add a few variables and break that jumbo if condition into pieces, please? For example: ` unsigned Idx = std::distance(TE.Scalars.begin(), It); Order[Idx] = 0; ` ... ` InstructionCost PermuteCost = TopToBottom ? 0 : TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty, Mask); InstructionCost InsertFirstCost = TTI->getVectorInstrCost( Instruction::InsertElement, Ty, TTI::TCK_RecipThroughput, 0, PoisonValue::get(Ty), *It); InstructionCost InsertIdxCost = TTI->getVectorInstrCost( Instruction::InsertElement, Ty, TTI::TCK_RecipThroughput, Idx, PoisonValue::get(Ty), *It); if (InsertFirstCost + PermuteCost < InsertIdxCost) return Order; `
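
The shape of the suggested refactoring can be illustrated outside of LLVM. The sketch below is only an approximation: `ToyCosts` and `shouldMoveToFront` are hypothetical names, and the cost hooks return made-up integers rather than anything from the target cost model. It shows the same comparison with each term given its own named variable.

```cpp
#include <cassert>
#include <functional>

// Hypothetical cost hooks standing in for the target cost model; the real
// costs come from TargetTransformInfo and are not reproduced here.
struct ToyCosts {
  std::function<int(unsigned)> InsertAt; // cost of inserting into a given lane
  int Permute;                           // cost of a single-source permute
};

// Returns true when inserting the scalar into lane 0 (plus an optional
// permute) is estimated to be cheaper than inserting it directly at lane Idx.
bool shouldMoveToFront(const ToyCosts &Costs, unsigned Idx, bool TopToBottom) {
  assert(Costs.InsertAt && "cost hook must be set");
  // When the whole graph is being rotated, the permute is not charged.
  const int PermuteCost = TopToBottom ? 0 : Costs.Permute;
  const int InsertFirstCost = Costs.InsertAt(0);
  const int InsertIdxCost = Costs.InsertAt(Idx);
  return InsertFirstCost + PermuteCost < InsertIdxCost;
}
```

With one variable per term, the final condition reads as a single comparison, and each cost can be inspected separately in a debugger.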

Harbormaster completed remote builds in B215630: Diff 500019. Feb 23 2023, 6:53 PM

ABataev added inline comments. Feb 24 2023, 8:27 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1449	We can easily combine `poison and extractelement <non-poison>` or `undef and extractelement <poison>`. But combining `undef + extractelement <non-poison-but-may-produce-poison>` requires some extra operations (to preserve the difference between undef and poison), so combining such elements is less effective than combining extractelements from the same EV1, even in reversed order.
4188	std::nullopt means that the ordering is not important for the node; an empty container means prefer the identity order. I'll add it to the description of the function.
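
The convention in the reply above can be sketched with plain standard-library types. This is an illustration under assumptions: `OrdersType` is replaced by `std::vector<unsigned>`, and `dontCare`, `preferIdentity`, `describeOrder` and `checkConvention` are hypothetical helpers, not functions from the vectorizer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Stand-in for the vectorizer's order container (an assumption of this sketch).
using OrdersType = std::vector<unsigned>;

// Disengaged optional: the ordering is not important for the node.
// In standard C++, `return {};` here would also yield a disengaged optional.
std::optional<OrdersType> dontCare() { return std::nullopt; }

// Engaged optional holding an empty container: prefer the identity order.
std::optional<OrdersType> preferIdentity() { return OrdersType(); }

// Interprets a reordering result according to the convention above.
std::string describeOrder(const std::optional<OrdersType> &Order) {
  if (!Order)
    return "ordering not important";
  if (Order->empty())
    return "prefer identity order";
  return "explicit order";
}

// Small usage check of the two special cases.
void checkConvention() {
  assert(!dontCare().has_value());
  assert(preferIdentity().has_value() && preferIdentity()->empty());
}
```

Because the two special cases differ only in whether the optional is engaged, spelling the empty container explicitly (as in `preferIdentity`) keeps the intent visible at the return site.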

Address comments

vdmitrie accepted this revision. Feb 24 2023, 10:27 AM

vdmitrie added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1449	Thanks for explanation. I just meant to put it into the code as a comment.
4188	Thanks. That is what I thought but wasn't absolutely sure.

This revision is now accepted and ready to land. Feb 24 2023, 10:27 AM

ABataev added inline comments. Feb 24 2023, 10:28 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1449	Ok, will do

This revision was landed with ongoing or failed builds. Feb 24 2023, 1:21 PM

Closed by commit rGf1c8b72c13f1: [SLP]Improve handling gathers/buildvectors with undefs. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGf1c8b72c13f1: [SLP]Improve handling gathers/buildvectors with undefs.

Harbormaster completed remote builds in B215775: Diff 500219. Feb 24 2023, 4:58 PM

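Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	49 lines
llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/phi3.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll	20 lines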

Diff 500019

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 548 Lines • ▼ Show 20 Lines
tryToGatherExtractElements(SmallVectorImpl<Value *> &VL,		tryToGatherExtractElements(SmallVectorImpl<Value *> &VL,
SmallVectorImpl<int> &Mask) {		SmallVectorImpl<int> &Mask) {
// Scan list of gathered scalars for extractelements that can be represented		// Scan list of gathered scalars for extractelements that can be represented
// as shuffles.		// as shuffles.
MapVector<Value *, SmallVector<int>> VectorOpToIdx;		MapVector<Value *, SmallVector<int>> VectorOpToIdx;
SmallVector<int> UndefVectorExtracts;		SmallVector<int> UndefVectorExtracts;
for (int I = 0, E = VL.size(); I < E; ++I) {		for (int I = 0, E = VL.size(); I < E; ++I) {
auto *EI = dyn_cast<ExtractElementInst>(VL[I]);		auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
if (!EI)		if (!EI) {
		if (isa<UndefValue>(VL[I]))
		UndefVectorExtracts.push_back(I);
continue;		continue;
		}
auto *VecTy = dyn_cast<FixedVectorType>(EI->getVectorOperandType());		auto *VecTy = dyn_cast<FixedVectorType>(EI->getVectorOperandType());
if (!VecTy || !isa<ConstantInt, UndefValue>(EI->getIndexOperand()))		if (!VecTy || !isa<ConstantInt, UndefValue>(EI->getIndexOperand()))
continue;		continue;
std::optional<unsigned> Idx = getExtractIndex(EI);		std::optional<unsigned> Idx = getExtractIndex(EI);
// Undefined index.		// Undefined index.
if (!Idx) {		if (!Idx) {
UndefVectorExtracts.push_back(I);		UndefVectorExtracts.push_back(I);
continue;		continue;
▲ Show 20 Lines • Show All 871 Lines • ▼ Show 20 Lines	int getShallowScore(Value *V1, Value *V2, Instruction *U1, Instruction *U2,
return LookAheadHeuristics::ScoreConstants;		return LookAheadHeuristics::ScoreConstants;

// Extracts from consecutive indexes of the same vector better score as		// Extracts from consecutive indexes of the same vector better score as
// the extracts could be optimized away.		// the extracts could be optimized away.
Value *EV1;		Value *EV1;
ConstantInt *Ex1Idx;		ConstantInt *Ex1Idx;
if (match(V1, m_ExtractElt(m_Value(EV1), m_ConstantInt(Ex1Idx)))) {		if (match(V1, m_ExtractElt(m_Value(EV1), m_ConstantInt(Ex1Idx)))) {
// Undefs are always profitable for extractelements.		// Undefs are always profitable for extractelements.
if (isa<UndefValue>(V2))		if (isa<UndefValue>(V2))
return LookAheadHeuristics::ScoreConsecutiveExtracts;		return (isa<PoisonValue>(V2) || isUndefVector(EV1).all())
		? LookAheadHeuristics::ScoreConsecutiveExtracts
		: LookAheadHeuristics::ScoreSameOpcode;
Value *EV2 = nullptr;		Value *EV2 = nullptr;
ConstantInt *Ex2Idx = nullptr;		ConstantInt *Ex2Idx = nullptr;
if (match(V2,		if (match(V2,
m_ExtractElt(m_Value(EV2), m_CombineOr(m_ConstantInt(Ex2Idx),		m_ExtractElt(m_Value(EV2), m_CombineOr(m_ConstantInt(Ex2Idx),
m_Undef())))) {		m_Undef())))) {
// Undefs are always profitable for extractelements.		// Undefs are always profitable for extractelements.
if (!Ex2Idx)		if (!Ex2Idx)
return LookAheadHeuristics::ScoreConsecutiveExtracts;		return LookAheadHeuristics::ScoreConsecutiveExtracts;
▲ Show 20 Lines • Show All 2,707 Lines • ▼ Show 20 Lines	if (((TE.getOpcode() == Instruction::ExtractElement &&
OrdersType CurrentOrder;		OrdersType CurrentOrder;
bool Reuse = canReuseExtract(TE.Scalars, TE.getMainOp(), CurrentOrder);		bool Reuse = canReuseExtract(TE.Scalars, TE.getMainOp(), CurrentOrder);
if (Reuse || !CurrentOrder.empty()) {		if (Reuse || !CurrentOrder.empty()) {
if (!CurrentOrder.empty())		if (!CurrentOrder.empty())
fixupOrderingIndices(CurrentOrder);		fixupOrderingIndices(CurrentOrder);
return CurrentOrder;		return CurrentOrder;
}		}
}		}
		// If the gather node is <undef, v, .., poison> and
		// insertelement poison, v, 0 [+ permute]
		// is cheaper than
		// insertelement poison, v, n - try to reorder.
		// If rotating the whole graph, exclude the permute cost, the whole graph
		// might be transformed.
		int Sz = TE.Scalars.size();
		if (isSplat(TE.Scalars) && !allConstant(TE.Scalars) &&
		count_if(TE.Scalars, UndefValue::classof) == Sz - 1) {
		const auto *It =
		find_if(TE.Scalars, [](Value *V) { return !isConstant(V); });
		if (It == TE.Scalars.begin())
		return {};
		auto *Ty = FixedVectorType::get(TE.Scalars.front()->getType(), Sz);
		if (It != TE.Scalars.end()) {
		OrdersType Order(Sz, Sz);
		Order[std::distance(TE.Scalars.begin(), It)] = 0;
		fixupOrderingIndices(Order);
		SmallVector<int> Mask;
		inversePermutation(Order, Mask);
		if (TTI->getVectorInstrCost(Instruction::InsertElement, Ty,
		TTI::TCK_RecipThroughput, 0,
		PoisonValue::get(Ty), *It) +
		(TopToBottom ? 0
		: TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
		Mask)) <
		TTI->getVectorInstrCost(Instruction::InsertElement, Ty,
		TTI::TCK_RecipThroughput,
		std::distance(TE.Scalars.begin(), It),
		PoisonValue::get(Ty), *It))
		return Order;
		}
		}
if (std::optional<OrdersType> CurrentOrder = findReusedOrderedScalars(TE))		if (std::optional<OrdersType> CurrentOrder = findReusedOrderedScalars(TE))
return CurrentOrder;		return CurrentOrder;
if (TE.Scalars.size() >= 4)		if (TE.Scalars.size() >= 4)
if (std::optional<OrdersType> Order = findPartiallyOrderedLoads(TE))		if (std::optional<OrdersType> Order = findPartiallyOrderedLoads(TE))
return Order;		return Order;
}		}
return std::nullopt;		return std::nullopt;
}		}
▲ Show 20 Lines • Show All 2,754 Lines • ▼ Show 20 Lines	if (isSplat(VL)) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return !isa<UndefValue>(V); });		find_if(VL, [](Value *V) { return !isa<UndefValue>(V); });
// If all values are undefs - consider cost free.		// If all values are undefs - consider cost free.
if (It == VL.end())		if (It == VL.end())
return TTI::TCC_Free;		return TTI::TCC_Free;
// Add broadcast for non-identity shuffle only.		// Add broadcast for non-identity shuffle only.
bool NeedShuffle =		bool NeedShuffle =
VL.front() != *It || !all_of(VL.drop_front(), UndefValue::classof);		VL.front() != *It || !all_of(VL.drop_front(), UndefValue::classof);
InstructionCost InsertCost =		InstructionCost InsertCost = TTI->getVectorInstrCost(
TTI->getVectorInstrCost(Instruction::InsertElement, VecTy, CostKind,		Instruction::InsertElement, VecTy, CostKind,
/*Index=*/0, PoisonValue::get(VecTy), *It);		NeedShuffle ? 0 : std::distance(VL.begin(), It),
		PoisonValue::get(VecTy), *It);
return InsertCost + (NeedShuffle		return InsertCost + (NeedShuffle
? TTI->getShuffleCost(		? TTI->getShuffleCost(
TargetTransformInfo::SK_Broadcast, VecTy,		TargetTransformInfo::SK_Broadcast, VecTy,
/*Mask=*/std::nullopt, CostKind,		/*Mask=*/std::nullopt, CostKind,
/*Index=*/0,		/*Index=*/0,
/*SubTp=*/nullptr, /*Args=*/VL[0])		/*SubTp=*/nullptr, /*Args=*/VL[0])
: TTI::TCC_Free);		: TTI::TCC_Free);
}		}
▲ Show 20 Lines • Show All 7,532 Lines • Show Last 20 Lines
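
The splat cost computation in the last hunk above can be mirrored in a small standalone model. This is a sketch under stated assumptions: `Scalar`, `ToySplatCosts` and `splatGatherCost` are hypothetical names, `std::nullopt` stands for an undef value, and the three costs are made-up integers rather than values from the target cost model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// std::nullopt models an undef lane; a value models a defined scalar.
using Scalar = std::optional<int>;

// Hypothetical, made-up costs; the real ones come from the target cost model.
struct ToySplatCosts {
  int InsertFirst; // insertelement into lane 0
  int InsertOther; // insertelement into any other lane
  int Broadcast;   // broadcast shuffle
};

// Estimates the cost of gathering a splat whose lanes are either undef or
// the same defined scalar.
int splatGatherCost(const std::vector<Scalar> &VL, const ToySplatCosts &C) {
  assert(C.InsertFirst >= 0 && C.InsertOther >= 0 && C.Broadcast >= 0);
  auto IsDefined = [](const Scalar &S) { return S.has_value(); };
  auto It = std::find_if(VL.begin(), VL.end(), IsDefined);
  // If all values are undefs, consider the gather free.
  if (It == VL.end())
    return 0;
  // A broadcast is needed unless the defined scalar already sits in lane 0
  // and every other lane is undef.
  const bool NeedShuffle =
      VL.front() != *It ||
      std::any_of(std::next(VL.begin()), VL.end(), IsDefined);
  const std::size_t InsertLane =
      NeedShuffle ? 0
                  : static_cast<std::size_t>(std::distance(VL.begin(), It));
  const int InsertCost = InsertLane == 0 ? C.InsertFirst : C.InsertOther;
  return InsertCost + (NeedShuffle ? C.Broadcast : 0);
}
```

With costs `{1, 3, 2}`, a node with a single defined scalar in lane 0 costs 1 (one insert, no shuffle), while the same scalar in lane 1 costs 3 (insert into lane 0 plus a broadcast).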

llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @foo() personality ptr @bar {			define void @foo() personality ptr @bar {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2.loopexit:			; CHECK: bb2.loopexit:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[SHUFFLE:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[TMP10:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(ptr addrspace(1) nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(ptr addrspace(1) nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]			; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]			; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br label [[BB7:%.*]]			; CHECK-NEXT: br label [[BB7:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 1			; CHECK-NEXT: [[TMP5]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 1
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]			; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = invoke i32 poison(ptr addrspace(1) nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP7:%.*]] = invoke i32 poison(ptr addrspace(1) nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]			; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]
	; CHECK: bb8:			; CHECK: bb8:
	; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]			; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]
	; CHECK: bb9:			; CHECK: bb9:
	; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]			; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[SHUFFLE1:%.*]], [[BB10]] ], [ [[TMP11:%.*]], [[BB12]] ]			; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[TMP11:%.*]], [[BB10]] ], [ [[TMP12:%.*]], [[BB12]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[SHUFFLE]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[TMP10]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	; CHECK: bb10:			; CHECK: bb10:
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]			; CHECK-NEXT: [[TMP11]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]
	; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { ptr, i32 }			; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { ptr, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: [[SHUFFLE1]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb12:			; CHECK: bb12:
	; CHECK-NEXT: [[TMP11]] = phi <2 x i32> [ [[TMP6]], [[BB7]] ]			; CHECK-NEXT: [[TMP12]] = phi <2 x i32> [ [[TMP6]], [[BB7]] ]
	; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { ptr, i32 }			; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { ptr, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	;			;
	bb1:			bb1:
	br label %bb3			br label %bb3

	bb2.loopexit:			bb2.loopexit:
	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/phi3.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=slp-vectorizer,dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s		; RUN: opt < %s -passes=slp-vectorizer,dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"		target triple = "x86_64-apple-macosx10.8.0"

%struct.GPar.0.16.26 = type { [0 x double], double }		%struct.GPar.0.16.26 = type { [0 x double], double }

@d = external global double, align 8		@d = external global double, align 8

declare ptr @Rf_gpptr(...)		declare ptr @Rf_gpptr(...)

define void @Rf_GReset() {		define void @Rf_GReset() {
; CHECK-LABEL: @Rf_GReset(		; CHECK-LABEL: @Rf_GReset(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr @d, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr @d, align 8
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP0]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP1]]
; CHECK-NEXT: br i1 icmp eq (ptr inttoptr (i64 115 to ptr), ptr @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]		; CHECK-NEXT: br i1 icmp eq (ptr inttoptr (i64 115 to ptr), ptr @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef		; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef
; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef		; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP6]], [[TMP5]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]
; CHECK: if.then6:		; CHECK: if.then6:
; CHECK-NEXT: br label [[IF_END7]]		; CHECK-NEXT: br label [[IF_END7]]
; CHECK: if.end7:		; CHECK: if.end7:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%sub = fsub double -0.000000e+00, undef		%sub = fsub double -0.000000e+00, undef
Show All 16 Lines	if.end7: ; preds = %if.then6, %if.then, %entry
%g.0 = phi double [ 0.000000e+00, %if.then6 ], [ %sub, %if.then ], [ %sub, %entry ]		%g.0 = phi double [ 0.000000e+00, %if.then6 ], [ %sub, %if.then ], [ %sub, %entry ]
ret void		ret void
}		}

define void @Rf_GReset_unary_fneg() {		define void @Rf_GReset_unary_fneg() {
; CHECK-LABEL: @Rf_GReset_unary_fneg(		; CHECK-LABEL: @Rf_GReset_unary_fneg(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr @d, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr @d, align 8
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP0]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]
; CHECK-NEXT: br i1 icmp eq (ptr inttoptr (i64 115 to ptr), ptr @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]		; CHECK-NEXT: br i1 icmp eq (ptr inttoptr (i64 115 to ptr), ptr @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef		; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef
; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef		; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP6]], [[TMP5]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]
; CHECK: if.then6:		; CHECK: if.then6:
; CHECK-NEXT: br label [[IF_END7]]		; CHECK-NEXT: br label [[IF_END7]]
; CHECK: if.end7:		; CHECK: if.end7:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%sub = fneg double undef		%sub = fneg double undef
Show All 19 Lines

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @pr35497() local_unnamed_addr #0 {			define void @pr35497() local_unnamed_addr #0 {
	; SSE-LABEL: @pr35497(			; SSE-LABEL: @pr35497(
	; SSE-NEXT: entry:			; SSE-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1			; SSE-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1
	; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef			; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; SSE-NEXT: store i64 [[ADD]], ptr undef, align 1			; SSE-NEXT: store i64 [[ADD]], ptr undef, align 1
	; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4			; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 poison, i64 undef>, i64 [[TMP0]], i32 0
	; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: store <2 x i64> [[TMP4]], ptr undef, align 1			; SSE-NEXT: [[TMP5:%.*]] = add nuw nsw <2 x i64> [[TMP4]], zeroinitializer
	; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> poison, i64 [[ADD]], i32 0			; SSE-NEXT: store <2 x i64> [[TMP5]], ptr undef, align 1
	; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[ADD]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
	; SSE-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>			; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>			; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i64> [[TMP9]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: [[TMP11:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP10]]			; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP5]], <i64 6, i64 6>
	; SSE-NEXT: store <2 x i64> [[TMP11]], ptr [[ARRAYIDX2_2]], align 1			; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]
				; SSE-NEXT: store <2 x i64> [[TMP12]], ptr [[ARRAYIDX2_2]], align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @pr35497(			; AVX-LABEL: @pr35497(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1			; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1
	; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef			; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; AVX-NEXT: store i64 [[ADD]], ptr undef, align 1			; AVX-NEXT: store i64 [[ADD]], ptr undef, align 1
	; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4			; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4
	▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	Show All 12 Lines
	; CHECK-NEXT: store <8 x i16> [[TMP0]], ptr [[PTR:%.*]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP0]], ptr [[PTR:%.*]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-19'			; YAML-NEXT: - Cost: '-20'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '4'			; YAML-NEXT: - TreeSize: '4'

	entry:			entry:
	%LD = load <8 x i16>, ptr undef			%LD = load <8 x i16>, ptr undef
	%V0 = extractelement <8 x i16> %LD, i32 0			%V0 = extractelement <8 x i16> %LD, i32 0
	br label %t			br label %t

	Show All 33 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -mattr=+avx2 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -mattr=+avx2 -S | FileCheck %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; In this test case we start trying to vectorize reduction but end up			; In this test case we start trying to vectorize reduction but end up
	; in tryToVectorize() method which then can make several attempts to			; in tryToVectorize() method which then can make several attempts to
	; find a pair (roots) for a tree that can be vectorized.			; find a pair (roots) for a tree that can be vectorized.
	; The order (path) it makes probes for various pairs is predefined by			; The order (path) it makes probes for various pairs is predefined by
	; the method implementation and it is not guaranteed that the best option			; the method implementation and it is not guaranteed that the best option
	; encountered first (like here).			; encountered first (like here).

	define double @root_selection(double %a, double %b, double %c, double %d) local_unnamed_addr #0 {			define double @root_selection(double %a, double %b, double %c, double %d) local_unnamed_addr #0 {
	; CHECK-LABEL: @root_selection(			; CHECK-LABEL: @root_selection(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[B:%.*]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[B:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A:%.*]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <2 x double> [[TMP2]], <double 7.000000e+00, double 5.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <2 x double> [[TMP2]], <double 5.000000e+00, double 7.000000e+00>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 0
	; CHECK-NEXT: [[I09:%.*]] = fmul fast double [[TMP4]], undef			; CHECK-NEXT: [[I09:%.*]] = fmul fast double [[TMP4]], undef
	; CHECK-NEXT: [[I10:%.*]] = fsub fast double undef, [[I09]]			; CHECK-NEXT: [[I10:%.*]] = fsub fast double undef, [[I09]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x double> [[TMP3]], <double 3.000000e+00, double undef>			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x double> [[TMP3]], <double undef, double 3.000000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[I10]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[I10]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x double> [[TMP6]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x double> [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP7]], <double undef, double 1.100000e+01>			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP7]], <double 1.100000e+01, double undef>
	; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <2 x double> [[TMP8]], <double 4.000000e+00, double 1.200000e+01>			; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <2 x double> [[TMP8]], <double 1.200000e+01, double 4.000000e+00>
	; CHECK-NEXT: [[TMP10:%.*]] = fdiv fast <2 x double> [[TMP9]], <double 1.400000e+00, double 1.400000e+00>			; CHECK-NEXT: [[TMP10:%.*]] = fdiv fast <2 x double> [[TMP9]], <double 1.400000e+00, double 1.400000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP10]], i32 1
	; CHECK-NEXT: [[I07:%.*]] = fadd fast double undef, [[TMP11]]			; CHECK-NEXT: [[I07:%.*]] = fadd fast double undef, [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP10]], i32 0
	; CHECK-NEXT: [[I16:%.*]] = fadd fast double [[I07]], [[TMP12]]			; CHECK-NEXT: [[I16:%.*]] = fadd fast double [[I07]], [[TMP12]]
	; CHECK-NEXT: [[I17:%.*]] = fadd fast double [[I16]], [[C:%.*]]			; CHECK-NEXT: [[I17:%.*]] = fadd fast double [[I16]], [[C:%.*]]
	; CHECK-NEXT: [[I18:%.*]] = fadd fast double [[I17]], [[D:%.*]]			; CHECK-NEXT: [[I18:%.*]] = fadd fast double [[I17]], [[D:%.*]]
	; CHECK-NEXT: ret double [[I18]]			; CHECK-NEXT: ret double [[I18]]
	;			;
	%i01 = fdiv fast double %a, 7.0			%i01 = fdiv fast double %a, 7.0
	%i02 = fmul fast double %i01, 3.0			%i02 = fmul fast double %i01, 3.0
	%i03 = fmul fast double undef, %i02			%i03 = fmul fast double undef, %i02
	Show All 19 Lines