This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of the scalars used in the graph.
ClosedPublic

Authored by ABataev on Oct 1 2021, 4:10 PM.


Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev
vporpo

Commits

rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

Summary

Currently we emit gathers for scalars being vectorized in the tree as
a pair of extractelement/insertelement instructions. Instead we can try
to find all required vectors and emit shuffle vector instructions
directly, improving the code and reducing compile time.
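
The idea in the summary can be illustrated with a small stand-alone sketch in plain C++ (no LLVM headers). `Lane` and `buildTwoSourceMask` are hypothetical names introduced only for this illustration; they are not part of SLPVectorizer.cpp. The sketch assumes each gathered scalar is described by the source vector it is extracted from and its lane. When at most two distinct source vectors are involved, the whole gather can be described by one shuffle mask, with lanes of the second source offset by the vector width as in `shufflevector` mask numbering.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical description of one gathered scalar: the source vector it is
// extracted from (an opaque non-negative id) and the lane within that vector.
struct Lane {
  int SourceId;
  unsigned Index;
};

// Returns a two-source shuffle mask for the gathered lanes, or std::nullopt
// if three or more distinct source vectors are involved. Lanes taken from the
// second source are offset by VF, matching shufflevector mask numbering.
std::optional<std::vector<int>>
buildTwoSourceMask(const std::vector<Lane> &Lanes, unsigned VF) {
  int First = -1, Second = -1;
  std::vector<int> Mask;
  Mask.reserve(Lanes.size());
  for (const Lane &L : Lanes) {
    assert(L.SourceId >= 0 && "source ids must be non-negative");
    assert(L.Index < VF && "lane index out of range");
    if (First == -1 || L.SourceId == First) {
      First = L.SourceId;
      Mask.push_back(static_cast<int>(L.Index));
    } else if (Second == -1 || L.SourceId == Second) {
      Second = L.SourceId;
      Mask.push_back(static_cast<int>(VF + L.Index));
    } else {
      return std::nullopt; // Too many sources: keep the extract/insert gather.
    }
  }
  return Mask;
}
```

For the example in the `isFixedVectorShuffle` comment (lanes 0 and 3 of `%x`, lanes 1 and 2 of `%y`, four-element vectors), this sketch yields the mask `<0, 3, 5, 6>`.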

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Oct 1 2021, 4:10 PM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Oct 1 2021, 4:10 PM

ABataev requested review of this revision.Oct 1 2021, 4:10 PM

Herald added a project: Restricted Project. Oct 1 2021, 4:10 PM

Harbormaster completed remote builds in B126755: Diff 376651.Oct 1 2021, 4:10 PM

Rebase

Harbormaster completed remote builds in B126915: Diff 377013.Oct 4 2021, 2:29 PM

RKSimon retitled this revision from [SLP]Improve gathering of the scals used in the graph. to [SLP]Improve gathering of the scalars used in the graph..Oct 5 2021, 6:35 AM

Rebase + bug fixes

Harbormaster completed remote builds in B133811: Diff 386648.Nov 11 2021, 2:47 PM

vporpo added a subscriber: vporpo.Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B135503: Diff 389033.Nov 22 2021, 7:36 PM

RKSimon added inline comments.Nov 29 2021, 9:13 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
312	Is it worth merging the isa<> and cast<> into a dyn_cast<>?
605	return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
6855	What targets are we still missing support for?
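
The first two suggestions above (merging a type test and a cast, and returning early on failure) can be sketched without LLVM headers. This is only an analogy: `Value`, `InsertElement`, `indexTwoStep` and `indexMerged` are hypothetical stand-ins, and standard `dynamic_cast` plays the role that `isa<>`, `cast<>` and `dyn_cast<>` play in LLVM.

```cpp
#include <optional>

// Hypothetical stand-ins for an LLVM-style class hierarchy.
struct Value {
  virtual ~Value() = default;
};
struct InsertElement : Value {
  unsigned Index;
  explicit InsertElement(unsigned Idx) : Index(Idx) {}
};

// Two-step form: a type test followed by an unconditional cast, with the
// useful work nested inside the success branch.
std::optional<unsigned> indexTwoStep(const Value *V) {
  if (dynamic_cast<const InsertElement *>(V) != nullptr) {  // role of isa<>
    const auto *IE = static_cast<const InsertElement *>(V); // role of cast<>
    return IE->Index;
  }
  return std::nullopt;
}

// Merged form: one checked cast, then an early out that makes the failure
// path obvious and keeps the main logic unindented.
std::optional<unsigned> indexMerged(const Value *V) {
  const auto *IE = dynamic_cast<const InsertElement *>(V); // role of dyn_cast<>
  if (!IE)
    return std::nullopt;
  return IE->Index;
}
```

Both functions return the same result for every input; the merged form only checks the type once and reads more directly.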

ABataev added inline comments.Nov 29 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6855	AArch64; in many cases it falls back to the default cost: a bunch of extracts plus a bunch of inserts.

Rebase + address comments.

Harbormaster completed remote builds in B136480: Diff 390398.Nov 29 2021, 11:39 AM

Rebase

Harbormaster completed remote builds in B136694: Diff 390702.Nov 30 2021, 8:08 AM

Rebase

Harbormaster completed remote builds in B136747: Diff 390783.Nov 30 2021, 1:09 PM

Rebase

Harbormaster completed remote builds in B138215: Diff 392842.Dec 8 2021, 12:09 PM

Rebase

RKSimon added inline comments.Dec 14 2021, 8:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6650–6651	Wshadow warning vs Idx @ Line 4688?
6825–6826	Wshadow warning vs Idx @ Line 4688?

Address comments

Harbormaster completed remote builds in B139236: Diff 394269.Dec 14 2021, 9:48 AM

Rebase

Harbormaster completed remote builds in B141051: Diff 396715.Dec 30 2021, 2:15 PM

ABataev mentioned this in D123587: [SLP] Generate shuffles if we can reorder an existing node.Apr 12 2022, 12:05 PM

Rebase

Herald added a project: Restricted Project. Aug 26 2022, 7:51 AM

Herald added subscribers: pcwang-thead, nlopes, kosarev.

nlopes added inline comments.Aug 26 2022, 7:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
10841	Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!

ABataev added inline comments.Aug 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
10841	Sure, thanks!

Address comments

Harbormaster completed remote builds in B183623: Diff 455933.Aug 26 2022, 10:50 AM

Rebase

Harbormaster completed remote builds in B186399: Diff 459790.Sep 13 2022, 11:19 AM

ABataev mentioned this in rG796af0c02728: [SLP] Move getInsertIndex function, NFC..Sep 14 2022, 6:24 AM

ABataev mentioned this in rGd647312e3f57: [SLP][NFC]Extract getLastInstructionInBundle function for better.Sep 14 2022, 8:44 AM

Rebase

Harbormaster completed remote builds in B192832: Diff 468668.Oct 18 2022, 1:42 PM

nhaehnle removed a subscriber: nhaehnle.Oct 19 2022, 2:00 AM

Large update.
Includes:

Unifies all shuffle builders and shuffle emission operands.
Generalizes emission and cost model estimation of the buildvectors/gathers.

Will be split into several smaller patches eventually.

Harbormaster completed remote builds in B201460: Diff 480583.Dec 6 2022, 9:34 PM

ABataev mentioned this in D139718: [SLP][NFC]Inital redesign of ShuffleInstructionBuilder, NFC..Dec 9 2022, 7:50 AM

ABataev mentioned this in rGecac8192dbf6: [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC..Dec 13 2022, 9:54 AM

Rebase

Harbormaster completed remote builds in B202927: Diff 482594.Dec 13 2022, 1:17 PM

Restore accidentally removed code.

Harbormaster completed remote builds in B202945: Diff 482619.Dec 13 2022, 2:43 PM

Rebase

Harbormaster completed remote builds in B204383: Diff 484571.Dec 21 2022, 7:50 AM

ABataev mentioned this in D140499: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 21 2022, 1:54 PM

khchen added a subscriber: khchen.Dec 22 2022, 8:35 AM

ABataev mentioned this in rGac01ae71f0c4: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 28 2022, 6:11 AM

Rebase

Harbormaster completed remote builds in B206131: Diff 486895.Jan 6 2023, 10:07 AM

Rebase

Herald added a subscriber: StephenFan. Jan 9 2023, 9:43 AM

Harbormaster completed remote builds in B206577: Diff 487485.Jan 9 2023, 10:30 AM

ABataev mentioned this in D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 11 2023, 8:33 AM

ABataev mentioned this in D141940: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Jan 17 2023, 8:01 AM

ABataev mentioned this in rG9bdcf8778a5c: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 19 2023, 1:50 PM

ABataev mentioned this in rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Feb 20 2023, 6:16 AM

ABataev mentioned this in D144958: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Feb 28 2023, 5:21 AM

ABataev mentioned this in rGa611b3f3059e: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 7 2023, 12:47 PM

Rebase

Restore deleted code/update test

Harbormaster completed remote builds in B218206: Diff 503510.Mar 8 2023, 2:48 PM

ABataev mentioned this in D145732: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function..Mar 9 2023, 2:20 PM

hans mentioned this in rG3b3a4c270bcb: Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather….Mar 10 2023, 5:40 AM

ABataev mentioned this in rG93a9be0cea0a: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 10 2023, 1:22 PM

ABataev mentioned this in rGf3a68ac10c84: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector….Mar 13 2023, 6:27 AM

Rebase

RKSimon added inline comments.Mar 13 2023, 2:27 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7005	Any chance that we can use ShuffleVectorInst::isIdentityMask ?
7604	auto *
7606	auto *
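
For readers unfamiliar with the term, an identity mask check can be sketched as follows. This is a simplified stand-alone approximation, not the LLVM implementation: `isIdentityLike` and `PoisonLane` are hypothetical names, and the real `ShuffleVectorInst::isIdentityMask` may differ in its signature and in corner cases.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a poison (don't care) mask element.
constexpr int PoisonLane = -1;

// True if the mask has exactly NumSrcElts elements and every element is
// either poison or selects the lane at its own position, so that applying
// the shuffle would leave the source vector unchanged.
bool isIdentityLike(const std::vector<int> &Mask, std::size_t NumSrcElts) {
  if (Mask.size() != NumSrcElts)
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != PoisonLane && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}
```

Under these assumptions, `{0, 1, 2, 3}` and `{0, -1, 2, -1}` are identity masks for a four-element source, while `{1, 0, 2, 3}` is not.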

ABataev added inline comments.Mar 13 2023, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7005	Sure, will do it later
7606	Both these cases are the existing code, just the diff is not quite correct because of the big differences.

Restore accidentally removed lines, address comments

Harbormaster completed remote builds in B219182: Diff 504861.Mar 13 2023, 5:18 PM

Rebase

Restore some deleted code

Harbormaster completed remote builds in B219617: Diff 505467.Mar 15 2023, 7:08 AM

ABataev mentioned this in D146167: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars..Mar 15 2023, 2:14 PM

ABataev mentioned this in rG0ad87ffdcc23: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining….Mar 17 2023, 11:21 AM

Rebase

Harbormaster completed remote builds in B220124: Diff 506162.Mar 17 2023, 12:55 PM

ABataev mentioned this in D146564: [SLP]Find reused scalars in buildvector sequences, if any..Mar 21 2023, 2:11 PM

ABataev mentioned this in rG40105a993399: [SLP]Find reused scalars in buildvector sequences, if any..Apr 5 2023, 9:39 AM

Rebase

Harbormaster completed remote builds in B224057: Diff 511474.Apr 6 2023, 11:37 AM

Rebase

Harbormaster completed remote builds in B224133: Diff 511560.Apr 6 2023, 5:26 PM

Rebase

Harbormaster completed remote builds in B224875: Diff 512589.Apr 11 2023, 3:26 PM

ABataev mentioned this in D148174: [SLP]Introduce gather cost estimation function..Apr 12 2023, 2:36 PM

ABataev mentioned this in rGf82eb7e066f3: [SLP]Introduce gather cost estimation function..Apr 13 2023, 10:19 AM

Rebase

Harbormaster completed remote builds in B225410: Diff 513316.Apr 13 2023, 12:33 PM

ABataev mentioned this in D148279: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions..Apr 13 2023, 4:42 PM

ABataev mentioned this in rGcd341f3f4878: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 5:55 AM

ABataev mentioned this in rG1ce4b26a21a0: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 11:54 AM

Rebase

Harbormaster completed remote builds in B227770: Diff 516462.Apr 24 2023, 11:19 AM

dtemirbulatov added a reviewer: vporpo.Apr 27 2023, 5:39 PM

Temp rebase, requires some extra work.

Harbormaster completed remote builds in B230224: Diff 519833.May 5 2023, 7:04 AM

Rebase

Herald added a subscriber: wangpc. Nov 9 2023, 2:20 PM

Harbormaster completed remote builds in B258052: Diff 558067.Nov 9 2023, 6:17 PM

Rebase

Harbormaster completed remote builds in B258083: Diff 558113.Nov 16 2023, 10:49 AM

LGTM.

This revision is now accepted and ready to land.Thu, Nov 30, 7:34 AM

LGTM.

Rebase

Harbormaster completed remote builds in B258147: Diff 558197.Thu, Nov 30, 11:35 AM

Closed by commit rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph. (authored by ABataev). Fri, Dec 1, 11:26 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph..

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"classA" = type { %"vector", %"vector", %"complex" }
%"vector" = type { ptr, ptr, %"pair" }
%"pair" = type { %"pair_elem" }
%"pair_elem" = type { ptr }
%"complex" = type { double, double }

define void @foo() #0 {
  %1 = getelementptr %"classA", ptr null, i64 0, i32 2
  %2 = getelementptr %"classA", ptr null, i64 0, i32 2, i32 1
  br i1 false, label %10, label %3

3:                                                ; preds = %10, %0                                                                                                                                                
  %4 = phi double [ 0.000000e+00, %0 ], [ %25, %10 ]
  %5 = phi double [ 0.000000e+00, %0 ], [ %24, %10 ]
  %6 = fmul double %5, %5
  %7 = fmul double %4, %4
  %8 = fadd double %7, %6
  %9 = fcmp ult double %8, 0.000000e+00
  ret void

10:                                               ; preds = %10, %0                                                                                                                                                
  %11 = phi double [ %24, %10 ], [ 0.000000e+00, %0 ]
  %12 = phi double [ %25, %10 ], [ 0.000000e+00, %0 ]
  %13 = load double, ptr null, align 8
  %14 = load double, ptr null, align 8
  %15 = load double, ptr null, align 8
  %16 = getelementptr %"complex", ptr null, i64 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = fmul double %13, %15
  %19 = fmul double %14, %17
  %20 = fadd double %18, %19
  %21 = fmul double %14, %15
  %22 = fmul double %13, %17
  %23 = fsub double %21, %22
  %24 = fadd double %11, %20
  store double %11, ptr %1, align 8
  %25 = fadd double %12, %23
  store double %12, ptr %2, align 8
  br i1 false, label %3, label %10

; uselistorder directives                                                                                                                                                                                          
  uselistorder double %24, { 1, 0 }
  uselistorder double %25, { 1, 0 }
}

attributes #0 = { "target-features"="+aes,+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }

Thanks!

In D110978#4657889, @vporpo wrote:

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

Ping @ABataev ! This is blocking our internal release at Google!

dtemirbulatov added a subscriber: dtemirbulatov.Tue, Dec 12, 1:54 AM

Revision Contents

Path                                                                   Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                        437 lines
llvm/test/Transforms/SLPVectorizer/AArch64/scalarization-overhead.ll   64 lines

Diff 558067

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[... 303 lines not shown ...]
 static std::optional<unsigned> getInsertIndex(const Value *InsertInst,
                                               unsigned Offset = 0) {
   int Index = Offset;
   if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
     const auto *VT = dyn_cast<FixedVectorType>(IE->getType());
     if (!VT)
       return std::nullopt;
     const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
     if (!CI)
       return std::nullopt;
     if (CI->getValue().uge(VT->getNumElements()))
       return std::nullopt;
     Index *= VT->getNumElements();
     Index += CI->getZExtValue();
     return Index;
   }

[... 112 lines not shown ...]
 /// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
 /// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
 /// ret <4 x i8> %ins4
 /// can be transformed into:
 /// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,
 ///                                                         i32 6>
 /// %2 = mul <4 x i8> %1, %1
 /// ret <4 x i8> %2
-/// We convert this initially to something like:
-/// %x0 = extractelement <4 x i8> %x, i32 0
-/// %x3 = extractelement <4 x i8> %x, i32 3
-/// %y1 = extractelement <4 x i8> %y, i32 1
-/// %y2 = extractelement <4 x i8> %y, i32 2
-/// %1 = insertelement <4 x i8> poison, i8 %x0, i32 0
-/// %2 = insertelement <4 x i8> %1, i8 %x3, i32 1
-/// %3 = insertelement <4 x i8> %2, i8 %y1, i32 2
-/// %4 = insertelement <4 x i8> %3, i8 %y2, i32 3
-/// %5 = mul <4 x i8> %4, %4
-/// %6 = extractelement <4 x i8> %5, i32 0
-/// %ins1 = insertelement <4 x i8> poison, i8 %6, i32 0
-/// %7 = extractelement <4 x i8> %5, i32 1
-/// %ins2 = insertelement <4 x i8> %ins1, i8 %7, i32 1
-/// %8 = extractelement <4 x i8> %5, i32 2
-/// %ins3 = insertelement <4 x i8> %ins2, i8 %8, i32 2
-/// %9 = extractelement <4 x i8> %5, i32 3
-/// %ins4 = insertelement <4 x i8> %ins3, i8 %9, i32 3
-/// ret <4 x i8> %ins4
-/// InstCombiner transforms this into a shuffle and vector mul
 /// Mask will return the Shuffle Mask equivalent to the extracted elements.
 /// TODO: Can we split off and reuse the shuffle mask detection from
 /// ShuffleVectorInst/getShuffleCost?
 static std::optional<TargetTransformInfo::ShuffleKind>
 isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
   const auto *It =
       find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });
   if (It == VL.end())
[... 148 lines not shown; enclosing context: if (V2 && PairMax < VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() +]
       PairVec = std::make_pair(V1, V2);
     }
   }
   if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)
     return std::nullopt;
   // Check if better to perform a shuffle of 2 vectors or just of a single
   // vector.
   SmallVector<Value *> SavedVL(VL.begin(), VL.end());
   SmallVector<Value *> GatheredExtracts(
       VL.size(), PoisonValue::get(VL.front()->getType()));
   if (SingleMax >= PairMax && SingleMax) {
     for (int Idx : VectorOpToIdx[SingleVec])
       std::swap(GatheredExtracts[Idx], VL[Idx]);
   } else {
     for (Value *V : {PairVec.first, PairVec.second})
       for (int Idx : VectorOpToIdx[V])
         std::swap(GatheredExtracts[Idx], VL[Idx]);
[... 6,028 lines not shown; enclosing context: if (auto *MainCI = dyn_cast<CmpInst>(MainOp)) {]
     return MainP != P && MainP != SwappedP;
   }
   return I->getOpcode() == AltOp->getOpcode();
 }

 TTI::OperandValueInfo BoUpSLP::getOperandInfo(ArrayRef<Value *> Ops) {
   assert(!Ops.empty());
   const auto *Op0 = Ops.front();

   const bool IsConstant = all_of(Ops, [](Value *V) {
     // TODO: We should allow undef elements here
     return isConstant(V) && !isa<UndefValue>(V);
   });
   const bool IsUniform = all_of(Ops, [=](Value *V) {
     // TODO: We should allow undef elements here
     return V == Op0;
   });
   const bool IsPowerOfTwo = all_of(Ops, [](Value *V) {
[... 157 lines not shown; enclosing context: while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {]
               dyn_cast<FixedVectorType>(SV->getOperand(0)->getType()))
         LocalVF = SVOpTy->getNumElements();
       SmallVector<int> ExtMask(Mask.size(), PoisonMaskElem);
       for (auto [Idx, I] : enumerate(Mask)) {
         if (I == PoisonMaskElem ||
             static_cast<unsigned>(I) >= SV->getShuffleMask().size())
           continue;
         ExtMask[Idx] = SV->getMaskValue(I);
       }
       bool IsOp1Undef =
           isUndefVector(SV->getOperand(0),
                         buildUseMask(LocalVF, ExtMask, UseMask::FirstArg))
               .all();
       bool IsOp2Undef =
           isUndefVector(SV->getOperand(1),
                         buildUseMask(LocalVF, ExtMask, UseMask::SecondArg))
               .all();
       if (!IsOp1Undef && !IsOp2Undef) {
[... 12 lines not shown; enclosing context: while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {]
       combineMasks(LocalVF, ShuffleMask, Mask);
       Mask.swap(ShuffleMask);
       if (IsOp2Undef)
         Op = SV->getOperand(0);
       else
         Op = SV->getOperand(1);
     }
     if (auto *OpTy = dyn_cast<FixedVectorType>(Op->getType());
         !OpTy || !isIdentityMask(Mask, OpTy, SinglePermute) ||
         ShuffleVectorInst::isZeroEltSplatMask(Mask, Mask.size())) {
       if (IdentityOp) {
         V = IdentityOp;
         assert(Mask.size() == IdentityMask.size() &&
                "Expected masks of same sizes.");
         // Clear known poison elements.
         for (auto [I, Idx] : enumerate(Mask))
           if (Idx == PoisonMaskElem)
[... 133 lines not shown; enclosing context: static T createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask,]
     return Builder.createIdentity(V1);
   }
 };
 } // namespace

 /// Merges shuffle masks and emits final shuffle instruction, if required. It
 /// supports shuffling of 2 input vectors. It implements lazy shuffles emission,
 /// when the actual shuffle instruction is generated only if this is actually
 /// required. Otherwise, the shuffle instruction emission is delayed till the
 /// end of the process, to reduce the number of emitted instructions and further
 /// analysis/transformations.
 class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
   bool IsFinalized = false;
   SmallVector<int> CommonMask;
   SmallVector<PointerUnion<Value *, const TreeEntry *>, 2> InVectors;
   const TargetTransformInfo &TTI;
   InstructionCost Cost = 0;
[... 25 lines not shown; enclosing context: InstructionCost getBuildVectorCost(ArrayRef<Value *> VL, Value *Root) {]
     if ((!Root && allConstant(VL)) || all_of(VL, UndefValue::classof))
       return TTI::TCC_Free;
     auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());
     InstructionCost GatherCost = 0;
     SmallVector<Value *> Gathers(VL.begin(), VL.end());
     // Improve gather cost for gather of loads, if we can group some of the
     // loads into vector loads.
     InstructionsState S = getSameOpcode(VL, *R.TLI);
-    if (VL.size() > 2 && S.getOpcode() == Instruction::Load &&
-        !S.isAltShuffle() &&
+    const unsigned Sz = R.DL->getTypeSizeInBits(VL.front()->getType());
+    unsigned MinVF = R.getMinVF(2 * Sz);
+    if (VL.size() > 2 &&
+        ((S.getOpcode() == Instruction::Load && !S.isAltShuffle()) ||
+         (InVectors.empty() &&
+          any_of(seq<unsigned>(0, VL.size() / MinVF),
+                 [&](unsigned Idx) {
+                   ArrayRef<Value *> SubVL = VL.slice(Idx * MinVF, MinVF);
+                   InstructionsState S = getSameOpcode(SubVL, *R.TLI);
+                   return S.getOpcode() == Instruction::Load &&
+                          !S.isAltShuffle();
+                 }))) &&
         !all_of(Gathers, [&](Value *V) { return R.getTreeEntry(V); }) &&
         !isSplat(Gathers)) {
-      BoUpSLP::ValueSet VectorizedLoads;
+      SetVector<Value *> VectorizedLoads;
+      SmallVector<LoadInst *> VectorizedStarts;
+      SmallVector<std::pair<unsigned, unsigned>> ScatterVectorized;
       unsigned StartIdx = 0;
       unsigned VF = VL.size() / 2;
-      unsigned VectorizedCnt = 0;
-      unsigned ScatterVectorizeCnt = 0;
-      const unsigned Sz = R.DL->getTypeSizeInBits(S.MainOp->getType());
-      for (unsigned MinVF = R.getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {
+      for (; VF >= MinVF; VF /= 2) {
         for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;
              Cnt += VF) {
           ArrayRef<Value *> Slice = VL.slice(Cnt, VF);
+          if (S.getOpcode() != Instruction::Load || S.isAltShuffle()) {
+            InstructionsState SliceS = getSameOpcode(Slice, *R.TLI);
+            if (SliceS.getOpcode() != Instruction::Load ||
+                SliceS.isAltShuffle())
+              continue;
+          }
           if (!VectorizedLoads.count(Slice.front()) &&
               !VectorizedLoads.count(Slice.back()) && allSameBlock(Slice)) {
             SmallVector<Value *> PointerOps;
             OrdersType CurrentOrder;
             LoadsState LS =
                 canVectorizeLoads(Slice, Slice.front(), TTI, R.DL, R.SE,
                                   R.LI, R.TLI, CurrentOrder, PointerOps);
             switch (LS) {
             case LoadsState::Vectorize:
             case LoadsState::ScatterVectorize:
             case LoadsState::PossibleStridedVectorize:
               // Mark the vectorized loads so that we don't vectorize them
               // again.
-              if (LS == LoadsState::Vectorize)
-                ++VectorizedCnt;
+              if (LS == LoadsState::Vectorize && CurrentOrder.empty())
+                VectorizedStarts.push_back(cast<LoadInst>(Slice.front()));
               else
-                ++ScatterVectorizeCnt;
+                ScatterVectorized.emplace_back(Cnt, VF);
               VectorizedLoads.insert(Slice.begin(), Slice.end());
               // If we vectorized initial block, no need to try to vectorize
               // it again.
               if (Cnt == StartIdx)
                 StartIdx += VF;
               break;
             case LoadsState::Gather:
               break;
Show All 14 Lines	if (VL.size() > 2 &&
// Get the cost for gathered loads.		// Get the cost for gathered loads.
for (unsigned I = 0, End = VL.size(); I < End; I += VF) {		for (unsigned I = 0, End = VL.size(); I < End; I += VF) {
if (VectorizedLoads.contains(VL[I]))		if (VectorizedLoads.contains(VL[I]))
continue;		continue;
GatherCost += getBuildVectorCost(VL.slice(I, VF), Root);		GatherCost += getBuildVectorCost(VL.slice(I, VF), Root);
}		}
// Exclude potentially vectorized loads from list of gathered		// Exclude potentially vectorized loads from list of gathered
// scalars.		// scalars.
auto *LI = cast<LoadInst>(S.MainOp);		Gathers.assign(Gathers.size(), PoisonValue::get(VL.front()->getType()));
Gathers.assign(Gathers.size(), PoisonValue::get(LI->getType()));
// The cost for vectorized loads.		// The cost for vectorized loads.
InstructionCost ScalarsCost = 0;		InstructionCost ScalarsCost = 0;
for (Value *V : VectorizedLoads) {		for (Value *V : VectorizedLoads) {
auto *LI = cast<LoadInst>(V);		auto *LI = cast<LoadInst>(V);
ScalarsCost +=		ScalarsCost +=
TTI.getMemoryOpCost(Instruction::Load, LI->getType(),		TTI.getMemoryOpCost(Instruction::Load, LI->getType(),
LI->getAlign(), LI->getPointerAddressSpace(),		LI->getAlign(), LI->getPointerAddressSpace(),
CostKind, TTI::OperandValueInfo(), LI);		CostKind, TTI::OperandValueInfo(), LI);
}		}
auto *LoadTy = FixedVectorType::get(LI->getType(), VF);		auto *LoadTy = FixedVectorType::get(VL.front()->getType(), VF);
		for (LoadInst *LI : VectorizedStarts) {
Align Alignment = LI->getAlign();		Align Alignment = LI->getAlign();
GatherCost +=		GatherCost +=
VectorizedCnt *
TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment,		TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment,
LI->getPointerAddressSpace(), CostKind,		LI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo(), LI);		TTI::OperandValueInfo(), LI);
GatherCost += ScatterVectorizeCnt *		}
TTI.getGatherScatterOpCost(		for (std::pair<unsigned, unsigned> P : ScatterVectorized) {
Instruction::Load, LoadTy, LI->getPointerOperand(),		auto *LI0 = cast<LoadInst>(VL[P.first]);
/*VariableMask=*/false, Alignment, CostKind, LI);		Align CommonAlignment = LI0->getAlign();
		for (Value *V : VL.slice(P.first + 1, VF - 1))
		CommonAlignment =
		std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
		GatherCost += TTI.getGatherScatterOpCost(
		Instruction::Load, LoadTy, LI0->getPointerOperand(),
		/*VariableMask=*/false, CommonAlignment, CostKind, LI0);
		}
if (NeedInsertSubvectorAnalysis) {		if (NeedInsertSubvectorAnalysis) {
// Add the cost for the subvectors insert.		// Add the cost for the subvectors insert.
for (int I = VF, E = VL.size(); I < E; I += VF)		for (int I = VF, E = VL.size(); I < E; I += VF)
GatherCost += TTI.getShuffleCost(TTI::SK_InsertSubvector, VecTy,		GatherCost += TTI.getShuffleCost(TTI::SK_InsertSubvector, VecTy,
std::nullopt, CostKind, I, LoadTy);		std::nullopt, CostKind, I, LoadTy);
}		}
GatherCost -= ScalarsCost;		GatherCost -= ScalarsCost;
}		}
[... 276 lines not shown ...]	if (!V1 && !V2 && !P2.isNull()) {
}		}
CommonVF = VF;		CommonVF = VF;
}		}
V1 = Constant::getNullValue(		V1 = Constant::getNullValue(
FixedVectorType::get(E2->Scalars.front()->getType(), CommonVF));		FixedVectorType::get(E2->Scalars.front()->getType(), CommonVF));
V2 = getAllOnesValue(		V2 = getAllOnesValue(
*R.DL,		*R.DL,
FixedVectorType::get(E2->Scalars.front()->getType(), CommonVF));		FixedVectorType::get(E2->Scalars.front()->getType(), CommonVF));
		} else if (!V1 && V2) {
		// Shuffle vector and tree node.
		unsigned VF = cast<FixedVectorType>(V2->getType())->getNumElements();
		const TreeEntry *E1 = P1.get<const TreeEntry *>();
		CommonVF = std::max(VF, E1->getVectorFactor());
		assert(all_of(Mask,
		[=](int Idx) {
		return Idx < 2 * static_cast<int>(CommonVF);
		}) &&
		"All elements in mask must be less than 2 * CommonVF.");
		if (E1->Scalars.size() == VF && VF != CommonVF) {
		SmallVector<int> E1Mask = E1->getCommonMask();
		assert(!E1Mask.empty() && "Expected non-empty common mask.");
		for (int &Idx : CommonMask) {
		if (Idx == PoisonMaskElem)
		continue;
		if (Idx >= static_cast<int>(CommonVF))
		Idx = E1Mask[Idx - CommonVF] + VF;
		}
		CommonVF = VF;
		}
		V1 = Constant::getNullValue(
		FixedVectorType::get(E1->Scalars.front()->getType(), CommonVF));
		V2 = getAllOnesValue(
		*R.DL,
		FixedVectorType::get(E1->Scalars.front()->getType(), CommonVF));
} else {		} else {
assert(V1 && V2 && "Expected both vectors.");		assert(V1 && V2 && "Expected both vectors.");
unsigned VF = cast<FixedVectorType>(V1->getType())->getNumElements();		unsigned VF = cast<FixedVectorType>(V1->getType())->getNumElements();
CommonVF =		CommonVF =
std::max(VF, cast<FixedVectorType>(V2->getType())->getNumElements());		std::max(VF, cast<FixedVectorType>(V2->getType())->getNumElements());
assert(all_of(Mask,		assert(all_of(Mask,
[=](int Idx) {		[=](int Idx) {
return Idx < 2 * static_cast<int>(CommonVF);		return Idx < 2 * static_cast<int>(CommonVF);
[... 20 lines not shown ...]
public:		public:
ShuffleCostEstimator(TargetTransformInfo &TTI,		ShuffleCostEstimator(TargetTransformInfo &TTI,
ArrayRef<Value *> VectorizedVals, BoUpSLP &R,		ArrayRef<Value *> VectorizedVals, BoUpSLP &R,
SmallPtrSetImpl<Value *> &CheckedExtracts)		SmallPtrSetImpl<Value *> &CheckedExtracts)
: TTI(TTI), VectorizedVals(VectorizedVals.begin(), VectorizedVals.end()),		: TTI(TTI), VectorizedVals(VectorizedVals.begin(), VectorizedVals.end()),
R(R), CheckedExtracts(CheckedExtracts) {}		R(R), CheckedExtracts(CheckedExtracts) {}
Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,		Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,
ArrayRef<std::optional<TTI::ShuffleKind>> ShuffleKinds,		ArrayRef<std::optional<TTI::ShuffleKind>> ShuffleKinds,
unsigned NumParts) {		unsigned NumParts, bool &UseVecBaseAsInput) {
		UseVecBaseAsInput = false;
if (Mask.empty())		if (Mask.empty())
return nullptr;		return nullptr;
Value *VecBase = nullptr;		Value *VecBase = nullptr;
ArrayRef<Value *> VL = E->Scalars;		ArrayRef<Value *> VL = E->Scalars;
// If the resulting type is scalarized, do not adjust the cost.		// If the resulting type is scalarized, do not adjust the cost.
if (NumParts == VL.size())		if (NumParts == VL.size())
return nullptr;		return nullptr;
// Check if it can be considered reused if same extractelements were		// Check if it can be considered reused if same extractelements were
// vectorized already.		// vectorized already.
bool PrevNodeFound = any_of(		bool PrevNodeFound = any_of(
ArrayRef(R.VectorizableTree).take_front(E->Idx),		ArrayRef(R.VectorizableTree).take_front(E->Idx),
[&](const std::unique_ptr<TreeEntry> &TE) {		[&](const std::unique_ptr<TreeEntry> &TE) {
return ((!TE->isAltShuffle() &&		return ((!TE->isAltShuffle() &&
TE->getOpcode() == Instruction::ExtractElement) ||		TE->getOpcode() == Instruction::ExtractElement) ||
TE->State == TreeEntry::NeedToGather) &&		TE->State == TreeEntry::NeedToGather) &&
all_of(enumerate(TE->Scalars), [&](auto &&Data) {		all_of(enumerate(TE->Scalars), [&](auto &&Data) {
return VL.size() > Data.index() &&		return VL.size() > Data.index() &&
(Mask[Data.index()] == PoisonMaskElem ||		(Mask[Data.index()] == PoisonMaskElem ||
isa<UndefValue>(VL[Data.index()]) ||		isa<UndefValue>(VL[Data.index()]) ||
Data.value() == VL[Data.index()]);		Data.value() == VL[Data.index()]);
});		});
});		});
		SmallPtrSet<Value *, 4> UniqueBases;
unsigned SliceSize = VL.size() / NumParts;		unsigned SliceSize = VL.size() / NumParts;
for (unsigned Part = 0; Part < NumParts; ++Part) {		for (unsigned Part = 0; Part < NumParts; ++Part) {
ArrayRef<int> SubMask = Mask.slice(Part * SliceSize, SliceSize);		ArrayRef<int> SubMask = Mask.slice(Part * SliceSize, SliceSize);
for (auto [I, V] : enumerate(VL.slice(Part * SliceSize, SliceSize))) {		for (auto [I, V] : enumerate(VL.slice(Part * SliceSize, SliceSize))) {
// Ignore non-extractelement scalars.		// Ignore non-extractelement scalars.
if (isa<UndefValue>(V) ||		if (isa<UndefValue>(V) ||
(!SubMask.empty() && SubMask[I] == PoisonMaskElem))		(!SubMask.empty() && SubMask[I] == PoisonMaskElem))
continue;		continue;
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
// Also, avoid adjusting the cost for extractelements with multiple uses		// Also, avoid adjusting the cost for extractelements with multiple uses
// in different graph entries.		// in different graph entries.
		auto *EE = cast<ExtractElementInst>(V);
		VecBase = EE->getVectorOperand();
		UniqueBases.insert(VecBase);
const TreeEntry *VE = R.getTreeEntry(V);		const TreeEntry *VE = R.getTreeEntry(V);
if (!CheckedExtracts.insert(V).second ||		if (!CheckedExtracts.insert(V).second ||
!R.areAllUsersVectorized(cast<Instruction>(V), &VectorizedVals) ||		!R.areAllUsersVectorized(cast<Instruction>(V), &VectorizedVals) ||
(VE && VE != E))		(VE && VE != E))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);
VecBase = EE->getVectorOperand();
std::optional<unsigned> EEIdx = getExtractIndex(EE);		std::optional<unsigned> EEIdx = getExtractIndex(EE);
if (!EEIdx)		if (!EEIdx)
continue;		continue;
unsigned Idx = *EEIdx;		unsigned Idx = *EEIdx;
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EE->hasOneUse() || !PrevNodeFound) {		if (EE->hasOneUse() || !PrevNodeFound) {
Instruction *Ext = EE->user_back();		Instruction *Ext = EE->user_back();
if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {		if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {
[... 22 lines not shown ...]	Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,
// single input vector or of 2 input vectors.		// single input vector or of 2 input vectors.
// Done for reused if same extractelements were vectorized already.		// Done for reused if same extractelements were vectorized already.
if (!PrevNodeFound)		if (!PrevNodeFound)
Cost += computeExtractCost(VL, Mask, ShuffleKinds, NumParts);		Cost += computeExtractCost(VL, Mask, ShuffleKinds, NumParts);
InVectors.assign(1, E);		InVectors.assign(1, E);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
transformMaskAfterShuffle(CommonMask, CommonMask);		transformMaskAfterShuffle(CommonMask, CommonMask);
SameNodesEstimated = false;		SameNodesEstimated = false;
		if (NumParts != 1 && UniqueBases.size() != 1) {
		UseVecBaseAsInput = true;
		VecBase = Constant::getNullValue(
		FixedVectorType::get(VL.front()->getType(), CommonMask.size()));
		}
return VecBase;		return VecBase;
}		}
		/// Checks if the specified entry \p E needs to be delayed because of its
		/// dependency nodes.
		std::optional<InstructionCost>
		needToDelay(const TreeEntry *,
		ArrayRef<SmallVector<const TreeEntry *>>) const {
		// No need to delay the cost estimation during analysis.
		return std::nullopt;
		}
		RKSimon (inline comment, not done): auto *
void add(const TreeEntry &E1, const TreeEntry &E2, ArrayRef<int> Mask) {		void add(const TreeEntry &E1, const TreeEntry &E2, ArrayRef<int> Mask) {
if (&E1 == &E2) {		if (&E1 == &E2) {
		RKSimon (inline comment, not done): auto *
		ABataev (author, inline comment, done): Both these cases are the existing code, just the diff is not quite correct because of the big differences.
assert(all_of(Mask,		assert(all_of(Mask,
[&](int Idx) {		[&](int Idx) {
return Idx < static_cast<int>(E1.getVectorFactor());		return Idx < static_cast<int>(E1.getVectorFactor());
}) &&		}) &&
"Expected single vector shuffle mask.");		"Expected single vector shuffle mask.");
add(E1, Mask);		add(E1, Mask);
return;		return;
}		}
[... 29 lines not shown ...]	void add(const TreeEntry &E1, ArrayRef<int> Mask) {
unsigned SliceSize = Mask.size() / NumParts;		unsigned SliceSize = Mask.size() / NumParts;
const auto *It =		const auto *It =
find_if(Mask, [](int Idx) { return Idx != PoisonMaskElem; });		find_if(Mask, [](int Idx) { return Idx != PoisonMaskElem; });
unsigned Part = std::distance(Mask.begin(), It) / SliceSize;		unsigned Part = std::distance(Mask.begin(), It) / SliceSize;
estimateNodesPermuteCost(E1, nullptr, Mask, Part, SliceSize);		estimateNodesPermuteCost(E1, nullptr, Mask, Part, SliceSize);
if (!SameNodesEstimated && InVectors.size() == 1)		if (!SameNodesEstimated && InVectors.size() == 1)
InVectors.emplace_back(&E1);		InVectors.emplace_back(&E1);
}		}
		/// Adds 2 input vectors and the mask for their shuffling.
		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
		// May come only for shuffling of 2 vectors with extractelements, already
		// handled in adjustExtracts.
		assert(InVectors.size() == 1 &&
		all_of(enumerate(CommonMask),
		[&](auto P) {
		if (P.value() == PoisonMaskElem)
		return Mask[P.index()] == PoisonMaskElem;
		auto *EI =
		cast<ExtractElementInst>(InVectors.front()
		.get<const TreeEntry *>()
		->Scalars[P.index()]);
		return EI->getVectorOperand() == V1 ||
		EI->getVectorOperand() == V2;
		}) &&
		"Expected extractelement vectors.");
		}
/// Adds another one input vector and the mask for the shuffling.		/// Adds another one input vector and the mask for the shuffling.
void add(Value *V1, ArrayRef<int> Mask) {		void add(Value *V1, ArrayRef<int> Mask, bool ForExtracts = false) {
if (InVectors.empty()) {		if (InVectors.empty()) {
assert(CommonMask.empty() && "Expected empty input mask/vectors.");		assert(CommonMask.empty() && !ForExtracts &&
		"Expected empty input mask/vectors.");
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
InVectors.assign(1, V1);		InVectors.assign(1, V1);
return;		return;
}		}
assert(InVectors.size() == 1 && InVectors.front().is<const TreeEntry *>() &&		if (ForExtracts) {
!CommonMask.empty() && "Expected only single entry from extracts.");		// No need to add vectors here, already handled them in adjustExtracts.
		assert(InVectors.size() == 1 &&
		InVectors.front().is<const TreeEntry *>() && !CommonMask.empty() &&
		all_of(enumerate(CommonMask),
		[&](auto P) {
		Value *Scalar = InVectors.front()
		.get<const TreeEntry *>()
		->Scalars[P.index()];
		if (P.value() == PoisonMaskElem)
		return P.value() == Mask[P.index()] ||
		isa<UndefValue>(Scalar);
		if (isa<Constant>(V1))
		return true;
		auto *EI = cast<ExtractElementInst>(Scalar);
		return EI->getVectorOperand() == V1;
		}) &&
		"Expected only tree entry for extractelement vectors.");
		return;
		}
		assert(!InVectors.empty() && !CommonMask.empty() &&
		"Expected only tree entries from extracts/reused buildvectors.");
		unsigned VF = cast<FixedVectorType>(V1->getType())->getNumElements();
		if (InVectors.size() == 2) {
		Cost += createShuffle(InVectors.front(), InVectors.back(), CommonMask);
		transformMaskAfterShuffle(CommonMask, CommonMask);
		VF = std::max<unsigned>(VF, CommonMask.size());
		} else if (const auto *InTE =
		InVectors.front().dyn_cast<const TreeEntry *>()) {
		VF = std::max(VF, InTE->getVectorFactor());
		} else {
		VF = std::max(
		VF, cast<FixedVectorType>(InVectors.front().get<Value *>()->getType())
		->getNumElements());
		}
InVectors.push_back(V1);		InVectors.push_back(V1);
unsigned VF = CommonMask.size();		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
for (unsigned Idx = 0; Idx < VF; ++Idx)
if (Mask[Idx] != PoisonMaskElem && CommonMask[Idx] == PoisonMaskElem)		if (Mask[Idx] != PoisonMaskElem && CommonMask[Idx] == PoisonMaskElem)
CommonMask[Idx] = Mask[Idx] + VF;		CommonMask[Idx] = Mask[Idx] + VF;
}		}
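The mask update at the end of the single-vector add() above follows the usual two-source shuffle convention: indices below VF select from the first input, and indices from VF upward select from the second. A minimal standalone sketch of that merge step, using std::vector in place of the LLVM containers (mergeSecondInputMask is a hypothetical name, and PoisonMaskElem is assumed to be -1 as in LLVM):

```cpp
#include <cassert>
#include <vector>

// Assumption: mirrors LLVM's PoisonMaskElem sentinel for "don't care" lanes.
constexpr int PoisonMaskElem = -1;

// Hypothetical helper, not part of the patch: fold the mask of a newly added
// second input into an existing common mask. Lanes already taken from the
// first input are kept; only poison lanes are filled, offset by VF so that
// they address the second input.
void mergeSecondInputMask(std::vector<int> &CommonMask,
                          const std::vector<int> &Mask, unsigned VF) {
  assert(CommonMask.size() == Mask.size() && "Masks must have equal length.");
  for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
    if (Mask[Idx] != PoisonMaskElem && CommonMask[Idx] == PoisonMaskElem)
      CommonMask[Idx] = Mask[Idx] + static_cast<int>(VF);
}
```

With CommonMask = {0, poison, 2, poison}, Mask = {3, 1, poison, 0} and VF = 4, only lanes 1 and 3 are rewritten, giving {0, 5, 2, 4}.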
Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {		Value *gather(ArrayRef<Value *> VL, unsigned MaskVF = 0,
		Value *Root = nullptr) {
Cost += getBuildVectorCost(VL, Root);		Cost += getBuildVectorCost(VL, Root);
if (!Root) {		if (!Root) {
assert(InVectors.empty() && "Unexpected input vectors for buildvector.");
// FIXME: Need to find a way to avoid use of getNullValue here.		// FIXME: Need to find a way to avoid use of getNullValue here.
SmallVector<Constant *> Vals;		SmallVector<Constant *> Vals;
for (Value *V : VL) {		unsigned VF = VL.size();
		if (MaskVF != 0)
		VF = std::min(VF, MaskVF);
		for (Value *V : VL.take_front(VF)) {
if (isa<UndefValue>(V)) {		if (isa<UndefValue>(V)) {
Vals.push_back(cast<Constant>(V));		Vals.push_back(cast<Constant>(V));
continue;		continue;
}		}
Vals.push_back(Constant::getNullValue(V->getType()));		Vals.push_back(Constant::getNullValue(V->getType()));
}		}
return ConstantVector::get(Vals);		return ConstantVector::get(Vals);
}		}
return ConstantVector::getSplat(		return ConstantVector::getSplat(
ElementCount::getFixed(VL.size()),		ElementCount::getFixed(
		cast<FixedVectorType>(Root->getType())->getNumElements()),
getAllOnesValue(*R.DL, VL.front()->getType()));		getAllOnesValue(*R.DL, VL.front()->getType()));
}		}
		InstructionCost createFreeze(InstructionCost Cost) { return Cost; }
/// Finalize emission of the shuffles.		/// Finalize emission of the shuffles.
InstructionCost		InstructionCost
finalize(ArrayRef<int> ExtMask, unsigned VF = 0,		finalize(ArrayRef<int> ExtMask, unsigned VF = 0,
function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
IsFinalized = true;		IsFinalized = true;
if (Action) {		if (Action) {
const PointerUnion<Value *, const TreeEntry *> &Vec = InVectors.front();		const PointerUnion<Value *, const TreeEntry *> &Vec = InVectors.front();
if (InVectors.size() == 2)		if (InVectors.size() == 2)
Cost += createShuffle(Vec, InVectors.back(), CommonMask);		Cost += createShuffle(Vec, InVectors.back(), CommonMask);
else		else
Cost += createShuffle(Vec, nullptr, CommonMask);		Cost += createShuffle(Vec, nullptr, CommonMask);
for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
if (CommonMask[Idx] != PoisonMaskElem)		if (CommonMask[Idx] != PoisonMaskElem)
CommonMask[Idx] = Idx;		CommonMask[Idx] = Idx;
assert(VF > 0 &&		assert(VF > 0 &&
"Expected vector length for the final value before action.");		"Expected vector length for the final value before action.");
Value *V = Vec.get<Value *>();		Value *V = Vec.get<Value *>();
Action(V, CommonMask);		Action(V, CommonMask);
InVectors.front() = V;		InVectors.front() = V;
}		}
::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);		::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);
if (CommonMask.empty())		if (CommonMask.empty()) {
		assert(InVectors.size() == 1 && "Expected only one vector with no mask");
return Cost;		return Cost;
		}
return Cost +		return Cost +
createShuffle(InVectors.front(),		createShuffle(InVectors.front(),
InVectors.size() == 2 ? InVectors.back() : nullptr,		InVectors.size() == 2 ? InVectors.back() : nullptr,
CommonMask);		CommonMask);
}		}
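In finalize() above, once the pending shuffle has been accounted for, every non-poison lane of CommonMask is reset to its own index. The reasoning is that the shuffled vector already holds the selected elements in place, so any later mask applied to it only needs the identity on those lanes. A small standalone sketch of that invariant (applyMask and resetToIdentity are hypothetical helpers, and poison lanes are modelled as 0 purely for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: mirrors LLVM's PoisonMaskElem sentinel for "don't care" lanes.
constexpr int PoisonMaskElem = -1;

// Hypothetical single-source shuffle: lane I of the result is Vec[Mask[I]].
// Poison lanes produce 0 here; real IR would leave them as poison.
std::vector<int> applyMask(const std::vector<int> &Vec,
                           const std::vector<int> &Mask) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    if (Idx == PoisonMaskElem) {
      Result.push_back(0);
      continue;
    }
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Vec.size() &&
           "Mask index out of range.");
    Result.push_back(Vec[Idx]);
  }
  return Result;
}

// After the shuffle is materialized, used lanes map to themselves.
void resetToIdentity(std::vector<int> &CommonMask) {
  for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
    if (CommonMask[Idx] != PoisonMaskElem)
      CommonMask[Idx] = static_cast<int>(Idx);
}
```

Applying the reset mask to the already shuffled vector returns that vector unchanged, which is why the mask can be rewritten without losing information.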

~ShuffleCostEstimator() {		~ShuffleCostEstimator() {
assert((IsFinalized || CommonMask.empty()) &&		assert((IsFinalized || CommonMask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};

InstructionCost		InstructionCost
BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,		BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
SmallPtrSetImpl<Value *> &CheckedExtracts) {		SmallPtrSetImpl<Value *> &CheckedExtracts) {
ArrayRef<Value *> VL = E->Scalars;		ArrayRef<Value *> VL = E->Scalars;

Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (E->State != TreeEntry::NeedToGather) {		if (E->State != TreeEntry::NeedToGather) {
[... 20 lines not shown ...]	BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
auto *FinalVecTy = FixedVectorType::get(ScalarTy, EntryVF);		auto *FinalVecTy = FixedVectorType::get(ScalarTy, EntryVF);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isa<InsertElementInst>(VL[0]))		if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
// The gather nodes use small bitwidth only if all operands use the same		return processBuildVector<ShuffleCostEstimator, InstructionCost>(
// bitwidth. Otherwise - use the original one.		E, TTI, VectorizedVals, this, CheckedExtracts);
if (It != MinBWs.end() && any_of(VL.drop_front(), [&](Value *V) {
auto VIt = MinBWs.find(V);
return VIt == MinBWs.end() || VIt->second.first != It->second.first ||
VIt->second.second != It->second.second;
})) {
ScalarTy = VL.front()->getType();
VecTy = FixedVectorType::get(ScalarTy, VL.size());
}
ShuffleCostEstimator Estimator(TTI, VectorizedVals, this,
CheckedExtracts);
unsigned VF = E->getVectorFactor();
SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
E->ReuseShuffleIndices.end());
SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
// Build a mask out of the reorder indices and reorder scalars per this
// mask.
SmallVector<int> ReorderMask;
inversePermutation(E->ReorderIndices, ReorderMask);
if (!ReorderMask.empty())
reorderScalars(GatheredScalars, ReorderMask);
SmallVector<int> Mask;
SmallVector<int> ExtractMask;
SmallVector<std::optional<TargetTransformInfo::ShuffleKind>> GatherShuffles;
SmallVector<SmallVector<const TreeEntry *>> Entries;
SmallVector<std::optional<TTI::ShuffleKind>> ExtractShuffles;
// Check for gathered extracts.
bool Resized = false;
unsigned NumParts = TTI->getNumberOfParts(VecTy);
if (NumParts == 0 || NumParts >= GatheredScalars.size())
NumParts = 1;
if (!all_of(GatheredScalars, UndefValue::classof)) {
ExtractShuffles =
tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
if (!ExtractShuffles.empty()) {
if (Value *VecBase = Estimator.adjustExtracts(
E, ExtractMask, ExtractShuffles, NumParts)) {
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() &&
GatheredScalars.size() != VF) {
Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}
}
}

// Do not try to look for reshuffled loads for gathered loads (they will
// be handled later), for vectorized scalars, and cases, which are
// definitely not profitable (splats and small gather nodes.)
if (!ExtractShuffles.empty() || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))
GatherShuffles =
isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);
}
if (!GatherShuffles.empty()) {
if (GatherShuffles.size() == 1 &&
*GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&
Entries.front().front()->isSame(E->Scalars)) {
// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.
LLVM_DEBUG(
dbgs()
<< "SLP: perfect diamond match for gather bundle "
<< shortBundleName(VL) << ".\n");
// Restore the mask for previous partially matched values.
for (auto [I, V] : enumerate(E->Scalars)) {
if (isa<PoisonValue>(V)) {
Mask[I] = PoisonMaskElem;
continue;
}
if (Mask[I] == PoisonMaskElem)
Mask[I] = Entries.front().front()->findLaneForValue(V);
}
Estimator.add(*Entries.front().front(), Mask);
return Estimator.finalize(E->ReuseShuffleIndices);
}
if (!Resized) {
if (GatheredScalars.size() != VF &&
any_of(Entries, [&](ArrayRef<const TreeEntry *> TEs) {
return any_of(TEs, [&](const TreeEntry *TE) {
return TE->getVectorFactor() == VF;
});
}))
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}
// Remove shuffled elements from list of gathers.
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
if (Mask[I] != PoisonMaskElem)
GatheredScalars[I] = PoisonValue::get(ScalarTy);
}
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle "
<< shortBundleName(VL) << ".\n");
unsigned SliceSize = E->Scalars.size() / NumParts;
SmallVector<int> VecMask(Mask.size(), PoisonMaskElem);
for (const auto [I, TEs] : enumerate(Entries)) {
if (TEs.empty()) {
assert(!GatherShuffles[I] &&
"No shuffles with empty entries list expected.");
continue;
}
assert((TEs.size() == 1 || TEs.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");
auto SubMask = ArrayRef(Mask).slice(I * SliceSize, SliceSize);
VecMask.assign(VecMask.size(), PoisonMaskElem);
copy(SubMask, std::next(VecMask.begin(), I * SliceSize));
Estimator.add(TEs.front(), TEs.back(), VecMask);
}
if (all_of(GatheredScalars, PoisonValue ::classof))
return Estimator.finalize(E->ReuseShuffleIndices);
return Estimator.finalize(
E->ReuseShuffleIndices, E->Scalars.size(),
[&](Value *&Vec, SmallVectorImpl<int> &Mask) {
Vec = Estimator.gather(GatheredScalars,
Constant::getNullValue(FixedVectorType::get(
ScalarTy, GatheredScalars.size())));
});
}
if (!all_of(GatheredScalars, PoisonValue::classof)) {
auto Gathers = ArrayRef(GatheredScalars).take_front(VL.size());
bool SameGathers = VL.equals(Gathers);
if (!SameGathers)
return Estimator.finalize(
E->ReuseShuffleIndices, E->Scalars.size(),
[&](Value *&Vec, SmallVectorImpl<int> &Mask) {
Vec = Estimator.gather(
GatheredScalars, Constant::getNullValue(FixedVectorType::get(
ScalarTy, GatheredScalars.size())));
});
Value *BV = Estimator.gather(Gathers);
SmallVector<int> ReuseMask(Gathers.size(), PoisonMaskElem);
std::iota(ReuseMask.begin(), ReuseMask.end(), 0);
Estimator.add(BV, ReuseMask);
}
return Estimator.finalize(E->ReuseShuffleIndices);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty() &&		if (!E->ReorderIndices.empty() &&
E->State != TreeEntry::PossibleStridedVectorize) {		E->State != TreeEntry::PossibleStridedVectorize) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
[... 1,465 lines not shown ...]	SmallVector<const TreeEntry *> FirstEntries(UsedTEs.front().begin(),
UsedTEs.front().end());		UsedTEs.front().end());
sort(FirstEntries, [](const TreeEntry *TE1, const TreeEntry *TE2) {		sort(FirstEntries, [](const TreeEntry *TE1, const TreeEntry *TE2) {
return TE1->Idx < TE2->Idx;		return TE1->Idx < TE2->Idx;
});		});
// Try to find the perfect match in another gather node at first.		// Try to find the perfect match in another gather node at first.
auto It = find_if(FirstEntries, [=](const TreeEntry *EntryPtr) {		auto It = find_if(FirstEntries, [=](const TreeEntry *EntryPtr) {
return EntryPtr->isSame(VL) || EntryPtr->isSame(TE->Scalars);		return EntryPtr->isSame(VL) || EntryPtr->isSame(TE->Scalars);
});		});
if (It != FirstEntries.end() && (*It)->getVectorFactor() == VL.size()) {		if (It != FirstEntries.end() &&
		((*It)->getVectorFactor() == VL.size() ||
		((*It)->getVectorFactor() == TE->Scalars.size() &&
		TE->ReuseShuffleIndices.size() == VL.size() &&
		(*It)->isSame(TE->Scalars)))) {
Entries.push_back(*It);		Entries.push_back(*It);
		if ((*It)->getVectorFactor() == VL.size()) {
std::iota(std::next(Mask.begin(), Part * VL.size()),		std::iota(std::next(Mask.begin(), Part * VL.size()),
std::next(Mask.begin(), (Part + 1) * VL.size()), 0);		std::next(Mask.begin(), (Part + 1) * VL.size()), 0);
		} else {
		SmallVector<int> CommonMask = TE->getCommonMask();
		copy(CommonMask, Mask.begin());
		}
// Clear undef scalars.		// Clear undef scalars.
for (int I = 0, Sz = VL.size(); I < Sz; ++I)		for (int I = 0, Sz = VL.size(); I < Sz; ++I)
if (isa<PoisonValue>(VL[I]))		if (isa<PoisonValue>(VL[I]))
Mask[I] = PoisonMaskElem;		Mask[I] = PoisonMaskElem;
return TargetTransformInfo::SK_PermuteSingleSrc;		return TargetTransformInfo::SK_PermuteSingleSrc;
}		}
// No perfect match, just shuffle, so choose the first tree node from the		// No perfect match, just shuffle, so choose the first tree node from the
// tree.		// tree.
[... 659 lines not shown ...]	class BoUpSLP::ShuffleInstructionBuilder final : public BaseShuffleAnalysis {
}		}

public:		public:
ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)		ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)
: Builder(Builder), R(R) {}		: Builder(Builder), R(R) {}

/// Adjusts extractelements after reusing them.		/// Adjusts extractelements after reusing them.
Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,		Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,
		ArrayRef<std::optional<TTI::ShuffleKind>> ShuffleKinds,
unsigned NumParts, bool &UseVecBaseAsInput) {		unsigned NumParts, bool &UseVecBaseAsInput) {
UseVecBaseAsInput = false;		UseVecBaseAsInput = false;
SmallPtrSet<Value *, 4> UniqueBases;		SmallPtrSet<Value *, 4> UniqueBases;
Value *VecBase = nullptr;		Value *VecBase = nullptr;
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
int Idx = Mask[I];		int Idx = Mask[I];
if (Idx == PoisonMaskElem)		if (Idx == PoisonMaskElem)
continue;		continue;
[... 82 lines not shown ...]	#endif // NDEBUG
TransformToIdentity(VecMask);		TransformToIdentity(VecMask);
}		}
}		}
copy(VecMask, Mask.begin());		copy(VecMask, Mask.begin());
return Vec;		return Vec;
}		}
/// Checks if the specified entry \p E needs to be delayed because of its		/// Checks if the specified entry \p E needs to be delayed because of its
/// dependency nodes.		/// dependency nodes.
Value *needToDelay(const TreeEntry *E,		std::optional<Value *>
ArrayRef<SmallVector<const TreeEntry *>> Deps) {		needToDelay(const TreeEntry *E,
		ArrayRef<SmallVector<const TreeEntry *>> Deps) const {
// No need to delay emission if all deps are ready.		// No need to delay emission if all deps are ready.
if (all_of(Deps, [](ArrayRef<const TreeEntry *> TEs) {		if (all_of(Deps, [](ArrayRef<const TreeEntry *> TEs) {
return all_of(		return all_of(
TEs, [](const TreeEntry *TE) { return TE->VectorizedValue; });		TEs, [](const TreeEntry *TE) { return TE->VectorizedValue; });
}))		}))
return nullptr;		return std::nullopt;
// Postpone gather emission, will be emitted after the end of the		// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.		// process to keep correct order.
auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),		auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),
E->getVectorFactor());		E->getVectorFactor());
return Builder.CreateAlignedLoad(		return Builder.CreateAlignedLoad(
VecTy, PoisonValue::get(PointerType::getUnqual(VecTy->getContext())),		VecTy, PoisonValue::get(PointerType::getUnqual(VecTy->getContext())),
MaybeAlign());		MaybeAlign());
}		}
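The switch from a raw pointer result to std::optional in needToDelay() lines up with the cost estimator's version, which returns std::optional<InstructionCost> and always yields std::nullopt. One plausible reading, since the body of processBuildVector is not shown here, is that a single templated driver can then treat both builders the same way. A minimal sketch under that assumption, with all type and function names hypothetical and int/std::string standing in for InstructionCost/Value *:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the cost estimator: never delays.
struct CostSketch {
  std::optional<int> needToDelay() const { return std::nullopt; }
  int finalize() const { return 7; }
};

// Hypothetical stand-in for the IR builder: returns a placeholder result
// while some dependency is not ready yet.
struct EmitSketch {
  bool DepsReady;
  std::optional<std::string> needToDelay() const {
    if (DepsReady)
      return std::nullopt;
    return std::string("placeholder");
  }
  std::string finalize() const { return "vector"; }
};

// Hypothetical driver: the same early-out works for either builder because
// both report "no delay" as std::nullopt rather than a type-specific null.
template <typename BVTy, typename ResTy> ResTy processSketch(const BVTy &B) {
  if (std::optional<ResTy> Delayed = B.needToDelay())
    return *Delayed;
  return B.finalize();
}
```

For example, processSketch<CostSketch, int>(CostSketch{}) falls through to finalize(), while processSketch<EmitSketch, std::string>(EmitSketch{false}) returns the placeholder.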
		void add(const TreeEntry &E1, const TreeEntry &E2, ArrayRef<int> Mask) {
		add(E1.VectorizedValue, E2.VectorizedValue, Mask);
		}
		void add(const TreeEntry &E1, ArrayRef<int> Mask) {
		add(E1.VectorizedValue, Mask);
		}
/// Adds 2 input vectors and the mask for their shuffling.		/// Adds 2 input vectors and the mask for their shuffling.
void add(Value *V1, Value *V2, ArrayRef<int> Mask) {		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
if (InVectors.empty()) {		if (InVectors.empty()) {
InVectors.push_back(V1);		InVectors.push_back(V1);
InVectors.push_back(V2);		InVectors.push_back(V2);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
return;		return;
[... 13 lines not shown ...]	for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
CommonMask[Idx] = Idx + Sz;		CommonMask[Idx] = Idx + Sz;
InVectors.front() = Vec;		InVectors.front() = Vec;
if (InVectors.size() == 2)		if (InVectors.size() == 2)
InVectors.back() = V1;		InVectors.back() = V1;
else		else
InVectors.push_back(V1);		InVectors.push_back(V1);
}		}
/// Adds another one input vector and the mask for the shuffling.		/// Adds another one input vector and the mask for the shuffling.
void add(Value *V1, ArrayRef<int> Mask) {		void add(Value *V1, ArrayRef<int> Mask, bool = false) {
if (InVectors.empty()) {		if (InVectors.empty()) {
if (!isa<FixedVectorType>(V1->getType())) {		if (!isa<FixedVectorType>(V1->getType())) {
V1 = createShuffle(V1, nullptr, CommonMask);		V1 = createShuffle(V1, nullptr, CommonMask);
CommonMask.assign(Mask.size(), PoisonMaskElem);		CommonMask.assign(Mask.size(), PoisonMaskElem);
transformMaskAfterShuffle(CommonMask, Mask);		transformMaskAfterShuffle(CommonMask, Mask);
}		}
InVectors.push_back(V1);		InVectors.push_back(V1);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
[... 45 lines not shown ...]
for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);
}		}
/// Adds another one input vector and the mask for the shuffling.		/// Adds another one input vector and the mask for the shuffling.
void addOrdered(Value *V1, ArrayRef<unsigned> Order) {		void addOrdered(Value *V1, ArrayRef<unsigned> Order) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(Order, NewMask);		inversePermutation(Order, NewMask);
add(V1, NewMask);		add(V1, NewMask);
}		}
Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {		Value *gather(ArrayRef<Value *> VL, unsigned MaskVF = 0,
		Value *Root = nullptr) {
return R.gather(VL, Root);		return R.gather(VL, Root);
}		}
Value *createFreeze(Value *V) { return Builder.CreateFreeze(V); }		Value *createFreeze(Value *V) { return Builder.CreateFreeze(V); }
/// Finalize emission of the shuffles.		/// Finalize emission of the shuffles.
/// \param Action the action (if any) to be performed before final applying of		/// \param Action the action (if any) to be performed before final applying of
/// the \p ExtMask mask.		/// the \p ExtMask mask.
Value *		Value *
finalize(ArrayRef<int> ExtMask, unsigned VF = 0,		finalize(ArrayRef<int> ExtMask, unsigned VF = 0,
[... 233 lines not shown ...]
if (NumParts == 0 || NumParts >= GatheredScalars.size())
NumParts = 1;		NumParts = 1;
if (!all_of(GatheredScalars, UndefValue::classof)) {		if (!all_of(GatheredScalars, UndefValue::classof)) {
// Check for gathered extracts.		// Check for gathered extracts.
bool Resized = false;		bool Resized = false;
ExtractShuffles =		ExtractShuffles =
tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);		tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
if (!ExtractShuffles.empty()) {		if (!ExtractShuffles.empty()) {
if (Value *VecBase = ShuffleBuilder.adjustExtracts(		if (Value *VecBase = ShuffleBuilder.adjustExtracts(
E, ExtractMask, NumParts, UseVecBaseAsInput)) {		E, ExtractMask, ExtractShuffles, NumParts, UseVecBaseAsInput)) {
ExtractVecBase = VecBase;		ExtractVecBase = VecBase;
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))		if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() &&		if (VF == VecBaseTy->getNumElements() &&
GatheredScalars.size() != VF) {		GatheredScalars.size() != VF) {
Resized = true;		Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
}		}
}		}
// Gather extracts after we check for full matched gathers only.		// Gather extracts after we check for full matched gathers only.
if (!ExtractShuffles.empty() || E->getOpcode() != Instruction::Load ||		if (!ExtractShuffles.empty() || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||		E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||		all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||		isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {		(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {
GatherShuffles =		GatherShuffles =
isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);		isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);
}		}
if (!GatherShuffles.empty()) {		if (!GatherShuffles.empty()) {
if (Value *Delayed = ShuffleBuilder.needToDelay(E, Entries)) {		if (std::optional<ResTy> Delayed =
		ShuffleBuilder.needToDelay(E, Entries)) {
// Delay emission of gathers which are not ready yet.		// Delay emission of gathers which are not ready yet.
PostponedGathers.insert(E);		PostponedGathers.insert(E);
// Postpone gather emission, will be emitted after the end of the		// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.		// process to keep correct order.
return Delayed;		return *Delayed;
}		}
if (GatherShuffles.size() == 1 &&		if (GatherShuffles.size() == 1 &&
*GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&		*GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&
Entries.front().front()->isSame(E->Scalars)) {		Entries.front().front()->isSame(E->Scalars)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "SLP: perfect diamond match for gather bundle "		<< "SLP: perfect diamond match for gather bundle "
<< shortBundleName(E->Scalars) << ".\n");		<< shortBundleName(E->Scalars) << ".\n");
// Restore the mask for previous partially matched values.		// Restore the mask for previous partially matched values.
		Mask.resize(E->Scalars.size());
const TreeEntry *FrontTE = Entries.front().front();		const TreeEntry *FrontTE = Entries.front().front();
if (FrontTE->ReorderIndices.empty() &&		if (FrontTE->ReorderIndices.empty() &&
((FrontTE->ReuseShuffleIndices.empty() &&		((FrontTE->ReuseShuffleIndices.empty() &&
E->Scalars.size() == FrontTE->Scalars.size()) ||		E->Scalars.size() == FrontTE->Scalars.size()) ||
(E->Scalars.size() == FrontTE->ReuseShuffleIndices.size()))) {		(E->Scalars.size() == FrontTE->ReuseShuffleIndices.size()))) {
std::iota(Mask.begin(), Mask.end(), 0);		std::iota(Mask.begin(), Mask.end(), 0);
} else {		} else {
for (auto [I, V] : enumerate(E->Scalars)) {		for (auto [I, V] : enumerate(E->Scalars)) {
if (isa<PoisonValue>(V)) {		if (isa<PoisonValue>(V)) {
Mask[I] = PoisonMaskElem;		Mask[I] = PoisonMaskElem;
continue;		continue;
}		}
Mask[I] = FrontTE->findLaneForValue(V);		Mask[I] = FrontTE->findLaneForValue(V);
}		}
}		}
ShuffleBuilder.add(FrontTE->VectorizedValue, Mask);		ShuffleBuilder.add(*FrontTE, Mask);
Res = ShuffleBuilder.finalize(E->getCommonMask());		Res = ShuffleBuilder.finalize(E->getCommonMask());
return Res;		return Res;
}		}
if (!Resized) {		if (!Resized) {
if (GatheredScalars.size() != VF &&		if (GatheredScalars.size() != VF &&
any_of(Entries, [&](ArrayRef<const TreeEntry *> TEs) {		any_of(Entries, [&](ArrayRef<const TreeEntry *> TEs) {
return any_of(TEs, [&](const TreeEntry *TE) {		return any_of(TEs, [&](const TreeEntry *TE) {
return TE->getVectorFactor() == VF;		return TE->getVectorFactor() == VF;
[... 133 lines not shown ...]
if (!ExtractShuffles.empty()) {
IsUsedInExpr = false;		IsUsedInExpr = false;
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(Vec1) && isGuaranteedNotToBePoison(Vec2);		isGuaranteedNotToBePoison(Vec1) && isGuaranteedNotToBePoison(Vec2);
ShuffleBuilder.add(Vec1, Vec2, ExtractMask);		ShuffleBuilder.add(Vec1, Vec2, ExtractMask);
} else if (Vec1) {		} else if (Vec1) {
IsUsedInExpr &= FindReusedSplat(		IsUsedInExpr &= FindReusedSplat(
ExtractMask,		ExtractMask,
cast<FixedVectorType>(Vec1->getType())->getNumElements());		cast<FixedVectorType>(Vec1->getType())->getNumElements());
ShuffleBuilder.add(Vec1, ExtractMask);		ShuffleBuilder.add(Vec1, ExtractMask, /*ForExtracts=*/true);
IsNonPoisoned &= isGuaranteedNotToBePoison(Vec1);		IsNonPoisoned &= isGuaranteedNotToBePoison(Vec1);
} else {		} else {
IsUsedInExpr = false;		IsUsedInExpr = false;
ShuffleBuilder.add(PoisonValue::get(FixedVectorType::get(		ShuffleBuilder.add(PoisonValue::get(FixedVectorType::get(
ScalarTy, GatheredScalars.size())),		ScalarTy, GatheredScalars.size())),
ExtractMask);		ExtractMask, /*ForExtracts=*/true);
}		}
}		}
if (!GatherShuffles.empty()) {		if (!GatherShuffles.empty()) {
unsigned SliceSize = E->Scalars.size() / NumParts;		unsigned SliceSize = E->Scalars.size() / NumParts;
SmallVector<int> VecMask(Mask.size(), PoisonMaskElem);		SmallVector<int> VecMask(Mask.size(), PoisonMaskElem);
for (const auto [I, TEs] : enumerate(Entries)) {		for (const auto [I, TEs] : enumerate(Entries)) {
if (TEs.empty()) {		if (TEs.empty()) {
assert(!GatherShuffles[I] &&		assert(!GatherShuffles[I] &&
"No shuffles with empty entries list expected.");		"No shuffles with empty entries list expected.");
continue;		continue;
}		}
assert((TEs.size() == 1 || TEs.size() == 2) &&		assert((TEs.size() == 1 || TEs.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");		"Expected shuffle of 1 or 2 entries.");
auto SubMask = ArrayRef(Mask).slice(I * SliceSize, SliceSize);		auto SubMask = ArrayRef(Mask).slice(I * SliceSize, SliceSize);
VecMask.assign(VecMask.size(), PoisonMaskElem);		VecMask.assign(VecMask.size(), PoisonMaskElem);
copy(SubMask, std::next(VecMask.begin(), I * SliceSize));		copy(SubMask, std::next(VecMask.begin(), I * SliceSize));
if (TEs.size() == 1) {		if (TEs.size() == 1) {
IsUsedInExpr &= FindReusedSplat(		IsUsedInExpr &=
VecMask,		FindReusedSplat(VecMask, TEs.front()->getVectorFactor());
cast<FixedVectorType>(TEs.front()->VectorizedValue->getType())		ShuffleBuilder.add(*TEs.front(), VecMask);
->getNumElements());		if (TEs.front()->VectorizedValue)
ShuffleBuilder.add(TEs.front()->VectorizedValue, VecMask);
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(TEs.front()->VectorizedValue);		isGuaranteedNotToBePoison(TEs.front()->VectorizedValue);
} else {		} else {
IsUsedInExpr = false;		IsUsedInExpr = false;
ShuffleBuilder.add(TEs.front()->VectorizedValue,		ShuffleBuilder.add(*TEs.front(), *TEs.back(), VecMask);
TEs.back()->VectorizedValue, VecMask);		if (TEs.front()->VectorizedValue && TEs.back()->VectorizedValue)
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(TEs.front()->VectorizedValue) &&		isGuaranteedNotToBePoison(TEs.front()->VectorizedValue) &&
isGuaranteedNotToBePoison(TEs.back()->VectorizedValue);		isGuaranteedNotToBePoison(TEs.back()->VectorizedValue);
}		}
}		}
}		}
// Try to figure out best way to combine values: build a shuffle and insert		// Try to figure out best way to combine values: build a shuffle and insert
// elements or just build several shuffles.		// elements or just build several shuffles.
// Insert non-constant scalars.		// Insert non-constant scalars.
SmallVector<Value *> NonConstants(GatheredScalars);		SmallVector<Value *> NonConstants(GatheredScalars);
int EMSz = ExtractMask.size();		int EMSz = ExtractMask.size();
[... 42 lines not shown ...]
for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
NonConstants[I] = PoisonValue::get(ScalarTy);		NonConstants[I] = PoisonValue::get(ScalarTy);
else		else
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
}		}
// Generate constants for final shuffle and build a mask for them.		// Generate constants for final shuffle and build a mask for them.
if (!all_of(GatheredScalars, PoisonValue::classof)) {		if (!all_of(GatheredScalars, PoisonValue::classof)) {
SmallVector<int> BVMask(GatheredScalars.size(), PoisonMaskElem);		SmallVector<int> BVMask(GatheredScalars.size(), PoisonMaskElem);
TryPackScalars(GatheredScalars, BVMask, /*IsRootPoison=*/true);		TryPackScalars(GatheredScalars, BVMask, /*IsRootPoison=*/true);
Value *BV = ShuffleBuilder.gather(GatheredScalars);		Value *BV = ShuffleBuilder.gather(GatheredScalars, BVMask.size());
ShuffleBuilder.add(BV, BVMask);		ShuffleBuilder.add(BV, BVMask);
}		}
if (all_of(NonConstants, [=](Value *V) {		if (all_of(NonConstants, [=](Value *V) {
return isa<PoisonValue>(V) ||		return isa<PoisonValue>(V) ||
(IsSingleShuffle && ((IsIdentityShuffle &&		(IsSingleShuffle && ((IsIdentityShuffle &&
IsNonPoisoned) || IsUsedInExpr) && isa<UndefValue>(V));		IsNonPoisoned) || IsUsedInExpr) && isa<UndefValue>(V));
}))		}))
Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);		Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
else		else
Res = ShuffleBuilder.finalize(		Res = ShuffleBuilder.finalize(
E->ReuseShuffleIndices, E->Scalars.size(),		E->ReuseShuffleIndices, E->Scalars.size(),
[&](Value *&Vec, SmallVectorImpl<int> &Mask) {		[&](Value *&Vec, SmallVectorImpl<int> &Mask) {
TryPackScalars(NonConstants, Mask, /*IsRootPoison=*/false);		TryPackScalars(NonConstants, Mask, /*IsRootPoison=*/false);
Vec = ShuffleBuilder.gather(NonConstants, Vec);		Vec = ShuffleBuilder.gather(NonConstants, Mask.size(), Vec);
});		});
} else if (!allConstant(GatheredScalars)) {		} else if (!allConstant(GatheredScalars)) {
// Gather unique scalars and all constants.		// Gather unique scalars and all constants.
SmallVector<int> ReuseMask(GatheredScalars.size(), PoisonMaskElem);		SmallVector<int> ReuseMask(GatheredScalars.size(), PoisonMaskElem);
TryPackScalars(GatheredScalars, ReuseMask, /*IsRootPoison=*/true);		TryPackScalars(GatheredScalars, ReuseMask, /*IsRootPoison=*/true);
Value *BV = ShuffleBuilder.gather(GatheredScalars);		Value *BV = ShuffleBuilder.gather(GatheredScalars, ReuseMask.size());
ShuffleBuilder.add(BV, ReuseMask);		ShuffleBuilder.add(BV, ReuseMask);
Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);		Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
} else {		} else {
// Gather all constants.		// Gather all constants.
SmallVector<int> Mask(E->Scalars.size(), PoisonMaskElem);		SmallVector<int> Mask(E->Scalars.size(), PoisonMaskElem);
for (auto [I, V] : enumerate(E->Scalars)) {		for (auto [I, V] : enumerate(E->Scalars)) {
if (!isa<PoisonValue>(V))		if (!isa<PoisonValue>(V))
Mask[I] = I;		Mask[I] = I;
[... 54 lines not shown ...]
Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();
if (auto *Store = dyn_cast<StoreInst>(VL0))		if (auto *Store = dyn_cast<StoreInst>(VL0))
ScalarTy = Store->getValueOperand()->getType();		ScalarTy = Store->getValueOperand()->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL0))		else if (auto *IE = dyn_cast<InsertElementInst>(VL0))
ScalarTy = IE->getOperand(1)->getType();		ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());		auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());
		nlopes: Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!
		ABataev: Sure, thanks!
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
assert((E->ReorderIndices.empty() ||		assert((E->ReorderIndices.empty() ||
E != VectorizableTree.front().get() ||		E != VectorizableTree.front().get() ||
!E->UserTreeIndices.empty()) &&		!E->UserTreeIndices.empty()) &&
"PHI reordering is free.");		"PHI reordering is free.");
if (PostponedPHIs && E->VectorizedValue)		if (PostponedPHIs && E->VectorizedValue)
return E->VectorizedValue;		return E->VectorizedValue;
[... 5,183 lines not shown ...]
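
A note on the mask arithmetic in the hunks above: the `add` overloads build a `CommonMask` in which an index below the vector factor selects a lane of the first input, an index at or above it selects a lane of the second input (hence the `+ VF` offset), and `PoisonMaskElem` marks a don't-care lane. `addOrdered` first turns an order into a mask via `inversePermutation`. The following is only a standalone sketch of those two ideas on plain integers, assuming the usual `shufflevector` index convention. The names `applyTwoSourceMask`, `invertOrder`, and `kPoisonLane` are hypothetical and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PoisonMaskElem: marks a lane whose value is
// irrelevant.
constexpr int kPoisonLane = -1;

// Applies a two-source shuffle mask to plain integer "vectors".
// Indices in [0, VF) pick from v1, indices in [VF, 2*VF) pick from v2,
// and kPoisonLane yields the caller-supplied placeholder value.
std::vector<int> applyTwoSourceMask(const std::vector<int> &v1,
                                    const std::vector<int> &v2,
                                    const std::vector<int> &mask,
                                    int poisonPlaceholder) {
  assert(v1.size() == v2.size() && "both inputs must have the same width");
  const int vf = static_cast<int>(v1.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int idx : mask) {
    if (idx == kPoisonLane) {
      out.push_back(poisonPlaceholder);
      continue;
    }
    assert(idx >= 0 && idx < 2 * vf && "mask index out of range");
    out.push_back(idx < vf ? v1[idx] : v2[idx - vf]);
  }
  return out;
}

// Turns an order into a shuffle mask by inverting the permutation:
// if order[i] == j, then the resulting mask has mask[j] == i.
// This sketch assumes order is a complete permutation of 0..N-1.
std::vector<int> invertOrder(const std::vector<unsigned> &order) {
  std::vector<int> mask(order.size(), kPoisonLane);
  for (std::size_t i = 0; i < order.size(); ++i) {
    assert(order[i] < order.size() && "order element out of range");
    mask[order[i]] = static_cast<int>(i);
  }
  return mask;
}
```

For instance, with `v1 = {10, 11, 12, 13}` and `v2 = {20, 21, 22, 23}`, the mask `{0, 5, kPoisonLane, 7}` takes lane 0 of the first input, lanes 1 and 3 of the second, and leaves lane 2 as a placeholder.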

llvm/test/Transforms/SLPVectorizer/AArch64/scalarization-overhead.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=arm64-apple-macosx11.0.0 -passes=slp-vectorizer -S < %s | FileCheck %s			; RUN: opt -mtriple=arm64-apple-macosx11.0.0 -passes=slp-vectorizer -S < %s | FileCheck %s

	; Test case reported on D134605 where the vectorization was causing a slowdown due to an underestimation in the cost of the extractions.			; Test case reported on D134605 where the vectorization was causing a slowdown due to an underestimation in the cost of the extractions.

	define fastcc i64 @zot(float %arg, float %arg1, float %arg2, float %arg3, float %arg4, ptr %arg5, i1 %arg6, i1 %arg7, i1 %arg8) {			define fastcc i64 @zot(float %arg, float %arg1, float %arg2, float %arg3, float %arg4, ptr %arg5, i1 %arg6, i1 %arg7, i1 %arg8) {
	; CHECK-LABEL: @zot(			; CHECK-LABEL: @zot(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[VAL:%.*]] = fmul fast float 0.000000e+00, 0.000000e+00			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float 0.000000e+00, float poison, float poison, float poison>, float [[ARG:%.*]], i32 1
	; CHECK-NEXT: [[VAL9:%.*]] = fmul fast float 0.000000e+00, [[ARG:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[ARG3:%.*]], i32 2
	; CHECK-NEXT: [[VAL10:%.*]] = fmul fast float [[ARG3:%.*]], 1.000000e+00			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
	; CHECK-NEXT: [[VAL11:%.*]] = fmul fast float [[ARG3]], 1.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> <float 0.000000e+00, float 0.000000e+00, float 1.000000e+00, float 1.000000e+00>, [[TMP2]]
	; CHECK-NEXT: [[VAL12:%.*]] = fadd fast float [[ARG3]], 1.000000e+00			; CHECK-NEXT: [[VAL12:%.*]] = fadd fast float [[ARG3]], 1.000000e+00
	; CHECK-NEXT: [[VAL13:%.*]] = fadd fast float [[VAL12]], 2.000000e+00			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP2]], float [[VAL12]], i32 0
	; CHECK-NEXT: [[VAL14:%.*]] = fadd fast float 0.000000e+00, 0.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float 0.000000e+00, i32 1
	; CHECK-NEXT: [[VAL15:%.*]] = fadd fast float [[VAL14]], 1.000000e+00			; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <4 x float> [[TMP5]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[VAL16:%.*]] = fadd fast float [[ARG3]], 1.000000e+00
	; CHECK-NEXT: [[VAL17:%.*]] = fadd fast float [[ARG3]], 1.000000e+00
	; CHECK-NEXT: br i1 [[ARG6:%.*]], label [[BB18:%.*]], label [[BB57:%.*]]			; CHECK-NEXT: br i1 [[ARG6:%.*]], label [[BB18:%.*]], label [[BB57:%.*]]
	; CHECK: bb18:			; CHECK: bb18:
	; CHECK-NEXT: [[VAL19:%.*]] = phi float [ [[VAL13]], [[BB:%.*]] ]			; CHECK-NEXT: [[TMP7:%.*]] = phi <4 x float> [ [[TMP6]], [[BB:%.*]] ]
	; CHECK-NEXT: [[VAL20:%.*]] = phi float [ [[VAL15]], [[BB]] ]			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[TMP6]], i32 2
	; CHECK-NEXT: [[VAL21:%.*]] = phi float [ [[VAL16]], [[BB]] ]			; CHECK-NEXT: [[VAL23:%.*]] = fmul fast float [[TMP8]], 2.000000e+00
	; CHECK-NEXT: [[VAL22:%.*]] = phi float [ [[VAL17]], [[BB]] ]			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x float> [[TMP6]], i32 3
	; CHECK-NEXT: [[VAL23:%.*]] = fmul fast float [[VAL16]], 2.000000e+00			; CHECK-NEXT: [[VAL24:%.*]] = fmul fast float [[TMP9]], 3.000000e+00
	; CHECK-NEXT: [[VAL24:%.*]] = fmul fast float [[VAL17]], 3.000000e+00
	; CHECK-NEXT: br i1 [[ARG7:%.*]], label [[BB25:%.*]], label [[BB57]]			; CHECK-NEXT: br i1 [[ARG7:%.*]], label [[BB25:%.*]], label [[BB57]]
	; CHECK: bb25:			; CHECK: bb25:
	; CHECK-NEXT: [[VAL26:%.*]] = phi float [ [[VAL19]], [[BB18]] ]			; CHECK-NEXT: [[TMP10:%.*]] = phi <4 x float> [ [[TMP7]], [[BB18]] ]
	; CHECK-NEXT: [[VAL27:%.*]] = phi float [ [[VAL20]], [[BB18]] ]			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[VAL28:%.*]] = phi float [ [[VAL21]], [[BB18]] ]
	; CHECK-NEXT: [[VAL29:%.*]] = phi float [ [[VAL22]], [[BB18]] ]
	; CHECK-NEXT: br label [[BB30:%.*]]			; CHECK-NEXT: br label [[BB30:%.*]]
	; CHECK: bb30:			; CHECK: bb30:
	; CHECK-NEXT: [[VAL31:%.*]] = phi float [ [[VAL55:%.*]], [[BB30]] ], [ 0.000000e+00, [[BB25]] ]			; CHECK-NEXT: [[VAL31:%.*]] = phi float [ [[VAL55:%.*]], [[BB30]] ], [ 0.000000e+00, [[BB25]] ]
	; CHECK-NEXT: [[VAL32:%.*]] = phi float [ [[VAL9]], [[BB30]] ], [ 0.000000e+00, [[BB25]] ]			; CHECK-NEXT: [[VAL32:%.*]] = phi float [ [[TMP11]], [[BB30]] ], [ 0.000000e+00, [[BB25]] ]
	; CHECK-NEXT: [[VAL33:%.*]] = load i8, ptr [[ARG5:%.*]], align 1			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i8>, ptr [[ARG5:%.*]], align 1
	; CHECK-NEXT: [[VAL34:%.*]] = uitofp i8 [[VAL33]] to float			; CHECK-NEXT: [[TMP13:%.*]] = uitofp <4 x i8> [[TMP12]] to <4 x float>
	; CHECK-NEXT: [[VAL35:%.*]] = getelementptr inbounds i8, ptr [[ARG5]], i64 1			; CHECK-NEXT: [[TMP14:%.*]] = fsub fast <4 x float> [[TMP13]], [[TMP3]]
	; CHECK-NEXT: [[VAL36:%.*]] = load i8, ptr [[VAL35]], align 1			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x float> [[TMP14]], [[TMP10]]
	; CHECK-NEXT: [[VAL37:%.*]] = uitofp i8 [[VAL36]] to float			; CHECK-NEXT: [[TMP16:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP15]])
	; CHECK-NEXT: [[VAL38:%.*]] = getelementptr inbounds i8, ptr [[ARG5]], i64 2
	; CHECK-NEXT: [[VAL39:%.*]] = load i8, ptr [[VAL38]], align 1
	; CHECK-NEXT: [[VAL40:%.*]] = uitofp i8 [[VAL39]] to float
	; CHECK-NEXT: [[VAL41:%.*]] = getelementptr inbounds i8, ptr [[ARG5]], i64 3
	; CHECK-NEXT: [[VAL42:%.*]] = load i8, ptr [[VAL41]], align 1
	; CHECK-NEXT: [[VAL43:%.*]] = uitofp i8 [[VAL42]] to float
	; CHECK-NEXT: [[VAL44:%.*]] = fsub fast float [[VAL34]], [[VAL]]
	; CHECK-NEXT: [[VAL45:%.*]] = fsub fast float [[VAL37]], [[VAL9]]
	; CHECK-NEXT: [[VAL46:%.*]] = fsub fast float [[VAL40]], [[VAL10]]
	; CHECK-NEXT: [[VAL47:%.*]] = fsub fast float [[VAL43]], [[VAL11]]
	; CHECK-NEXT: [[VAL48:%.*]] = fmul fast float [[VAL44]], [[VAL26]]
	; CHECK-NEXT: [[VAL49:%.*]] = fmul fast float [[VAL45]], [[VAL27]]
	; CHECK-NEXT: [[VAL50:%.*]] = fadd fast float [[VAL49]], [[VAL48]]
	; CHECK-NEXT: [[VAL51:%.*]] = fmul fast float [[VAL46]], [[VAL28]]
	; CHECK-NEXT: [[VAL52:%.*]] = fadd fast float [[VAL50]], [[VAL51]]
	; CHECK-NEXT: [[VAL53:%.*]] = fmul fast float [[VAL47]], [[VAL29]]
	; CHECK-NEXT: [[VAL54:%.*]] = fadd fast float [[VAL52]], [[VAL53]]
	; CHECK-NEXT: [[VAL55]] = tail call fast float @llvm.minnum.f32(float [[VAL31]], float [[ARG1:%.*]])			; CHECK-NEXT: [[VAL55]] = tail call fast float @llvm.minnum.f32(float [[VAL31]], float [[ARG1:%.*]])
	; CHECK-NEXT: [[VAL56:%.*]] = tail call fast float @llvm.maxnum.f32(float [[ARG2:%.*]], float [[VAL54]])			; CHECK-NEXT: [[VAL56:%.*]] = tail call fast float @llvm.maxnum.f32(float [[ARG2:%.*]], float [[TMP16]])
	; CHECK-NEXT: call void @ham(float [[VAL55]], float [[VAL56]])			; CHECK-NEXT: call void @ham(float [[VAL55]], float [[VAL56]])
	; CHECK-NEXT: br i1 [[ARG8:%.*]], label [[BB30]], label [[BB57]]			; CHECK-NEXT: br i1 [[ARG8:%.*]], label [[BB30]], label [[BB57]]
	; CHECK: bb57:			; CHECK: bb57:
	; CHECK-NEXT: ret i64 0			; CHECK-NEXT: ret i64 0
	;			;
	bb:			bb:
	%val = fmul fast float 0.000000e+00, 0.000000e+00			%val = fmul fast float 0.000000e+00, 0.000000e+00
	%val9 = fmul fast float 0.000000e+00, %arg			%val9 = fmul fast float 0.000000e+00, %arg
	[... 63 lines not shown ...]

Revision Contents

Diff 558067

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/scalarization-overhead.ll
