This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of the scalars used in the graph.
Closed, Public

Authored by ABataev on Oct 1 2021, 4:10 PM.

Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev
vporpo

Commits

rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

Summary

Currently, gathers of scalars that are already being vectorized in the tree
are emitted as pairs of extractelement/insertelement instructions. Instead,
we can try to find all the required vectors and emit shufflevector
instructions directly, improving the generated code and reducing compile time.
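
To make the summary concrete, here is a minimal standalone sketch (plain C++, not code from the patch) of the underlying idea: if every gathered scalar is known to be lane `Lane` of one of at most two source vectors, the whole gather can be described by one two-source shuffle mask instead of one extract/insert pair per lane. `ExtractRef` and `buildTwoSourceMask` are hypothetical names used only for illustration; the mask convention (indices `0..VF-1` select from the first source, `VF..2*VF-1` from the second) follows LLVM's `shufflevector`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for "scalar = extractelement Source, Lane".
struct ExtractRef {
  int Source; // Identifier of the source vector.
  int Lane;   // Lane extracted from that vector.
};

// Returns a shufflevector-style mask if the scalars come from at most two
// distinct source vectors of width VF; otherwise returns std::nullopt,
// meaning a real gather (insertelement sequence) would still be needed.
std::optional<std::vector<int>>
buildTwoSourceMask(const std::vector<ExtractRef> &Scalars, int VF) {
  int First = -1, Second = -1;
  std::vector<int> Mask;
  Mask.reserve(Scalars.size());
  for (const ExtractRef &S : Scalars) {
    assert(S.Lane >= 0 && S.Lane < VF && "lane out of range");
    if (First == -1 || S.Source == First) {
      First = S.Source;
      Mask.push_back(S.Lane); // Select from the first operand.
    } else if (Second == -1 || S.Source == Second) {
      Second = S.Source;
      Mask.push_back(VF + S.Lane); // Select from the second operand.
    } else {
      return std::nullopt; // A third source: not a single shuffle.
    }
  }
  return Mask;
}
```

With `VF = 4`, lanes 0 and 3 of a vector `x` followed by lanes 1 and 2 of a vector `y` give the mask `0, 3, 5, 6`, which is the mask shown in the `isFixedVectorShuffle` comment in the diff.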

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Oct 1 2021, 4:10 PM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Oct 1 2021, 4:10 PM

ABataev requested review of this revision.Oct 1 2021, 4:10 PM

Herald added a project: Restricted Project. Oct 1 2021, 4:10 PM

Harbormaster completed remote builds in B126755: Diff 376651.Oct 1 2021, 4:10 PM

Rebase

Harbormaster completed remote builds in B126915: Diff 377013.Oct 4 2021, 2:29 PM

RKSimon retitled this revision from [SLP]Improve gathering of the scals used in the graph. to [SLP]Improve gathering of the scalars used in the graph..Oct 5 2021, 6:35 AM

Rebase + bug fixes

Harbormaster completed remote builds in B133811: Diff 386648.Nov 11 2021, 2:47 PM

vporpo added a subscriber: vporpo.Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B135503: Diff 389033.Nov 22 2021, 7:36 PM

RKSimon added inline comments.Nov 29 2021, 9:13 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
303	Is it worth merging the isa<> and cast<> into a dyn_cast<>?
596	return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
6558	What targets are we still missing support for?
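
The first two suggestions above can be illustrated with a small standalone sketch. It uses `dynamic_cast` as a plain-C++ analogue of LLVM's `dyn_cast<>`, and `std::optional`/`std::nullopt` in place of the `Optional`/`None` types in use when the comment was written. The `Node` and `InsertNode` types and the `getLaneIndex` function are hypothetical names, not code from the patch.

```cpp
#include <optional>

struct Node {
  virtual ~Node() = default;
};

struct InsertNode : Node {
  unsigned Lane;
  unsigned NumElements;
  InsertNode(unsigned Lane, unsigned NumElements)
      : Lane(Lane), NumElements(NumElements) {}
};

// One checked cast replaces a separate type test followed by an unchecked
// cast, and every failure path returns std::nullopt immediately, so the
// success path is not nested inside a large `if (Res.has_value())` block.
std::optional<unsigned> getLaneIndex(const Node *N) {
  const auto *IN = dynamic_cast<const InsertNode *>(N);
  if (!IN)
    return std::nullopt; // Not an insert: fail early and visibly.
  if (IN->Lane >= IN->NumElements)
    return std::nullopt; // Out-of-range lane: also an early out.
  return IN->Lane;
}
```

The design point is only about readability: failures are explicit at the top of the function, and the remaining code can assume a valid `IN`.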

ABataev added inline comments.Nov 29 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6558	AArch64, in many cases switches to the default cost bunch of extracts + bunch of inserts.

Rebase + address comments.

Harbormaster completed remote builds in B136480: Diff 390398.Nov 29 2021, 11:39 AM

Rebase

Harbormaster completed remote builds in B136694: Diff 390702.Nov 30 2021, 8:08 AM

Rebase

Harbormaster completed remote builds in B136747: Diff 390783.Nov 30 2021, 1:09 PM

Rebase

Harbormaster completed remote builds in B138215: Diff 392842.Dec 8 2021, 12:09 PM

Rebase

RKSimon added inline comments.Dec 14 2021, 8:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6402–6403	Wshadow warning vs Idx @ Line 4688?
6528–6529	Wshadow warning vs Idx @ Line 4688?
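
For context on the `-Wshadow` remarks: the warning fires when an inner declaration reuses the name of a variable from an enclosing scope, which compiles but makes it easy to read or update the wrong variable. The sketch below is a hypothetical, standalone illustration (not the code under review) of the usual fix, which is simply to give the inner variable a distinct name.

```cpp
#include <vector>

// Counts mask elements that differ from their position, skipping -1
// entries (used here to model poison/undefined lanes).
int countNonIdentity(const std::vector<int> &Mask) {
  int Idx = 0; // Outer index: current position in the mask.
  int Count = 0;
  // Naming this loop variable `Idx` as well would shadow the outer `Idx`
  // and trigger -Wshadow; a distinct name avoids the ambiguity.
  for (int Elem : Mask) {
    if (Elem != -1 && Elem != Idx)
      ++Count;
    ++Idx;
  }
  return Count;
}
```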

Address comments

Harbormaster completed remote builds in B139236: Diff 394269.Dec 14 2021, 9:48 AM

Rebase

Harbormaster completed remote builds in B141051: Diff 396715.Dec 30 2021, 2:15 PM

ABataev mentioned this in D123587: [SLP] Generate shuffles if we can reorder an existing node.Apr 12 2022, 12:05 PM

Rebase

Herald added a project: Restricted Project. Aug 26 2022, 7:51 AM

Herald added subscribers: pcwang-thead, nlopes, kosarev.

nlopes added inline comments.Aug 26 2022, 7:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
10073	Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!

ABataev added inline comments.Aug 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
10073	Sure, thanks!

Address comments

Harbormaster completed remote builds in B183623: Diff 455933.Aug 26 2022, 10:50 AM

Rebase

Harbormaster completed remote builds in B186399: Diff 459790.Sep 13 2022, 11:19 AM

ABataev mentioned this in rG796af0c02728: [SLP] Move getInsertIndex function, NFC..Sep 14 2022, 6:24 AM

ABataev mentioned this in rGd647312e3f57: [SLP][NFC]Extract getLastInstructionInBundle function for better.Sep 14 2022, 8:44 AM

Rebase

Harbormaster completed remote builds in B192832: Diff 468668.Oct 18 2022, 1:42 PM

nhaehnle removed a subscriber: nhaehnle.Oct 19 2022, 2:00 AM

Large update.
Includes:

Unifies all shuffle builders and shuffle demission operands.
Generalizes emission and cost model estimation of the buildvectors/gathers.

Will be split into several smaller patches eventually.
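
One way to picture the unification described in this update is a single routine, templated on a builder type, that drives either instruction emission or cost estimation. This mirrors the shape of the `processBuildVector<BVTy, ResTy>` declaration in the diff, but everything below is a simplified, hypothetical sketch in plain C++ rather than the patch's code: `EmitBuilder`, `CostBuilder`, `processGather`, and the unit costs are invented for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builder that "emits" instructions as text.
struct EmitBuilder {
  std::vector<std::string> Insts;
  void addShuffle(const std::vector<int> &Mask) {
    std::string S = "shuffle <";
    for (std::size_t I = 0; I < Mask.size(); ++I)
      S += (I ? "," : "") + std::to_string(Mask[I]);
    Insts.push_back(S + ">");
  }
  void addInsert(int Lane) {
    Insts.push_back("insert lane " + std::to_string(Lane));
  }
  std::vector<std::string> finalize() { return Insts; }
};

// Builder that only accumulates a cost; the unit costs are made up.
struct CostBuilder {
  int Cost = 0;
  void addShuffle(const std::vector<int> &) { Cost += 1; }
  void addInsert(int) { Cost += 2; }
  int finalize() { return Cost; }
};

// Shared driver: lanes with a mask index >= 0 are covered by one shuffle,
// the remaining (-1) lanes are filled by individual inserts.
template <typename BVTy, typename ResTy>
ResTy processGather(const std::vector<int> &Mask) {
  BVTy Builder;
  bool HasShuffled = false;
  for (int M : Mask)
    HasShuffled |= (M >= 0);
  if (HasShuffled)
    Builder.addShuffle(Mask);
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    if (Mask[Lane] < 0)
      Builder.addInsert(static_cast<int>(Lane));
  return Builder.finalize();
}
```

Because both builders expose the same member functions, the traversal logic exists only once, which should make it harder for the cost estimate to drift away from what is actually emitted.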

Harbormaster completed remote builds in B201460: Diff 480583.Dec 6 2022, 9:34 PM

ABataev mentioned this in D139718: [SLP][NFC]Inital redesign of ShuffleInstructionBuilder, NFC..Dec 9 2022, 7:50 AM

ABataev mentioned this in rGecac8192dbf6: [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC..Dec 13 2022, 9:54 AM

Rebase

Harbormaster completed remote builds in B202927: Diff 482594.Dec 13 2022, 1:17 PM

Restore accidentally removed code.

Harbormaster completed remote builds in B202945: Diff 482619.Dec 13 2022, 2:43 PM

Rebase

Harbormaster completed remote builds in B204383: Diff 484571.Dec 21 2022, 7:50 AM

ABataev mentioned this in D140499: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 21 2022, 1:54 PM

khchen added a subscriber: khchen.Dec 22 2022, 8:35 AM

ABataev mentioned this in rGac01ae71f0c4: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 28 2022, 6:11 AM

Rebase

Harbormaster completed remote builds in B206131: Diff 486895.Jan 6 2023, 10:07 AM

Rebase

Herald added a subscriber: StephenFan. Jan 9 2023, 9:43 AM

Harbormaster completed remote builds in B206577: Diff 487485.Jan 9 2023, 10:30 AM

ABataev mentioned this in D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 11 2023, 8:33 AM

ABataev mentioned this in D141940: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Jan 17 2023, 8:01 AM

ABataev mentioned this in rG9bdcf8778a5c: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 19 2023, 1:50 PM

ABataev mentioned this in rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Feb 20 2023, 6:16 AM

ABataev mentioned this in D144958: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Feb 28 2023, 5:21 AM

ABataev mentioned this in rGa611b3f3059e: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 7 2023, 12:47 PM

Rebase

Restore deleted code/update test

Harbormaster completed remote builds in B218206: Diff 503510.Mar 8 2023, 2:48 PM

ABataev mentioned this in D145732: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function..Mar 9 2023, 2:20 PM

hans mentioned this in rG3b3a4c270bcb: Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather….Mar 10 2023, 5:40 AM

ABataev mentioned this in rG93a9be0cea0a: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 10 2023, 1:22 PM

ABataev mentioned this in rGf3a68ac10c84: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector….Mar 13 2023, 6:27 AM

Rebase

RKSimon added inline comments.Mar 13 2023, 2:27 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6870	Any chance that we can use ShuffleVectorInst::isIdentityMask ?
7362	auto *
7364	auto *

ABataev added inline comments.Mar 13 2023, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6870	Sure, will do it later
7364	Both these cases are the existing code, just the diff is not quite correct because of the big differences.
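
As background for the `isIdentityMask` suggestion, here is a minimal standalone sketch of what an identity-mask check amounts to: every defined mask element must select its own position, while poison/undefined elements (modelled here as `-1`, an assumption of this sketch) are allowed anywhere. The name `isIdentityMaskSketch` is hypothetical, and this is not LLVM's implementation of `ShuffleVectorInst::isIdentityMask`.

```cpp
#include <cstddef>
#include <vector>

// Returns true if each defined element of Mask equals its own index.
// Elements equal to -1 are treated as "don't care" and skipped.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] == -1)
      continue;
    if (Mask[I] != static_cast<int>(I))
      return false;
  }
  return true;
}
```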

Restore accidentally removed lines, address comments

Harbormaster completed remote builds in B219182: Diff 504861.Mar 13 2023, 5:18 PM

Rebase

Restore some deleted code

Harbormaster completed remote builds in B219617: Diff 505467.Mar 15 2023, 7:08 AM

ABataev mentioned this in D146167: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars..Mar 15 2023, 2:14 PM

ABataev mentioned this in rG0ad87ffdcc23: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining….Mar 17 2023, 11:21 AM

Rebase

Harbormaster completed remote builds in B220124: Diff 506162.Mar 17 2023, 12:55 PM

ABataev mentioned this in D146564: [SLP]Find reused scalars in buildvector sequences, if any..Mar 21 2023, 2:11 PM

ABataev mentioned this in rG40105a993399: [SLP]Find reused scalars in buildvector sequences, if any..Apr 5 2023, 9:39 AM

Rebase

Harbormaster completed remote builds in B224057: Diff 511474.Apr 6 2023, 11:37 AM

Rebase

Harbormaster completed remote builds in B224133: Diff 511560.Apr 6 2023, 5:26 PM

Rebase

Harbormaster completed remote builds in B224875: Diff 512589.Apr 11 2023, 3:26 PM

ABataev mentioned this in D148174: [SLP]Introduce gather cost estimation function..Apr 12 2023, 2:36 PM

ABataev mentioned this in rGf82eb7e066f3: [SLP]Introduce gather cost estimation function..Apr 13 2023, 10:19 AM

Rebase

Harbormaster completed remote builds in B225410: Diff 513316.Apr 13 2023, 12:33 PM

ABataev mentioned this in D148279: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions..Apr 13 2023, 4:42 PM

ABataev mentioned this in rGcd341f3f4878: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 5:55 AM

ABataev mentioned this in rG1ce4b26a21a0: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 11:54 AM

Rebase

Harbormaster completed remote builds in B227770: Diff 516462.Apr 24 2023, 11:19 AM

dtemirbulatov added a reviewer: vporpo.Apr 27 2023, 5:39 PM

Temp rebase, requires some extra work.

Harbormaster completed remote builds in B230224: Diff 519833.May 5 2023, 7:04 AM

Rebase

Herald added a subscriber: wangpc. Nov 9 2023, 2:20 PM

Harbormaster completed remote builds in B258052: Diff 558067.Nov 9 2023, 6:17 PM

Rebase

Harbormaster completed remote builds in B258083: Diff 558113.Nov 16 2023, 10:49 AM

LGTM.

This revision is now accepted and ready to land.Thu, Nov 30, 7:34 AM

LGTM.

Rebase

Harbormaster completed remote builds in B258147: Diff 558197.Thu, Nov 30, 11:35 AM

Closed by commit rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph. (authored by ABataev). Fri, Dec 1, 11:26 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph..

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"classA" = type { %"vector", %"vector", %"complex" }
%"vector" = type { ptr, ptr, %"pair" }
%"pair" = type { %"pair_elem" }
%"pair_elem" = type { ptr }
%"complex" = type { double, double }

define void @foo() #0 {
  %1 = getelementptr %"classA", ptr null, i64 0, i32 2
  %2 = getelementptr %"classA", ptr null, i64 0, i32 2, i32 1
  br i1 false, label %10, label %3

3:                                                ; preds = %10, %0
  %4 = phi double [ 0.000000e+00, %0 ], [ %25, %10 ]
  %5 = phi double [ 0.000000e+00, %0 ], [ %24, %10 ]
  %6 = fmul double %5, %5
  %7 = fmul double %4, %4
  %8 = fadd double %7, %6
  %9 = fcmp ult double %8, 0.000000e+00
  ret void

10:                                               ; preds = %10, %0
  %11 = phi double [ %24, %10 ], [ 0.000000e+00, %0 ]
  %12 = phi double [ %25, %10 ], [ 0.000000e+00, %0 ]
  %13 = load double, ptr null, align 8
  %14 = load double, ptr null, align 8
  %15 = load double, ptr null, align 8
  %16 = getelementptr %"complex", ptr null, i64 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = fmul double %13, %15
  %19 = fmul double %14, %17
  %20 = fadd double %18, %19
  %21 = fmul double %14, %15
  %22 = fmul double %13, %17
  %23 = fsub double %21, %22
  %24 = fadd double %11, %20
  store double %11, ptr %1, align 8
  %25 = fadd double %12, %23
  store double %12, ptr %2, align 8
  br i1 false, label %3, label %10

; uselistorder directives
  uselistorder double %24, { 1, 0 }
  uselistorder double %25, { 1, 0 }
}

attributes #0 = { "target-features"="+aes,+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }

Thanks!

In D110978#4657889, @vporpo wrote:

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. [...]

Ping @ABataev ! This is blocking our internal release at Google!

dtemirbulatov added a subscriber: dtemirbulatov.Tue, Dec 12, 1:54 AM

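Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	424 lines
llvm/test/DebugInfo/Generic/assignment-tracking/slp-vectorizer/merge-scalars.ll	7 lines
llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll	36 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll	28 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll	28 lines
llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll	29 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll	27 lines
llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll	23 lines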

Diff 519833

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 294 Lines • ▼ Show 20 Lines
static std::optional<unsigned> getInsertIndex(const Value *InsertInst,		static std::optional<unsigned> getInsertIndex(const Value *InsertInst,
unsigned Offset = 0) {		unsigned Offset = 0) {
int Index = Offset;		int Index = Offset;
if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {		if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
const auto *VT = dyn_cast<FixedVectorType>(IE->getType());		const auto *VT = dyn_cast<FixedVectorType>(IE->getType());
if (!VT)		if (!VT)
return std::nullopt;		return std::nullopt;
const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2));		const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
if (!CI)		if (!CI)
		RKSimon: Is it worth merging the isa<> and cast<> into a dyn_cast<>?
return std::nullopt;		return std::nullopt;
if (CI->getValue().uge(VT->getNumElements()))		if (CI->getValue().uge(VT->getNumElements()))
return std::nullopt;		return std::nullopt;
Index *= VT->getNumElements();		Index *= VT->getNumElements();
Index += CI->getZExtValue();		Index += CI->getZExtValue();
return Index;		return Index;
}		}

▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines
/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
/// ret <4 x i8> %ins4		/// ret <4 x i8> %ins4
/// can be transformed into:		/// can be transformed into:
/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,		/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,
/// i32 6>		/// i32 6>
/// %2 = mul <4 x i8> %1, %1		/// %2 = mul <4 x i8> %1, %1
/// ret <4 x i8> %2		/// ret <4 x i8> %2
/// We convert this initially to something like:
/// %x0 = extractelement <4 x i8> %x, i32 0
/// %x3 = extractelement <4 x i8> %x, i32 3
/// %y1 = extractelement <4 x i8> %y, i32 1
/// %y2 = extractelement <4 x i8> %y, i32 2
/// %1 = insertelement <4 x i8> poison, i8 %x0, i32 0
/// %2 = insertelement <4 x i8> %1, i8 %x3, i32 1
/// %3 = insertelement <4 x i8> %2, i8 %y1, i32 2
/// %4 = insertelement <4 x i8> %3, i8 %y2, i32 3
/// %5 = mul <4 x i8> %4, %4
/// %6 = extractelement <4 x i8> %5, i32 0
/// %ins1 = insertelement <4 x i8> poison, i8 %6, i32 0
/// %7 = extractelement <4 x i8> %5, i32 1
/// %ins2 = insertelement <4 x i8> %ins1, i8 %7, i32 1
/// %8 = extractelement <4 x i8> %5, i32 2
/// %ins3 = insertelement <4 x i8> %ins2, i8 %8, i32 2
/// %9 = extractelement <4 x i8> %5, i32 3
/// %ins4 = insertelement <4 x i8> %ins3, i8 %9, i32 3
/// ret <4 x i8> %ins4
/// InstCombiner transforms this into a shuffle and vector mul
/// Mask will return the Shuffle Mask equivalent to the extracted elements.		/// Mask will return the Shuffle Mask equivalent to the extracted elements.
/// TODO: Can we split off and reuse the shuffle mask detection from		/// TODO: Can we split off and reuse the shuffle mask detection from
/// ShuffleVectorInst/getShuffleCost?		/// ShuffleVectorInst/getShuffleCost?
static std::optional<TargetTransformInfo::ShuffleKind>		static std::optional<TargetTransformInfo::ShuffleKind>
isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {		isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });		find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });
if (It == VL.end())		if (It == VL.end())
▲ Show 20 Lines • Show All 148 Lines • ▼ Show 20 Lines	if (V2 && PairMax < VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() +
PairVec = std::make_pair(V1, V2);		PairVec = std::make_pair(V1, V2);
}		}
}		}
if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)		if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)
return std::nullopt;		return std::nullopt;
// Check if better to perform a shuffle of 2 vectors or just of a single		// Check if better to perform a shuffle of 2 vectors or just of a single
// vector.		// vector.
SmallVector<Value *> SavedVL(VL.begin(), VL.end());		SmallVector<Value *> SavedVL(VL.begin(), VL.end());
SmallVector<Value *> GatheredExtracts(		SmallVector<Value *> GatheredExtracts(
		RKSimon: return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
VL.size(), PoisonValue::get(VL.front()->getType()));		VL.size(), PoisonValue::get(VL.front()->getType()));
if (SingleMax >= PairMax && SingleMax) {		if (SingleMax >= PairMax && SingleMax) {
for (int Idx : VectorOpToIdx[SingleVec])		for (int Idx : VectorOpToIdx[SingleVec])
std::swap(GatheredExtracts[Idx], VL[Idx]);		std::swap(GatheredExtracts[Idx], VL[Idx]);
} else {		} else {
for (Value *V : {PairVec.first, PairVec.second})		for (Value *V : {PairVec.first, PairVec.second})
for (int Idx : VectorOpToIdx[V])		for (int Idx : VectorOpToIdx[V])
std::swap(GatheredExtracts[Idx], VL[Idx]);		std::swap(GatheredExtracts[Idx], VL[Idx]);
▲ Show 20 Lines • Show All 1,828 Lines • ▼ Show 20 Lines	private:

/// Vectorize a single entry in the tree, the \p Idx-th operand of the entry		/// Vectorize a single entry in the tree, the \p Idx-th operand of the entry
/// \p E.		/// \p E.
Value *vectorizeOperand(TreeEntry *E, unsigned NodeIdx);		Value *vectorizeOperand(TreeEntry *E, unsigned NodeIdx);

/// Create a new vector from a list of scalar values. Produces a sequence		/// Create a new vector from a list of scalar values. Produces a sequence
/// which exploits values reused across lanes, and arranges the inserts		/// which exploits values reused across lanes, and arranges the inserts
/// for ease of later optimization.		/// for ease of later optimization.
		template <typename BVTy, typename ResTy, typename... Args>
		ResTy processBuildVector(const TreeEntry *E, Args &...Params);

		/// Create a new vector from a list of scalar values. Produces a sequence
		/// which exploits values reused across lanes, and arranges the inserts
		/// for ease of later optimization.
Value *createBuildVector(const TreeEntry *E);		Value *createBuildVector(const TreeEntry *E);

/// Returns the instruction in the bundle, which can be used as a base point		/// Returns the instruction in the bundle, which can be used as a base point
/// for scheduling. Usually it is the last instruction in the bundle, except		/// for scheduling. Usually it is the last instruction in the bundle, except
/// for the case when all operands are external (in this case, it is the first		/// for the case when all operands are external (in this case, it is the first
/// instruction in the list).		/// instruction in the list).
Instruction &getLastInstructionInBundle(const TreeEntry *E);		Instruction &getLastInstructionInBundle(const TreeEntry *E);

▲ Show 20 Lines • Show All 3,939 Lines • ▼ Show 20 Lines	TTI::OperandValueInfo BoUpSLP::getOperandInfo(ArrayRef<Value *> VL,
else if (IsConstant)		else if (IsConstant)
VK = TTI::OK_NonUniformConstantValue;		VK = TTI::OK_NonUniformConstantValue;
else if (IsUniform)		else if (IsUniform)
VK = TTI::OK_UniformValue;		VK = TTI::OK_UniformValue;

TTI::OperandValueProperties VP = TTI::OP_None;		TTI::OperandValueProperties VP = TTI::OP_None;
VP = IsPowerOfTwo ? TTI::OP_PowerOf2 : VP;		VP = IsPowerOfTwo ? TTI::OP_PowerOf2 : VP;
VP = IsNegatedPowerOfTwo ? TTI::OP_NegatedPowerOf2 : VP;		VP = IsNegatedPowerOfTwo ? TTI::OP_NegatedPowerOf2 : VP;

return {VK, VP};		return {VK, VP};
		RKSimon: Wshadow warning vs Idx @ Line 4688?
}		}

namespace {		namespace {
/// The base class for shuffle instruction emission and shuffle cost estimation.		/// The base class for shuffle instruction emission and shuffle cost estimation.
class BaseShuffleAnalysis {		class BaseShuffleAnalysis {
protected:		protected:
/// Checks if the mask is an identity mask.		/// Checks if the mask is an identity mask.
/// \param IsStrict if is true the function returns false if mask size does		/// \param IsStrict if is true the function returns false if mask size does
▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
dyn_cast<FixedVectorType>(SV->getOperand(0)->getType()))		dyn_cast<FixedVectorType>(SV->getOperand(0)->getType()))
LocalVF = SVOpTy->getNumElements();		LocalVF = SVOpTy->getNumElements();
SmallVector<int> ExtMask(Mask.size(), PoisonMaskElem);		SmallVector<int> ExtMask(Mask.size(), PoisonMaskElem);
for (auto [Idx, I] : enumerate(Mask)) {		for (auto [Idx, I] : enumerate(Mask)) {
if (I == PoisonMaskElem ||		if (I == PoisonMaskElem ||
static_cast<unsigned>(I) >= SV->getShuffleMask().size())		static_cast<unsigned>(I) >= SV->getShuffleMask().size())
continue;		continue;
ExtMask[Idx] = SV->getMaskValue(I);		ExtMask[Idx] = SV->getMaskValue(I);
}		}
bool IsOp1Undef =		bool IsOp1Undef =
		RKSimon: Wshadow warning vs Idx @ Line 4688?
isUndefVector(SV->getOperand(0),		isUndefVector(SV->getOperand(0),
buildUseMask(LocalVF, ExtMask, UseMask::FirstArg))		buildUseMask(LocalVF, ExtMask, UseMask::FirstArg))
.all();		.all();
bool IsOp2Undef =		bool IsOp2Undef =
isUndefVector(SV->getOperand(1),		isUndefVector(SV->getOperand(1),
buildUseMask(LocalVF, ExtMask, UseMask::SecondArg))		buildUseMask(LocalVF, ExtMask, UseMask::SecondArg))
.all();		.all();
if (!IsOp1Undef && !IsOp2Undef) {		if (!IsOp1Undef && !IsOp2Undef) {
Show All 12 Lines	while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
combineMasks(LocalVF, ShuffleMask, Mask);		combineMasks(LocalVF, ShuffleMask, Mask);
Mask.swap(ShuffleMask);		Mask.swap(ShuffleMask);
if (IsOp2Undef)		if (IsOp2Undef)
Op = SV->getOperand(0);		Op = SV->getOperand(0);
else		else
Op = SV->getOperand(1);		Op = SV->getOperand(1);
}		}
if (auto *OpTy = dyn_cast<FixedVectorType>(Op->getType());		if (auto *OpTy = dyn_cast<FixedVectorType>(Op->getType());
!OpTy || !isIdentityMask(Mask, OpTy, SinglePermute) ||		!OpTy || !isIdentityMask(Mask, OpTy, SinglePermute) ||
		RKSimon: What targets are we still missing support for?
		ABataev: AArch64, in many cases switches to the default cost bunch of extracts + bunch of inserts.
ShuffleVectorInst::isZeroEltSplatMask(Mask)) {		ShuffleVectorInst::isZeroEltSplatMask(Mask)) {
if (IdentityOp) {		if (IdentityOp) {
V = IdentityOp;		V = IdentityOp;
assert(Mask.size() == IdentityMask.size() &&		assert(Mask.size() == IdentityMask.size() &&
"Expected masks of same sizes.");		"Expected masks of same sizes.");
// Clear known poison elements.		// Clear known poison elements.
for (auto [I, Idx] : enumerate(Mask))		for (auto [I, Idx] : enumerate(Mask))
if (Idx == PoisonMaskElem)		if (Idx == PoisonMaskElem)
▲ Show 20 Lines • Show All 142 Lines • ▼ Show 20 Lines
/// when the actual shuffle instruction is generated only if this is actually		/// when the actual shuffle instruction is generated only if this is actually
/// required. Otherwise, the shuffle instruction emission is delayed till the		/// required. Otherwise, the shuffle instruction emission is delayed till the
/// end of the process, to reduce the number of emitted instructions and further		/// end of the process, to reduce the number of emitted instructions and further
/// analysis/transformations.		/// analysis/transformations.
class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {		class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
bool IsFinalized = false;		bool IsFinalized = false;
SmallVector<int> CommonMask;		SmallVector<int> CommonMask;
SmallVector<PointerUnion<Value *, const TreeEntry *>, 2> InVectors;		SmallVector<PointerUnion<Value *, const TreeEntry *>, 2> InVectors;
		std::optional<const TreeEntry *> SingleEntry;
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;
InstructionCost Cost = 0;		InstructionCost Cost = 0;
ArrayRef<Value *> VectorizedVals;		ArrayRef<Value *> VectorizedVals;
BoUpSLP &R;		BoUpSLP &R;
SmallPtrSetImpl<Value *> &CheckedExtracts;		SmallPtrSetImpl<Value *> &CheckedExtracts;
constexpr static TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		constexpr static TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

InstructionCost getBuildVectorCost(ArrayRef<Value *> VL, Value *Root) {		InstructionCost getBuildVectorCost(ArrayRef<Value *> VL, Value *Root) {
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	InstructionCost getBuildVectorCost(ArrayRef<Value *> VL, Value *Root) {
return GatherCost +		return GatherCost +
(all_of(Gathers, UndefValue::classof)		(all_of(Gathers, UndefValue::classof)
? TTI::TCC_Free		? TTI::TCC_Free
: R.getGatherCost(Gathers, !Root && VL.equals(Gathers)));		: R.getGatherCost(Gathers, !Root && VL.equals(Gathers)));
};		};

/// Compute the cost of creating a vector of type \p VecTy containing the		/// Compute the cost of creating a vector of type \p VecTy containing the
/// extracted values from \p VL.		/// extracted values from \p VL.
InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,		InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask) {
TTI::ShuffleKind ShuffleKind) {
auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());		auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());
unsigned NumOfParts = TTI.getNumberOfParts(VecTy);		unsigned NumOfParts = TTI.getNumberOfParts(VecTy);

if (ShuffleKind != TargetTransformInfo::SK_PermuteSingleSrc ||		constexpr TTI::ShuffleKind ShuffleKind = TTI::SK_PermuteSingleSrc;
!NumOfParts || VecTy->getNumElements() < NumOfParts)		if (!NumOfParts || VecTy->getNumElements() < NumOfParts)
return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);		return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);

bool AllConsecutive = true;		bool AllConsecutive = true;
unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;		unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;
unsigned Idx = -1;		unsigned Idx = -1;
InstructionCost Cost = 0;		InstructionCost Cost = 0;

// Process extracts in blocks of EltsPerVector to check if the source vector		// Process extracts in blocks of EltsPerVector to check if the source vector
// operand can be re-used directly. If not, add the cost of creating a		// operand can be re-used directly. If not, add the cost of creating a
// shuffle to extract the values into a vector register.		// shuffle to extract the values into a vector register.
SmallVector<int> RegMask(EltsPerVector, PoisonMaskElem);		SmallVector<int> RegMask(EltsPerVector, PoisonMaskElem);
for (auto *V : VL) {		for (auto *V : VL) {
		RKSimon: Any chance that we can use ShuffleVectorInst::isIdentityMask ?
		ABataev: Sure, will do it later
++Idx;		++Idx;

// Reached the start of a new vector registers.		// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {		if (Idx % EltsPerVector == 0) {
RegMask.assign(EltsPerVector, PoisonMaskElem);		RegMask.assign(EltsPerVector, PoisonMaskElem);
AllConsecutive = true;		AllConsecutive = true;
continue;		continue;
}		}
[... 18 lines hidden ...]	for (auto *V : VL) {
// Skip all indices, except for the last index per vector block.		// Skip all indices, except for the last index per vector block.
if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())		if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
continue;		continue;

// If we have a series of extracts which are not consecutive and hence		// If we have a series of extracts which are not consecutive and hence
// cannot re-use the source vector register directly, compute the shuffle		// cannot re-use the source vector register directly, compute the shuffle
// cost to extract the vector with EltsPerVector elements.		// cost to extract the vector with EltsPerVector elements.
Cost += TTI.getShuffleCost(		Cost += TTI.getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc,		ShuffleKind,
FixedVectorType::get(VecTy->getElementType(), EltsPerVector),		FixedVectorType::get(VecTy->getElementType(), EltsPerVector),
RegMask);		RegMask);
}		}
return Cost;		return Cost;
}		}

class ShuffleCostBuilder {		class ShuffleCostBuilder {
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;
[... 74 lines hidden ...]	if (!V2 && !P2.isNull()) {
VF = CommonVF;		VF = CommonVF;
V2 = Constant::getNullValue(		V2 = Constant::getNullValue(
FixedVectorType::get(E->Scalars.front()->getType(), VF));		FixedVectorType::get(E->Scalars.front()->getType(), VF));
}		}
return BaseShuffleAnalysis::createShuffle<InstructionCost>(V1, V2, Mask,		return BaseShuffleAnalysis::createShuffle<InstructionCost>(V1, V2, Mask,
Builder);		Builder);
}		}

		unsigned
		getNumElements(const PointerUnion<Value *, const TreeEntry *> &P) const {
		if (const auto *E = P.get<const TreeEntry *>())
		return (E->State != TreeEntry::NeedToGather &&
		E->Scalars.size() == CommonMask.size())
		? E->Scalars.size()
		: E->getVectorFactor();
		return cast<FixedVectorType>(P.get<Value *>()->getType())->getNumElements();
		}

public:		public:
ShuffleCostEstimator(TargetTransformInfo &TTI,		ShuffleCostEstimator(TargetTransformInfo &TTI,
ArrayRef<Value *> VectorizedVals, BoUpSLP &R,		ArrayRef<Value *> VectorizedVals, BoUpSLP &R,
SmallPtrSetImpl<Value *> &CheckedExtracts)		SmallPtrSetImpl<Value *> &CheckedExtracts)
: TTI(TTI), VectorizedVals(VectorizedVals), R(R),		: TTI(TTI), VectorizedVals(VectorizedVals), R(R),
CheckedExtracts(CheckedExtracts) {}		CheckedExtracts(CheckedExtracts) {}
Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask,		Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask,
TTI::ShuffleKind ShuffleKind) {		TTI::ShuffleKind ShuffleKind) {
[... 78 lines hidden ...]	for (const auto &Data : ExtractVectorsTys) {
Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,		Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, std::nullopt, CostKind, Idx, SubVT);		EEVTy, std::nullopt, CostKind, Idx, SubVT);
}		}
} else {		} else {
Cost += TTI.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,		Cost += TTI.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
VecTy, std::nullopt, CostKind, 0, EEVTy);		VecTy, std::nullopt, CostKind, 0, EEVTy);
}		}
}		}
// Check that gather of extractelements can be represented as just a
// shuffle of a single/two vectors the scalars are extracted from.
// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.
Cost += computeExtractCost(VL, Mask, ShuffleKind);
InVectors.assign(1, E);		InVectors.assign(1, E);
		if (ShuffleKind == TTI::SK_PermuteSingleSrc)
		SingleEntry = E;
return VecBase;		return VecBase;
}		}
		std::optional<InstructionCost>
		needToDelay(const TreeEntry *, ArrayRef<const TreeEntry *>) const {
		// No need to delay the cost estimation during analysis.
		return std::nullopt;
		}
		InstructionCost getSameNode(const TreeEntry *E) { return TTI::TCC_Free; }
void add(const TreeEntry *E1, const TreeEntry *E2, ArrayRef<int> Mask) {		void add(const TreeEntry *E1, const TreeEntry *E2, ArrayRef<int> Mask) {
CommonMask.assign(Mask.begin(), Mask.end());		// Use zeroinitializer instead of actual vector value here, since they are
InVectors.assign({E1, E2});		// not ready yet.
		add(Constant::getNullValue(FixedVectorType::get(
		E1->Scalars.front()->getType(), E1->getVectorFactor())),
		Constant::getNullValue(FixedVectorType::get(
		E2->Scalars.front()->getType(), E2->getVectorFactor())),
		Mask);
}		}
void add(const TreeEntry *E1, ArrayRef<int> Mask) {		void add(const TreeEntry *E1, ArrayRef<int> Mask) {
		// Use zeroinitializer instead of actual vector value here, since they are
		// not ready yet.
		add(Constant::getNullValue(FixedVectorType::get(
		E1->Scalars.front()->getType(), E1->getVectorFactor())),
		Mask);
		}
		/// Adds 2 input vectors and the mask for their shuffling.
		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
		SingleEntry = nullptr;
		if (InVectors.empty()) {
		InVectors.push_back(V1);
		InVectors.push_back(V2);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
InVectors.assign(1, E1);		return;
		}
		const PointerUnion<Value *, const TreeEntry *> &Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Cost += createShuffle(Vec, InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
		} else if (getNumElements(Vec) != Mask.size()) {
		Cost += createShuffle(Vec, nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
}		}
void gather(ArrayRef<Value *> VL, Value *Root = nullptr) {		Cost += createShuffle(V1, V2, Mask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx + Sz;
		InVectors.front() = Vec;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		}
		/// Adds another one input vector and the mask for the shuffling.
		void add(Value *V1, ArrayRef<int> Mask) {
		if (InVectors.empty()) {
		if (!isa<FixedVectorType>(V1->getType())) {
		Cost += createShuffle(V1, nullptr, CommonMask);
		CommonMask.assign(Mask.size(), PoisonMaskElem);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
		}
		InVectors.push_back(V1);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		SingleEntry = nullptr;
		const auto *It = find_if(
		InVectors, [=](const PointerUnion<Value *, const TreeEntry *> &V) {
		return V.get<Value *>() == V1;
		});
		if (It == InVectors.end()) {
		if (InVectors.size() == 2 || !isa<FixedVectorType>(V1->getType()) ||
		getNumElements(InVectors.front()) !=
		cast<FixedVectorType>(V1->getType())->getNumElements()) {
		const PointerUnion<Value *, const TreeEntry *> &V = InVectors.front();
		if (InVectors.size() == 2) {
		Cost +=
		createShuffle(InVectors.front(), InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
		} else if (getNumElements(V) != CommonMask.size()) {
		Cost += createShuffle(InVectors.front(), nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
		}
		unsigned VVF = getNumElements(V);
		unsigned V1VF = getNumElements(V1);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] == PoisonMaskElem && Mask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = VVF != V1VF ? Idx + Sz : Mask[Idx] + VVF;
		if (VVF != V1VF)
		Cost += createShuffle(V1, nullptr, Mask);
		InVectors.front() = V;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		return;
		}
		// Check if second vector is required if the used elements are already
		// used from the first one.
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem && CommonMask[Idx] == PoisonMaskElem) {
		InVectors.push_back(V1);
		break;
		}
		}
		int VF = CommonMask.size();
		if (auto *FTy = dyn_cast<FixedVectorType>(V1->getType()))
		VF = FTy->getNumElements();
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != PoisonMaskElem && CommonMask[Idx] == PoisonMaskElem)
		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);
		}
		/// Adds another one input vector and the mask for the shuffling.
		void addOrdered(Value *V1, ArrayRef<unsigned> Order) {
		SmallVector<int, 4> NewMask;
		inversePermutation(Order, NewMask);
		add(V1, NewMask);
		SingleEntry = nullptr;
		}
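
`addOrdered` turns an order into a shuffle mask through `inversePermutation` before calling `add`. Below is a minimal standalone sketch of that conversion, assuming the helper stores `I` at position `Order[I]`. `inversePermutationSketch` and `kPoison` are hypothetical names, and `std::vector` stands in for the LLVM containers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an undefined (poison) mask lane.
constexpr int kPoison = -1;

// Builds the mask that undoes Order: lane Order[I] of the result is I.
// Order is assumed to be a permutation of 0..N-1.
std::vector<int> inversePermutationSketch(const std::vector<unsigned> &Order) {
  std::vector<int> Mask(Order.size(), kPoison);
  for (unsigned I = 0, E = static_cast<unsigned>(Order.size()); I < E; ++I)
    Mask[Order[I]] = static_cast<int>(I);
  return Mask;
}
```

For example, under this assumption the order `{2, 0, 1}` yields the mask `{1, 2, 0}`.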
		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {
Cost += getBuildVectorCost(VL, Root);		Cost += getBuildVectorCost(VL, Root);
if (!Root) {		if (!Root) {
assert(InVectors.empty() && "Unexpected input vectors for buildvector.");		SmallVector<Constant *> Vals;
// FIXME: Need to find a way to avoid use of getNullValue here.		for (Value *V : VL) {
InVectors.assign(1, Constant::getNullValue(FixedVectorType::get(		if (isa<UndefValue>(V)) {
VL.front()->getType(), VL.size())));		Vals.push_back(cast<Constant>(V));
		continue;
		}
		Vals.push_back(Constant::getNullValue(V->getType()));
}		}
		return ConstantVector::get(Vals);
}		}
		return ConstantVector::getSplat(
		ElementCount::getFixed(VL.size()),
		Constant::getNullValue(VL.front()->getType()));
		}
		InstructionCost createFreeze(InstructionCost Cost) { return Cost; }
/// Finalize emission of the shuffles.		/// Finalize emission of the shuffles.
InstructionCost finalize(ArrayRef<int> ExtMask) {		InstructionCost
		finalize(ArrayRef<int> ExtMask, unsigned VF = 0,
		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
IsFinalized = true;		IsFinalized = true;
		if (Action) {
		const PointerUnion<Value *, const TreeEntry *> &Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Cost += createShuffle(Vec, InVectors.back(), CommonMask);
		InVectors.pop_back();
		} else {
		Cost += createShuffle(Vec, nullptr, CommonMask);
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != PoisonMaskElem)
		CommonMask[Idx] = Idx;
		assert(VF > 0 &&
		"Expected vector length for the final value before action.");
		Value *V = Vec.get<Value *>();
		if (!Vec.isNull() && !V)
		V = Constant::getNullValue(FixedVectorType::get(
		Vec.get<const TreeEntry *>()->Scalars.front()->getType(),
		CommonMask.size()));
		Action(V, CommonMask);
		SingleEntry = nullptr;
		}
::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);		::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);
if (CommonMask.empty())		if (CommonMask.empty()) {
		assert(InVectors.size() == 1 && "Expected only one vector with no mask");
return Cost;		return Cost;
		}
		if (InVectors.size() == 2)
		return Cost +
		createShuffle(InVectors.front(), InVectors.back(), CommonMask);
int Limit = CommonMask.size() * 2;		int Limit = CommonMask.size() * 2;
if (all_of(CommonMask, [=](int Idx) { return Idx < Limit; }) &&		if (all_of(CommonMask, [=](int Idx) { return Idx < Limit; }) &&
ShuffleVectorInst::isIdentityMask(CommonMask))		ShuffleVectorInst::isIdentityMask(CommonMask))
return Cost;		return Cost;
		if (const TreeEntry *TE = SingleEntry.value_or(nullptr);
		TE && CommonMask.size() == TE->Scalars.size()) {
		// Check that gather of extractelements can be represented as just a
		// shuffle of a single/two vectors the scalars are extracted from.
		// Found the bunch of extractelement instructions that must be gathered
		// into a vector and can be represented as a permutation elements in a
		// single input vector or of 2 input vectors.
		return Cost + computeExtractCost(TE->Scalars, CommonMask);
		}
return Cost +		return Cost +
createShuffle(InVectors.front(),		createShuffle(InVectors.front(),
InVectors.size() == 2 ? InVectors.back() : nullptr,		InVectors.size() == 2 ? InVectors.back() : nullptr,
CommonMask);		CommonMask);
}		}

~ShuffleCostEstimator() {		~ShuffleCostEstimator() {
assert((IsFinalized || CommonMask.empty()) &&		assert((IsFinalized || CommonMask.empty()) &&
[... 25 lines hidden ...]	BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);		auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isa<InsertElementInst>(VL[0]))		if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
ShuffleCostEstimator Estimator(TTI, VectorizedVals, this,		return processBuildVector<ShuffleCostEstimator, InstructionCost>(
CheckedExtracts);		E, TTI, VectorizedVals, this, CheckedExtracts);
unsigned VF = E->getVectorFactor();
SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
E->ReuseShuffleIndices.end());
SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
// Build a mask out of the reorder indices and reorder scalars per this
// mask.
SmallVector<int> ReorderMask;
inversePermutation(E->ReorderIndices, ReorderMask);
if (!ReorderMask.empty())
reorderScalars(GatheredScalars, ReorderMask);
SmallVector<int> Mask;
SmallVector<int> ExtractMask;
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
SmallVector<const TreeEntry *> Entries;
Type *ScalarTy = GatheredScalars.front()->getType();
// Check for gathered extracts.
ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
SmallVector<Value *> IgnoredVals;
if (UserIgnoreList)
IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());

bool Resized = false;
if (Value *VecBase = Estimator.adjustExtracts(
E, ExtractMask, ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc)))
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {
Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}

// Do not try to look for reshuffled loads for gathered loads (they will be
// handled later), for vectorized scalars, and cases, which are definitely
// not profitable (splats and small gather nodes.)
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))
GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
if (GatherShuffle) {
assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");
if (*GatherShuffle == TTI::SK_PermuteSingleSrc &&
Entries.front()->isSame(E->Scalars)) {
// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.
LLVM_DEBUG(
dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");
return 0;
}
if (!Resized) {
unsigned VF1 = Entries.front()->getVectorFactor();
unsigned VF2 = Entries.back()->getVectorFactor();
if ((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}
// Remove shuffled elements from list of gathers.
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
if (Mask[I] != PoisonMaskElem)
GatheredScalars[I] = PoisonValue::get(ScalarTy);
}
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "
<< *VL.front() << ".\n";);
if (Entries.size() == 1)
Estimator.add(Entries.front(), Mask);
else
Estimator.add(Entries.front(), Entries.back(), Mask);
Estimator.gather(
GatheredScalars,
Constant::getNullValue(FixedVectorType::get(
GatheredScalars.front()->getType(), GatheredScalars.size())));
return Estimator.finalize(E->ReuseShuffleIndices);
}
Estimator.gather(
GatheredScalars,
VL.equals(GatheredScalars)
? nullptr
: Constant::getNullValue(FixedVectorType::get(
GatheredScalars.front()->getType(), GatheredScalars.size())));
return Estimator.finalize(E->ReuseShuffleIndices);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
NewMask.resize(E->ReorderIndices.size());		NewMask.resize(E->ReorderIndices.size());
copy(E->ReorderIndices, NewMask.begin());		copy(E->ReorderIndices, NewMask.begin());
} else {		} else {
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
}		}
::addMask(Mask, NewMask);		::addMask(Mask, NewMask);
}		}
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))		if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))
CommonCost =		CommonCost =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() &&		assert(E->getOpcode() &&
((allSameType(VL) && allSameBlock(VL)) ||		((allSameType(VL) && allSameBlock(VL)) ||
		RKSimon (not done): auto *
(E->getOpcode() == Instruction::GetElementPtr &&		(E->getOpcode() == Instruction::GetElementPtr &&
E->getMainOp()->getType()->isPointerTy())) &&		E->getMainOp()->getType()->isPointerTy())) &&
		RKSimon (not done): auto *
		ABataev (author, done): Both these cases are the existing code, just the diff is not quite correct because of the big differences.
"Invalid VL");		"Invalid VL");
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
const unsigned Sz = VL.size();		const unsigned Sz = VL.size();
auto GetCostDiff =		auto GetCostDiff =
[=](function_ref<InstructionCost(unsigned)> ScalarEltCost,		[=](function_ref<InstructionCost(unsigned)> ScalarEltCost,
function_ref<InstructionCost(InstructionCost)> VectorCost) {		function_ref<InstructionCost(InstructionCost)> VectorCost) {
[... 2,000 lines hidden ...]	for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
CommonMask[Idx] = Idx;		CommonMask[Idx] = Idx;
}		}

public:		public:
ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)		ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)
: Builder(Builder), R(R) {}		: Builder(Builder), R(R) {}

/// Adjusts extractelements after reusing them.		/// Adjusts extractelements after reusing them.
Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask) {		Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask,
		TTI::ShuffleKind) {
Value *VecBase = nullptr;		Value *VecBase = nullptr;
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
int Idx = Mask[I];		int Idx = Mask[I];
if (Idx == PoisonMaskElem)		if (Idx == PoisonMaskElem)
continue;		continue;
auto *EI = cast<ExtractElementInst>(E->Scalars[I]);		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
VecBase = EI->getVectorOperand();		VecBase = EI->getVectorOperand();
// If the only one use is vectorized - can delete the extractelement		// If the only one use is vectorized - can delete the extractelement
// itself.		// itself.
if (!EI->hasOneUse() || any_of(EI->users(), [&](User *U) {		if (!EI->hasOneUse() || any_of(EI->users(), [&](User *U) {
return !R.ScalarToTreeEntry.count(U);		return !R.ScalarToTreeEntry.count(U);
}))		}))
continue;		continue;
R.eraseInstruction(EI);		R.eraseInstruction(EI);
}		}
return VecBase;		return VecBase;
}		}
/// Checks if the specified entry \p E needs to be delayed because of its		/// Checks if the specified entry \p E needs to be delayed because of its
/// dependency nodes.		/// dependency nodes.
Value *needToDelay(const TreeEntry *E, ArrayRef<const TreeEntry *> Deps) {		std::optional<Value *> needToDelay(const TreeEntry *E,
		ArrayRef<const TreeEntry *> Deps) const {
// No need to delay emission if all deps are ready.		// No need to delay emission if all deps are ready.
if (all_of(Deps, [](const TreeEntry *TE) { return TE->VectorizedValue; }))		if (all_of(Deps, [](const TreeEntry *TE) { return TE->VectorizedValue; }))
return nullptr;		return std::nullopt;
// Postpone gather emission, will be emitted after the end of the		// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.		// process to keep correct order.
auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),		auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),
E->getVectorFactor());		E->getVectorFactor());
Value *Vec = Builder.CreateAlignedLoad(		Value *Vec = Builder.CreateAlignedLoad(
VecTy, PoisonValue::get(VecTy->getPointerTo()), MaybeAlign());		VecTy, PoisonValue::get(VecTy->getPointerTo()), MaybeAlign());
return Vec;		return Vec;
}		}
		Value *getSameNode(const TreeEntry *E) { return E->VectorizedValue; }
		void add(const TreeEntry *E1, const TreeEntry *E2, ArrayRef<int> Mask) {
		add(E1->VectorizedValue, E2->VectorizedValue, Mask);
		}
		void add(const TreeEntry *E1, ArrayRef<int> Mask) {
		add(E1->VectorizedValue, Mask);
		}
/// Adds 2 input vectors and the mask for their shuffling.		/// Adds 2 input vectors and the mask for their shuffling.
void add(Value *V1, Value *V2, ArrayRef<int> Mask) {		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
if (InVectors.empty()) {		if (InVectors.empty()) {
InVectors.push_back(V1);		InVectors.push_back(V1);
InVectors.push_back(V2);		InVectors.push_back(V2);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
return;		return;
[... 75 lines hidden ...]	for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);
}		}
/// Adds another one input vector and the mask for the shuffling.		/// Adds another one input vector and the mask for the shuffling.
void addOrdered(Value *V1, ArrayRef<unsigned> Order) {		void addOrdered(Value *V1, ArrayRef<unsigned> Order) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(Order, NewMask);		inversePermutation(Order, NewMask);
add(V1, NewMask);		add(V1, NewMask);
}		}
		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {
		return R.gather(VL, Root);
		}
		Value *createFreeze(Value *V) { return Builder.CreateFreeze(V); }
/// Finalize emission of the shuffles.		/// Finalize emission of the shuffles.
/// \param Action the action (if any) to be performed before final applying of		/// \param Action the action (if any) to be performed before final applying of
/// the \p ExtMask mask.		/// the \p ExtMask mask.
Value *		Value *
finalize(ArrayRef<int> ExtMask, unsigned VF = 0,		finalize(ArrayRef<int> ExtMask, unsigned VF = 0,
function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
IsFinalized = true;		IsFinalized = true;
if (Action) {		if (Action) {
[... 152 lines hidden ...]	if (E->getOpcode() != Instruction::InsertElement &&
E->getOpcode() != Instruction::PHI) {		E->getOpcode() != Instruction::PHI) {
Instruction *LastInst = EntryToLastInstruction.lookup(E);		Instruction *LastInst = EntryToLastInstruction.lookup(E);
assert(LastInst && "Failed to find last instruction in bundle");		assert(LastInst && "Failed to find last instruction in bundle");
Builder.SetInsertPoint(LastInst);		Builder.SetInsertPoint(LastInst);
}		}
return vectorizeTree(I->get());		return vectorizeTree(I->get());
}		}
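
The hunk that follows turns `createBuildVector` into a template, `processBuildVector<BVTy, ResTy>`, so that one traversal can be instantiated with either a cost estimator or an instruction builder. The sketch below illustrates that general pattern in isolation. `CostPolicy`, `EmitPolicy`, and `processSketch` are hypothetical names invented for this illustration and do not mirror the real LLVM interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cost-only policy: sums a per-lane cost, emits nothing.
struct CostPolicy {
  int Unit;
  int Cost = 0;
  explicit CostPolicy(int Unit) : Unit(Unit) {}
  void add(const std::vector<int> &Mask) {
    Cost += Unit * static_cast<int>(Mask.size());
  }
  int finalize() { return Cost; }
};

// Hypothetical emitting policy: records a textual "instruction" per mask.
struct EmitPolicy {
  std::string Name;
  std::string Out;
  explicit EmitPolicy(const std::string &Name) : Name(Name) {}
  void add(const std::vector<int> &Mask) {
    Out += Name + "(" + std::to_string(Mask.size()) + ");";
  }
  std::string finalize() { return Out; }
};

// One traversal shared by both policies. BVTy supplies add() and finalize(),
// and ResTy is whatever finalize() returns (a cost or an emitted value).
template <typename BVTy, typename ResTy, typename... Args>
ResTy processSketch(const std::vector<std::vector<int>> &Masks,
                    Args &...Params) {
  BVTy Builder(Params...);
  for (const std::vector<int> &M : Masks)
    Builder.add(M);
  return Builder.finalize();
}
```

The point of the pattern is that the analysis and the emission walk the same code path, so the estimated cost cannot silently drift from what is actually generated.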

Value *BoUpSLP::createBuildVector(const TreeEntry *E) {		template <typename BVTy, typename ResTy, typename... Args>
		ResTy BoUpSLP::processBuildVector(const TreeEntry *E, Args &...Params) {
assert(E->State == TreeEntry::NeedToGather && "Expected gather node.");		assert(E->State == TreeEntry::NeedToGather && "Expected gather node.");
unsigned VF = E->getVectorFactor();		unsigned VF = E->getVectorFactor();

bool NeedFreeze = false;		bool NeedFreeze = false;
SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),		SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
E->ReuseShuffleIndices.end());		E->ReuseShuffleIndices.end());
SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());		SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
// Build a mask out of the reorder indices and reorder scalars per this		// Build a mask out of the reorder indices and reorder scalars per this
[... 24 lines hidden ...]	auto FindReusedSplat = [&](SmallVectorImpl<int> &Mask) {
int Sz = Mask.size();		int Sz = Mask.size();
if (all_of(Mask, [Sz](int Idx) { return Idx < 2 * Sz; }) &&		if (all_of(Mask, [Sz](int Idx) { return Idx < 2 * Sz; }) &&
ShuffleVectorInst::isIdentityMask(Mask))		ShuffleVectorInst::isIdentityMask(Mask))
std::iota(Mask.begin(), Mask.end(), 0);		std::iota(Mask.begin(), Mask.end(), 0);
else		else
std::fill(Mask.begin(), Mask.end(), I);		std::fill(Mask.begin(), Mask.end(), I);
return true;		return true;
};		};
ShuffleInstructionBuilder ShuffleBuilder(Builder, *this);		BVTy ShuffleBuilder(Params...);
Value *Vec = nullptr;		ResTy Res = ResTy();
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<int> ExtractMask;		SmallVector<int> ExtractMask;
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;		std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;		std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Type *ScalarTy = GatheredScalars.front()->getType();		Type *ScalarTy = GatheredScalars.front()->getType();
if (!all_of(GatheredScalars, UndefValue::classof)) {		if (!all_of(GatheredScalars, UndefValue::classof)) {
// Check for gathered extracts.		// Check for gathered extracts.
ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);		ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
SmallVector<Value *> IgnoredVals;		SmallVector<Value *> IgnoredVals;
if (UserIgnoreList)		if (UserIgnoreList)
IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());		IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());
bool Resized = false;		bool Resized = false;
if (Value *VecBase = ShuffleBuilder.adjustExtracts(E, ExtractMask))		if (Value *VecBase = ShuffleBuilder.adjustExtracts(
		E, ExtractMask, ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc)))
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))		if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {		if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {
Resized = true;		Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
// Gather extracts after we check for full matched gathers only.		// Gather extracts after we check for full matched gathers only.
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||		if (ExtractShuffle || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||		E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||		all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||		isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {		(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {
GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);		GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
}		}
if (GatherShuffle) {		if (GatherShuffle) {
if (Value *Delayed = ShuffleBuilder.needToDelay(E, Entries)) {		if (std::optional<ResTy> Delayed =
		ShuffleBuilder.needToDelay(E, Entries)) {
// Delay emission of gathers which are not ready yet.		// Delay emission of gathers which are not ready yet.
PostponedGathers.insert(E);		PostponedGathers.insert(E);
// Postpone gather emission, will be emitted after the end of the		// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.		// process to keep correct order.
return Delayed;		return *Delayed;
}		}
assert((Entries.size() == 1 || Entries.size() == 2) &&		assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");		"Expected shuffle of 1 or 2 entries.");
if (*GatherShuffle == TTI::SK_PermuteSingleSrc &&		if (*GatherShuffle == TTI::SK_PermuteSingleSrc &&
Entries.front()->isSame(E->Scalars)) {		Entries.front()->isSame(E->Scalars)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "		<< "SLP: perfect diamond match for gather bundle that starts with "
<< *E->Scalars.front() << ".\n");		<< *E->Scalars.front() << ".\n");
// Restore the mask for previous partially matched values.		Res = ShuffleBuilder.getSameNode(Entries.front());
for (auto [I, V] : enumerate(E->Scalars)) {		return Res;
if (isa<PoisonValue>(V)) {
Mask[I] = PoisonMaskElem;
continue;
}
if (Mask[I] == PoisonMaskElem)
Mask[I] = Entries.front()->findLaneForValue(V);
}
ShuffleBuilder.add(Entries.front()->VectorizedValue, Mask);
Vec = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
return Vec;
}		}
if (!Resized) {		if (!Resized) {
unsigned VF1 = Entries.front()->getVectorFactor();		unsigned VF1 = Entries.front()->getVectorFactor();
unsigned VF2 = Entries.back()->getVectorFactor();		unsigned VF2 = Entries.back()->getVectorFactor();
if ((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)		if ((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
[... 131 lines hidden ...]	if (ExtractShuffle) {
ShuffleBuilder.add(PoisonValue::get(FixedVectorType::get(		ShuffleBuilder.add(PoisonValue::get(FixedVectorType::get(
ScalarTy, GatheredScalars.size())),		ScalarTy, GatheredScalars.size())),
ExtractMask);		ExtractMask);
}		}
}		}
if (GatherShuffle) {		if (GatherShuffle) {
if (Entries.size() == 1) {		if (Entries.size() == 1) {
IsUsedInExpr = FindReusedSplat(Mask);		IsUsedInExpr = FindReusedSplat(Mask);
ShuffleBuilder.add(Entries.front()->VectorizedValue, Mask);		ShuffleBuilder.add(Entries.front(), Mask);
		if (Entries.front()->VectorizedValue)
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(Entries.front()->VectorizedValue);		isGuaranteedNotToBePoison(Entries.front()->VectorizedValue);
} else {		} else {
ShuffleBuilder.add(Entries.front()->VectorizedValue,		ShuffleBuilder.add(Entries.front(), Entries.back(), Mask);
Entries.back()->VectorizedValue, Mask);		if (Entries.front()->VectorizedValue && Entries.back()->VectorizedValue)
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(Entries.front()->VectorizedValue) &&		isGuaranteedNotToBePoison(Entries.front()->VectorizedValue) &&
isGuaranteedNotToBePoison(Entries.back()->VectorizedValue);		isGuaranteedNotToBePoison(Entries.back()->VectorizedValue);
}		}
}		}
// Try to figure out best way to combine values: build a shuffle and insert		// Try to figure out best way to combine values: build a shuffle and insert
// elements or just build several shuffles.		// elements or just build several shuffles.
// Insert non-constant scalars.		// Insert non-constant scalars.
SmallVector<Value *> NonConstants(GatheredScalars);		SmallVector<Value *> NonConstants(GatheredScalars);
int EMSz = ExtractMask.size();		int EMSz = ExtractMask.size();
int MSz = Mask.size();		int MSz = Mask.size();
Show All 33 Lines	for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
NonConstants[I] = PoisonValue::get(ScalarTy);		NonConstants[I] = PoisonValue::get(ScalarTy);
else		else
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
}		}
// Generate constants for final shuffle and build a mask for them.		// Generate constants for final shuffle and build a mask for them.
if (!all_of(GatheredScalars, PoisonValue::classof)) {		if (!all_of(GatheredScalars, PoisonValue::classof)) {
SmallVector<int> BVMask(GatheredScalars.size(), PoisonMaskElem);		SmallVector<int> BVMask(GatheredScalars.size(), PoisonMaskElem);
TryPackScalars(GatheredScalars, BVMask, /*IsRootPoison=*/true);		TryPackScalars(GatheredScalars, BVMask, /*IsRootPoison=*/true);
Value *BV = gather(GatheredScalars);		Value *BV = ShuffleBuilder.gather(GatheredScalars);
ShuffleBuilder.add(BV, BVMask);		ShuffleBuilder.add(BV, BVMask);
}		}
if (all_of(NonConstants, [=](Value *V) {		if (all_of(NonConstants, [=](Value *V) {
return isa<PoisonValue>(V) ||		return isa<PoisonValue>(V) ||
(IsSingleShuffle && ((IsIdentityShuffle &&		(IsSingleShuffle && ((IsIdentityShuffle &&
IsNonPoisoned) || IsUsedInExpr) && isa<UndefValue>(V));		IsNonPoisoned) || IsUsedInExpr) && isa<UndefValue>(V));
}))		}))
Vec = ShuffleBuilder.finalize(E->ReuseShuffleIndices);		Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
else		else
Vec = ShuffleBuilder.finalize(		Res = ShuffleBuilder.finalize(
E->ReuseShuffleIndices, E->Scalars.size(),		E->ReuseShuffleIndices, E->Scalars.size(),
[&](Value *&Vec, SmallVectorImpl<int> &Mask) {		[&](Value *&Vec, SmallVectorImpl<int> &Mask) {
TryPackScalars(NonConstants, Mask, /*IsRootPoison=*/false);		TryPackScalars(NonConstants, Mask, /*IsRootPoison=*/false);
Vec = gather(NonConstants, Vec);		Vec = ShuffleBuilder.gather(NonConstants, Vec);
});		});
} else if (!allConstant(GatheredScalars)) {		} else if (!allConstant(GatheredScalars)) {
// Gather unique scalars and all constants.		// Gather unique scalars and all constants.
SmallVector<int> ReuseMask(GatheredScalars.size(), PoisonMaskElem);		SmallVector<int> ReuseMask(GatheredScalars.size(), PoisonMaskElem);
TryPackScalars(GatheredScalars, ReuseMask, /*IsRootPoison=*/true);		TryPackScalars(GatheredScalars, ReuseMask, /*IsRootPoison=*/true);
Vec = gather(GatheredScalars);		Value *BV = ShuffleBuilder.gather(GatheredScalars);
ShuffleBuilder.add(Vec, ReuseMask);		ShuffleBuilder.add(BV, ReuseMask);
Vec = ShuffleBuilder.finalize(E->ReuseShuffleIndices);		Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
} else {		} else {
// Gather all constants.		// Gather all constants.
SmallVector<int> Mask(E->Scalars.size(), PoisonMaskElem);		SmallVector<int> Mask(E->Scalars.size(), PoisonMaskElem);
for (auto [I, V] : enumerate(E->Scalars)) {		for (auto [I, V] : enumerate(E->Scalars)) {
if (!isa<PoisonValue>(V))		if (!isa<PoisonValue>(V))
Mask[I] = I;		Mask[I] = I;
}		}
Vec = gather(E->Scalars);		Value *BV = ShuffleBuilder.gather(E->Scalars);
ShuffleBuilder.add(Vec, Mask);		ShuffleBuilder.add(BV, Mask);
Vec = ShuffleBuilder.finalize(E->ReuseShuffleIndices);		Res = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
}		}

if (NeedFreeze)		if (NeedFreeze)
Vec = Builder.CreateFreeze(Vec);		Res = ShuffleBuilder.createFreeze(Res);
return Vec;		return Res;
		}

		Value *BoUpSLP::createBuildVector(const TreeEntry *E) {
		return processBuildVector<ShuffleInstructionBuilder, Value *>(E, Builder,
		*this);
}		}

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
Show All 29 Lines	Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();
if (auto *Store = dyn_cast<StoreInst>(VL0))		if (auto *Store = dyn_cast<StoreInst>(VL0))
ScalarTy = Store->getValueOperand()->getType();		ScalarTy = Store->getValueOperand()->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL0))		else if (auto *IE = dyn_cast<InsertElementInst>(VL0))
ScalarTy = IE->getOperand(1)->getType();		ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());		auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
assert((E->ReorderIndices.empty() ||		assert((E->ReorderIndices.empty() ||
		nlopes: Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!
		ABataev: Sure, thanks!
E != VectorizableTree.front().get() ||		E != VectorizableTree.front().get() ||
!E->UserTreeIndices.empty()) &&		!E->UserTreeIndices.empty()) &&
"PHI reordering is free.");		"PHI reordering is free.");
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);
Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());		Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());		PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());
Value *V = NewPhi;		Value *V = NewPhi;
▲ Show 20 Lines • Show All 4,911 Lines • Show Last 20 Lines

llvm/test/DebugInfo/Generic/assignment-tracking/slp-vectorizer/merge-scalars.ll

	Show All 17 Lines
	;;			;;
	;; Generated by grabbingthe IR before SLP in:			;; Generated by grabbingthe IR before SLP in:
	;; $ clang++ -O2 -g test.cpp -Xclang -fexperimental-assignment-tracking			;; $ clang++ -O2 -g test.cpp -Xclang -fexperimental-assignment-tracking

	;; Test that dbg.assigns linked to the the scalar stores to quad get linked to			;; Test that dbg.assigns linked to the the scalar stores to quad get linked to
	;; the vector store that replaces them.			;; the vector store that replaces them.

	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR:[0-9]+]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata ![[ID:[0-9]+]], metadata ptr %arrayidx, metadata !DIExpression())			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR:[0-9]+]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata ![[ID:[0-9]+]], metadata ptr %arrayidx, metadata !DIExpression())
				; CHECK: store <2 x float> {{.*}} !DIAssignID ![[ID]]
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 4))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 4))
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 8))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata ![[ID1:[0-9]+]], metadata ptr %arrayidx7, metadata !DIExpression())
	; CHECK: store <4 x float> {{.*}} !DIAssignID ![[ID]]			; CHECK: store <2 x float> {{.*}} !DIAssignID ![[ID1]]
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 96, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 12))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 96, 32), metadata ![[ID1]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 12))

	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	define dso_local void @_Z3funffff(float %k1, float %k2, float %k3, float %k4) local_unnamed_addr #0 !dbg !7 {			define dso_local void @_Z3funffff(float %k1, float %k2, float %k3, float %k4) local_unnamed_addr #0 !dbg !7 {
	entry:			entry:
	%quad = alloca [4 x float], align 16, !DIAssignID !27			%quad = alloca [4 x float], align 16, !DIAssignID !27
	call void @llvm.dbg.assign(metadata i1 undef, metadata !16, metadata !DIExpression(), metadata !27, metadata ptr %quad, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.assign(metadata i1 undef, metadata !16, metadata !DIExpression(), metadata !27, metadata ptr %quad, metadata !DIExpression()), !dbg !23
	call void @llvm.dbg.assign(metadata float %k1, metadata !12, metadata !DIExpression(), metadata !30, metadata ptr undef, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.assign(metadata float %k1, metadata !12, metadata !DIExpression(), metadata !30, metadata ptr undef, metadata !DIExpression()), !dbg !23
	▲ Show 20 Lines • Show All 106 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=aarch64 -aarch64-insert-extract-base-cost=3 | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=aarch64 -aarch64-insert-extract-base-cost=3 | FileCheck %s

	define void @test(<2 x i64> %0, <2 x i64> %1, <2 x i64> %2) {			define void @test(<2 x i64> %0, <2 x i64> %1, <2 x i64> %2) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1:%.*]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP2:%.*]], i64 0
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP4]], 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP1:%.*]], <2 x i64> [[TMP0:%.*]], <4 x i32> <i32 0, i32 2, i32 undef, i32 2>
	; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i64> [[TMP5]], i64 [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[TMP0:%.*]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[TMP7]], <4 x i64> <i64 0, i64 0, i64 poison, i64 0>, <4 x i32> <i32 4, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP9:%.*]] = trunc i64 [[TMP8]] to i32			; CHECK-NEXT: [[TMP9:%.*]] = or <4 x i64> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP2:%.*]], i64 0			; CHECK-NEXT: [[TMP10:%.*]] = trunc <4 x i64> [[TMP9]] to <4 x i32>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP2]], i64 1			; CHECK-NEXT: br label [[TMP11:%.*]]
	; CHECK-NEXT: [[TMP12:%.*]] = or i64 [[TMP10]], [[TMP11]]			; CHECK: 11:
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32			; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x i32> [ [[TMP16:%.*]], [[TMP11]] ], [ [[TMP10]], [[TMP3:%.*]] ]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP0]], i64 0			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP12]], <4 x i32> <i32 poison, i32 0, i32 0, i32 0>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP15:%.*]] = or i64 [[TMP14]], 0			; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> zeroinitializer, [[TMP13]]
	; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32			; CHECK-NEXT: [[TMP15:%.*]] = add <4 x i32> zeroinitializer, [[TMP13]]
	; CHECK-NEXT: br label [[TMP17:%.*]]			; CHECK-NEXT: [[TMP16]] = shufflevector <4 x i32> [[TMP14]], <4 x i32> [[TMP15]], <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	; CHECK: 17:			; CHECK-NEXT: br label [[TMP11]]
	; CHECK-NEXT: [[TMP18:%.*]] = phi i32 [ [[TMP22:%.*]], [[TMP17]] ], [ [[TMP6]], [[TMP3:%.*]] ]
	; CHECK-NEXT: [[TMP19:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP9]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP20:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP13]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP21:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP16]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP22]] = or i32 [[TMP18]], 0
	; CHECK-NEXT: br label [[TMP17]]
	;			;
	%4 = extractelement <2 x i64> %1, i64 0			%4 = extractelement <2 x i64> %1, i64 0
	%5 = or i64 %4, 0			%5 = or i64 %4, 0
	%6 = trunc i64 %5 to i32			%6 = trunc i64 %5 to i32
	%7 = extractelement <2 x i64> %0, i64 0			%7 = extractelement <2 x i64> %0, i64 0
	%8 = or i64 %7, 0			%8 = or i64 %7, 0
	%9 = trunc i64 %8 to i32			%9 = trunc i64 %8 to i32
	%10 = extractelement <2 x i64> %2, i64 0			%10 = extractelement <2 x i64> %2, i64 0
	Show All 19 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

Show All 24 Lines	;
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <2 x i64> poison, i64 %tmp2.0, i32 0		%tmp3.0 = insertelement <2 x i64> poison, i64 %tmp2.0, i32 0
%tmp3.1 = insertelement <2 x i64> %tmp3.0, i64 %tmp2.1, i32 1		%tmp3.1 = insertelement <2 x i64> %tmp3.0, i64 %tmp2.1, i32 1
ret <2 x i64> %tmp3.1		ret <2 x i64> %tmp3.1
}		}

define void @store_chain_v2i64(ptr %a, ptr %b, ptr %c) {		define void @store_chain_v2i64(ptr %a, ptr %b, ptr %c) {
; CHECK-LABEL: @store_chain_v2i64(		; CHECK-LABEL: @store_chain_v2i64(
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 8
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[B:%.*]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[B:%.*]], align 8
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP7]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP6]], [[TMP5]]
; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[C:%.*]], align 8		; CHECK-NEXT: store <2 x i64> [[TMP7]], ptr [[C:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a.1 = getelementptr i64, ptr %a, i64 1		%a.1 = getelementptr i64, ptr %a, i64 1
%b.1 = getelementptr i64, ptr %b, i64 1		%b.1 = getelementptr i64, ptr %b, i64 1
%c.1 = getelementptr i64, ptr %c, i64 1		%c.1 = getelementptr i64, ptr %c, i64 1
%v0.0 = load i64, ptr %a, align 8		%v0.0 = load i64, ptr %a, align 8
%v0.1 = load i64, ptr %a.1, align 8		%v0.1 = load i64, ptr %a.1, align 8
%v1.0 = load i64, ptr %b, align 8		%v1.0 = load i64, ptr %b, align 8
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: ret <4 x i32> [[SHUFFLE]]		; CHECK-NEXT: ret <4 x i32> [[TMP6]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP7]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP7]], [[TMP8]]
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

Show All 24 Lines	;
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <2 x i64> undef, i64 %tmp2.0, i32 0		%tmp3.0 = insertelement <2 x i64> undef, i64 %tmp2.0, i32 0
%tmp3.1 = insertelement <2 x i64> %tmp3.0, i64 %tmp2.1, i32 1		%tmp3.1 = insertelement <2 x i64> %tmp3.0, i64 %tmp2.1, i32 1
ret <2 x i64> %tmp3.1		ret <2 x i64> %tmp3.1
}		}

define void @store_chain_v2i64(ptr %a, ptr %b, ptr %c) {		define void @store_chain_v2i64(ptr %a, ptr %b, ptr %c) {
; CHECK-LABEL: @store_chain_v2i64(		; CHECK-LABEL: @store_chain_v2i64(
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, ptr [[A:%.*]], align 8
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[B:%.*]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[B:%.*]], align 8
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP7]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP6]], [[TMP5]]
; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[C:%.*]], align 8		; CHECK-NEXT: store <2 x i64> [[TMP7]], ptr [[C:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a.1 = getelementptr i64, ptr %a, i64 1		%a.1 = getelementptr i64, ptr %a, i64 1
%b.1 = getelementptr i64, ptr %b, i64 1		%b.1 = getelementptr i64, ptr %b, i64 1
%c.1 = getelementptr i64, ptr %c, i64 1		%c.1 = getelementptr i64, ptr %c, i64 1
%v0.0 = load i64, ptr %a, align 8		%v0.0 = load i64, ptr %a, align 8
%v0.1 = load i64, ptr %a.1, align 8		%v0.1 = load i64, ptr %a.1, align 8
%v1.0 = load i64, ptr %b, align 8		%v1.0 = load i64, ptr %b, align 8
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: ret <4 x i32> [[SHUFFLE]]		; CHECK-NEXT: ret <4 x i32> [[TMP6]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP7]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP7]], [[TMP8]]
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

	Show All 12 Lines
	; These operands are coming from 4 loads which are not			; These operands are coming from 4 loads which are not
	; contiguous. The score estimation needs to be corrected, so that these 4 loads			; contiguous. The score estimation needs to be corrected, so that these 4 loads
	; are not selected for vectorization. Instead we should vectorize with			; are not selected for vectorization. Instead we should vectorize with
	; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.			; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.

	define void @s116_modified(ptr %a) {			define void @s116_modified(ptr %a) {
	; CHECK-LABEL: @s116_modified(			; CHECK-LABEL: @s116_modified(
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, ptr [[A]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, ptr [[A]], i64 3
	; CHECK-NEXT: [[LD0:%.*]] = load float, ptr [[A]], align 4			; CHECK-NEXT: [[LD0:%.*]] = load float, ptr [[A]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LD1:%.*]] = load float, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LD2:%.*]] = load float, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0			; CHECK-NEXT: [[MUL0:%.*]] = fmul fast float [[LD0]], [[LD1]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 poison>			; CHECK-NEXT: [[MUL1:%.*]] = fmul fast float [[LD2]], [[LD1]]
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 5, i32 poison, i32 poison>			; CHECK-NEXT: store float [[MUL0]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: store float [[MUL1]], ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP3]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 poison, i32 2, i32 4>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[LD2]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP10]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 2, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP1]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: store <4 x float> [[TMP12]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP4]], ptr [[GEP2]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep1 = getelementptr inbounds float, ptr %a, i64 1			%gep1 = getelementptr inbounds float, ptr %a, i64 1
	%gep2 = getelementptr inbounds float, ptr %a, i64 2			%gep2 = getelementptr inbounds float, ptr %a, i64 2
	%gep3 = getelementptr inbounds float, ptr %a, i64 3			%gep3 = getelementptr inbounds float, ptr %a, i64 3
	%gep4 = getelementptr inbounds float, ptr %a, i64 4			%gep4 = getelementptr inbounds float, ptr %a, i64 4
	%ld0 = load float, ptr %a			%ld0 = load float, ptr %a
	%ld1 = load float, ptr %gep1			%ld1 = load float, ptr %gep1

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	@cle32 = external unnamed_addr global [32 x i32], align 16			@cle32 = external unnamed_addr global [32 x i32], align 16


	; Check that we correctly detect a splat/broadcast by leveraging the			; Check that we correctly detect a splat/broadcast by leveraging the
	; commutativity property of `xor`.			; commutativity property of `xor`.

	define void @splat(i8 %a, i8 %b, i8 %c) {			define void @splat(i8 %a, i8 %b, i8 %c) {
	; SSE-LABEL: @splat(			; SSE-LABEL: @splat(
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = xor i8 [[C:%.*]], [[A:%.*]]
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1			; SSE-NEXT: store i8 [[TMP1]], ptr @cle, align 16
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; SSE-NEXT: [[TMP2:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; SSE-NEXT: store i8 [[TMP2]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 1), align 1
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer			; SSE-NEXT: [[TMP3:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP3]], [[TMP5]]			; SSE-NEXT: store i8 [[TMP3]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 2), align 1
	; SSE-NEXT: store <16 x i8> [[TMP6]], ptr @cle, align 16			; SSE-NEXT: [[TMP4:%.*]] = xor i8 [[A]], [[C]]
				; SSE-NEXT: store i8 [[TMP4]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 3), align 1
				; SSE-NEXT: [[TMP5:%.*]] = xor i8 [[C]], [[A]]
				; SSE-NEXT: store i8 [[TMP5]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 4), align 1
				; SSE-NEXT: [[TMP6:%.*]] = xor i8 [[C]], [[B:%.*]]
				; SSE-NEXT: store i8 [[TMP6]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 5), align 1
				; SSE-NEXT: [[TMP7:%.*]] = xor i8 [[C]], [[A]]
				; SSE-NEXT: store i8 [[TMP7]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 6), align 1
				; SSE-NEXT: [[TMP8:%.*]] = xor i8 [[C]], [[B]]
				; SSE-NEXT: store i8 [[TMP8]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 7), align 1
				; SSE-NEXT: [[TMP9:%.*]] = insertelement <8 x i8> poison, i8 [[A]], i32 0
				; SSE-NEXT: [[TMP10:%.*]] = shufflevector <8 x i8> [[TMP9]], <8 x i8> poison, <8 x i32> zeroinitializer
				; SSE-NEXT: [[TMP11:%.*]] = insertelement <8 x i8> poison, i8 [[C]], i32 0
				; SSE-NEXT: [[TMP12:%.*]] = shufflevector <8 x i8> [[TMP11]], <8 x i8> poison, <8 x i32> zeroinitializer
				; SSE-NEXT: [[TMP13:%.*]] = xor <8 x i8> [[TMP10]], [[TMP12]]
				; SSE-NEXT: store <8 x i8> [[TMP13]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 8), align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @splat(			; AVX-LABEL: @splat(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512


	@b = global [8 x i32] zeroinitializer, align 16			@b = global [8 x i32] zeroinitializer, align 16
	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @foo() {			define void @foo() {
	; SSE-LABEL: @foo(			; SSE-LABEL: @foo(
	; SSE-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; SSE-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
	; SSE-NEXT: store i32 [[TMP1]], ptr @a, align 16			; SSE-NEXT: store i32 [[TMP1]], ptr @a, align 16
	; SSE-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; SSE-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; AVX-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
				; AVX-NEXT: store i32 [[TMP1]], ptr @a, align 16
	; AVX-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; AVX-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], ptr @a, align 16			; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
				; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
				; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
				; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
				; AVX2-LABEL: @foo(
				; AVX2-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
				; AVX2-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
				; AVX2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0
				; AVX2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1
				; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
				; AVX2-NEXT: store <8 x i32> [[TMP5]], ptr @a, align 16
				; AVX2-NEXT: ret void
				;
	; AVX512-LABEL: @foo(			; AVX512-LABEL: @foo(
	; AVX512-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; AVX512-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
	; AVX512-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; AVX512-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX512-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0
	; AVX512-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1
	; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX512-NEXT: store <8 x i32> [[SHUFFLE]], ptr @a, align 16			; AVX512-NEXT: store <8 x i32> [[TMP5]], ptr @a, align 16
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%1 = load i32, ptr @b, align 16			%1 = load i32, ptr @b, align 16
	store i32 %1, ptr @a, align 16			store i32 %1, ptr @a, align 16
	%2 = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			%2 = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -pass-remarks-output=%t | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	; YAML: --- !Passed			; YAML: --- !Missed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: NotBeneficial
	; YAML-NEXT: Function: g			; YAML-NEXT: Function: g
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
	; YAML-NEXT: - Cost: '-1'			; YAML-NEXT: - Cost: '0'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' >= '
	; YAML-NEXT: - TreeSize: '4'			; YAML-NEXT: - Treshold: '0'

	define <2 x i32> @g(<2 x i32> %x, i32 %a, i32 %b) {			define <2 x i32> @g(<2 x i32> %x, i32 %a, i32 %b) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[X:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 poison>			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x i32> [[X:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A:%.*]], i32 1			; CHECK-NEXT: [[X1X1:%.*]] = mul i32 [[X1]], [[X1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[B:%.*]], i32 1			; CHECK-NEXT: [[AB:%.*]] = mul i32 [[A:%.*]], [[B:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = mul <2 x i32> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[INS1:%.*]] = insertelement <2 x i32> poison, i32 [[X1X1]], i32 0
	; CHECK-NEXT: ret <2 x i32> [[TMP4]]			; CHECK-NEXT: [[INS2:%.*]] = insertelement <2 x i32> [[INS1]], i32 [[AB]], i32 1
				; CHECK-NEXT: ret <2 x i32> [[INS2]]
	;			;
	%x1 = extractelement <2 x i32> %x, i32 1			%x1 = extractelement <2 x i32> %x, i32 1
	%x1x1 = mul i32 %x1, %x1			%x1x1 = mul i32 %x1, %x1
	%ab = mul i32 %a, %b			%ab = mul i32 %a, %b
	%ins1 = insertelement <2 x i32> poison, i32 %x1x1, i32 0			%ins1 = insertelement <2 x i32> poison, i32 %x1x1, i32 0
	%ins2 = insertelement <2 x i32> %ins1, i32 %ab, i32 1			%ins2 = insertelement <2 x i32> %ins1, i32 %ab, i32 1
	ret <2 x i32> %ins2			ret <2 x i32> %ins2
	}			}

Revision Contents

Diff 519833

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/DebugInfo/Generic/assignment-tracking/slp-vectorizer/merge-scalars.ll

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll
