This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of the scalars used in the graph.
Closed · Public

Authored by ABataev on Oct 1 2021, 4:10 PM.

Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev
vporpo

Commits

rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

Summary

Currently we emit gathers for scalars being vectorized in the tree as
a pair of extractelement/insertelement instructions. Instead we can try
to find all required vectors and emit shuffle vector instructions
directly, improving the code and reducing compile time.
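
As a rough illustration of the equivalence the summary relies on, the sketch below models a four-lane gather built from extractelement/insertelement pairs next to a single two-source shuffle. It uses plain arrays rather than LLVM IR; `Vec4`, `gatherByExtractInsert` and `shuffle` are hypothetical names introduced only for this example, and the mask <0, 3, 5, 6> is the one from the doc comment quoted in the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i8> vector value.
using Vec4 = std::array<std::uint8_t, 4>;

// Models the gather as four extract/insert pairs:
// lanes are x[0], x[3], y[1], y[2], inserted one at a time.
Vec4 gatherByExtractInsert(const Vec4 &X, const Vec4 &Y) {
  Vec4 R{};    // placeholder vector (poison in real IR)
  R[0] = X[0]; // extractelement %x, 0 -> insertelement lane 0
  R[1] = X[3]; // extractelement %x, 3 -> insertelement lane 1
  R[2] = Y[1]; // extractelement %y, 1 -> insertelement lane 2
  R[3] = Y[2]; // extractelement %y, 2 -> insertelement lane 3
  return R;
}

// Models a two-source shuffle: mask values 0..3 select from A,
// values 4..7 select from B. Assumes every mask element is in 0..7
// (no undef lanes), which is a simplification for this sketch.
Vec4 shuffle(const Vec4 &A, const Vec4 &B, const std::array<int, 4> &Mask) {
  Vec4 R{};
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane) {
    const std::size_t M = static_cast<std::size_t>(Mask[Lane]);
    R[Lane] = M < A.size() ? A[M] : B[M - A.size()];
  }
  return R;
}
```

Under these assumptions, both functions produce the same vector for the mask {0, 3, 5, 6}: eight scalar operations on one side, one shuffle on the other.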

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Oct 1 2021, 4:10 PM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Oct 1 2021, 4:10 PM

ABataev requested review of this revision.Oct 1 2021, 4:10 PM

Herald added a project: Restricted Project. Oct 1 2021, 4:10 PM

Harbormaster completed remote builds in B126755: Diff 376651.Oct 1 2021, 4:10 PM

Rebase

Harbormaster completed remote builds in B126915: Diff 377013.Oct 4 2021, 2:29 PM

RKSimon retitled this revision from [SLP]Improve gathering of the scals used in the graph. to [SLP]Improve gathering of the scalars used in the graph..Oct 5 2021, 6:35 AM

Rebase + bug fixes

Harbormaster completed remote builds in B133811: Diff 386648.Nov 11 2021, 2:47 PM

vporpo added a subscriber: vporpo.Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B135503: Diff 389033.Nov 22 2021, 7:36 PM

RKSimon added inline comments.Nov 29 2021, 9:13 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
304	Is it worth merging the isa<> and cast<> into a dyn_cast<>?
597	return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
6844	What targets are we still missing support for?

ABataev added inline comments.Nov 29 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6844	AArch64; in many cases it falls back to the default cost, i.e. a bunch of extracts plus a bunch of inserts.

Rebase + address comments.

Harbormaster completed remote builds in B136480: Diff 390398.Nov 29 2021, 11:39 AM

Rebase

Harbormaster completed remote builds in B136694: Diff 390702.Nov 30 2021, 8:08 AM

Rebase

Harbormaster completed remote builds in B136747: Diff 390783.Nov 30 2021, 1:09 PM

Rebase

Harbormaster completed remote builds in B138215: Diff 392842.Dec 8 2021, 12:09 PM

Rebase

RKSimon added inline comments.Dec 14 2021, 8:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6808	Wshadow warning vs Idx @ Line 4688?
6841	Wshadow warning vs Idx @ Line 4688?

Address comments

Harbormaster completed remote builds in B139236: Diff 394269.Dec 14 2021, 9:48 AM

Rebase

Harbormaster completed remote builds in B141051: Diff 396715.Dec 30 2021, 2:15 PM

ABataev mentioned this in D123587: [SLP] Generate shuffles if we can reorder an existing node.Apr 12 2022, 12:05 PM

Rebase

Herald added a project: Restricted Project. Aug 26 2022, 7:51 AM

Herald added subscribers: pcwang-thead, nlopes, kosarev.

nlopes added inline comments.Aug 26 2022, 7:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9808	Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!

ABataev added inline comments.Aug 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9808	Sure, thanks!

Address comments

Harbormaster completed remote builds in B183623: Diff 455933.Aug 26 2022, 10:50 AM

Rebase

Harbormaster completed remote builds in B186399: Diff 459790.Sep 13 2022, 11:19 AM

ABataev mentioned this in rG796af0c02728: [SLP] Move getInsertIndex function, NFC..Sep 14 2022, 6:24 AM

ABataev mentioned this in rGd647312e3f57: [SLP][NFC]Extract getLastInstructionInBundle function for better.Sep 14 2022, 8:44 AM

Rebase

Harbormaster completed remote builds in B192832: Diff 468668.Oct 18 2022, 1:42 PM

nhaehnle removed a subscriber: nhaehnle.Oct 19 2022, 2:00 AM

Large update.
Includes:

Unifies all shuffle builders and shuffle demission operands.
Generalizes emission and cost model estimation of the buildvectors/gathers.

Will be split into several smaller patches eventually.

Harbormaster completed remote builds in B201460: Diff 480583.Dec 6 2022, 9:34 PM

ABataev mentioned this in D139718: [SLP][NFC]Inital redesign of ShuffleInstructionBuilder, NFC..Dec 9 2022, 7:50 AM

ABataev mentioned this in rGecac8192dbf6: [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC..Dec 13 2022, 9:54 AM

Rebase

Harbormaster completed remote builds in B202927: Diff 482594.Dec 13 2022, 1:17 PM

Restore accidentally removed code.

Harbormaster completed remote builds in B202945: Diff 482619.Dec 13 2022, 2:43 PM

Rebase

Harbormaster completed remote builds in B204383: Diff 484571.Dec 21 2022, 7:50 AM

ABataev mentioned this in D140499: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 21 2022, 1:54 PM

khchen added a subscriber: khchen.Dec 22 2022, 8:35 AM

ABataev mentioned this in rGac01ae71f0c4: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 28 2022, 6:11 AM

Rebase

Harbormaster completed remote builds in B206131: Diff 486895.Jan 6 2023, 10:07 AM

Rebase

Herald added a subscriber: StephenFan. Jan 9 2023, 9:43 AM

Harbormaster completed remote builds in B206577: Diff 487485.Jan 9 2023, 10:30 AM

ABataev mentioned this in D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 11 2023, 8:33 AM

ABataev mentioned this in D141940: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Jan 17 2023, 8:01 AM

ABataev mentioned this in rG9bdcf8778a5c: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 19 2023, 1:50 PM

ABataev mentioned this in rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Feb 20 2023, 6:16 AM

ABataev mentioned this in D144958: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Feb 28 2023, 5:21 AM

ABataev mentioned this in rGa611b3f3059e: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 7 2023, 12:47 PM

Rebase

Restore deleted code/update test

Harbormaster completed remote builds in B218206: Diff 503510.Mar 8 2023, 2:48 PM

ABataev mentioned this in D145732: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function..Mar 9 2023, 2:20 PM

hans mentioned this in rG3b3a4c270bcb: Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather….Mar 10 2023, 5:40 AM

ABataev mentioned this in rG93a9be0cea0a: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 10 2023, 1:22 PM

ABataev mentioned this in rGf3a68ac10c84: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector….Mar 13 2023, 6:27 AM

Rebase

RKSimon added inline comments.Mar 13 2023, 2:27 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6726	Any chance that we can use ShuffleVectorInst::isIdentityMask ?
7173	auto *
7175	auto *

ABataev added inline comments.Mar 13 2023, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6726	Sure, will do it later
7175	Both these cases are the existing code, just the diff is not quite correct because of the big differences.
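
For context on the comment at line 6726, an identity-mask check can be sketched roughly as below. This is a hand-rolled stand-in, not the LLVM implementation: the real helper lives on `ShuffleVectorInst` and its exact signature differs between LLVM versions. `isIdentityMaskSketch` and the local `UndefMaskElem` constant are names assumed for this example; the diff uses `UndefMaskElem` for unused lanes, and the value -1 is assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sentinel for "this lane is unused", mirroring UndefMaskElem in the diff.
constexpr int UndefMaskElem = -1;

// Returns true if every defined lane of a single-source mask selects
// the element at its own position, so the shuffle would be a no-op.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane) {
    if (Mask[Lane] == UndefMaskElem)
      continue; // undefined lanes do not break the identity property
    if (Mask[Lane] != static_cast<int>(Lane))
      return false;
  }
  return true;
}
```

With this definition, {0, 1, 2, 3} and {0, -1, 2, -1} count as identity masks, while {1, 0, 2, 3} does not.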

Restore accidentally removed lines, address comments

Harbormaster completed remote builds in B219182: Diff 504861.Mar 13 2023, 5:18 PM

Rebase

Restore some deleted code

Harbormaster completed remote builds in B219617: Diff 505467.Mar 15 2023, 7:08 AM

ABataev mentioned this in D146167: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars..Mar 15 2023, 2:14 PM

ABataev mentioned this in rG0ad87ffdcc23: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining….Mar 17 2023, 11:21 AM

Rebase

Harbormaster completed remote builds in B220124: Diff 506162.Mar 17 2023, 12:55 PM

ABataev mentioned this in D146564: [SLP]Find reused scalars in buildvector sequences, if any..Mar 21 2023, 2:11 PM

ABataev mentioned this in rG40105a993399: [SLP]Find reused scalars in buildvector sequences, if any..Apr 5 2023, 9:39 AM

Rebase

Harbormaster completed remote builds in B224057: Diff 511474.Apr 6 2023, 11:37 AM

Rebase

Harbormaster completed remote builds in B224133: Diff 511560.Apr 6 2023, 5:26 PM

Rebase

Harbormaster completed remote builds in B224875: Diff 512589.Apr 11 2023, 3:26 PM

ABataev mentioned this in D148174: [SLP]Introduce gather cost estimation function..Apr 12 2023, 2:36 PM

ABataev mentioned this in rGf82eb7e066f3: [SLP]Introduce gather cost estimation function..Apr 13 2023, 10:19 AM

Rebase

Harbormaster completed remote builds in B225410: Diff 513316.Apr 13 2023, 12:33 PM

ABataev mentioned this in D148279: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions..Apr 13 2023, 4:42 PM

ABataev mentioned this in rGcd341f3f4878: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 5:55 AM

ABataev mentioned this in rG1ce4b26a21a0: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 11:54 AM

Rebase

Harbormaster completed remote builds in B227770: Diff 516462.Apr 24 2023, 11:19 AM

dtemirbulatov added a reviewer: vporpo.Apr 27 2023, 5:39 PM

Temp rebase, requires some extra work.

Harbormaster completed remote builds in B230224: Diff 519833.May 5 2023, 7:04 AM

Rebase

Herald added a subscriber: wangpc. Nov 9 2023, 2:20 PM

Harbormaster completed remote builds in B258052: Diff 558067.Nov 9 2023, 6:17 PM

Rebase

Harbormaster completed remote builds in B258083: Diff 558113.Nov 16 2023, 10:49 AM

LGTM.

This revision is now accepted and ready to land.Thu, Nov 30, 7:34 AM

LGTM.

Rebase

Harbormaster completed remote builds in B258147: Diff 558197.Thu, Nov 30, 11:35 AM

Closed by commit rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph. (authored by ABataev). Fri, Dec 1, 11:26 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph..

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"classA" = type { %"vector", %"vector", %"complex" }
%"vector" = type { ptr, ptr, %"pair" }
%"pair" = type { %"pair_elem" }
%"pair_elem" = type { ptr }
%"complex" = type { double, double }

define void @foo() #0 {
  %1 = getelementptr %"classA", ptr null, i64 0, i32 2
  %2 = getelementptr %"classA", ptr null, i64 0, i32 2, i32 1
  br i1 false, label %10, label %3

3:                                                ; preds = %10, %0                                                                                                                                                
  %4 = phi double [ 0.000000e+00, %0 ], [ %25, %10 ]
  %5 = phi double [ 0.000000e+00, %0 ], [ %24, %10 ]
  %6 = fmul double %5, %5
  %7 = fmul double %4, %4
  %8 = fadd double %7, %6
  %9 = fcmp ult double %8, 0.000000e+00
  ret void

10:                                               ; preds = %10, %0                                                                                                                                                
  %11 = phi double [ %24, %10 ], [ 0.000000e+00, %0 ]
  %12 = phi double [ %25, %10 ], [ 0.000000e+00, %0 ]
  %13 = load double, ptr null, align 8
  %14 = load double, ptr null, align 8
  %15 = load double, ptr null, align 8
  %16 = getelementptr %"complex", ptr null, i64 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = fmul double %13, %15
  %19 = fmul double %14, %17
  %20 = fadd double %18, %19
  %21 = fmul double %14, %15
  %22 = fmul double %13, %17
  %23 = fsub double %21, %22
  %24 = fadd double %11, %20
  store double %11, ptr %1, align 8
  %25 = fadd double %12, %23
  store double %12, ptr %2, align 8
  br i1 false, label %3, label %10

; uselistorder directives                                                                                                                                                                                          
  uselistorder double %24, { 1, 0 }
  uselistorder double %25, { 1, 0 }
}

attributes #0 = { "target-features"="+aes,+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }

Thanks!

Ping @ABataev ! This is blocking our internal release at Google!

dtemirbulatov added a subscriber: dtemirbulatov.Tue, Dec 12, 1:54 AM

Revision Contents

Path    Size

llvm/
  lib/
    Transforms/
      Vectorize/
        SLPVectorizer.cpp    1350 lines
  test/
    DebugInfo/
      Generic/
        assignment-tracking/
          slp-vectorizer/
            merge-scalars.ll    7 lines
    Transforms/
      SLPVectorizer/
        AArch64/
          extractelements-to-shuffle.ll    36 lines
          loadorder.ll    634 lines
          tsc-s116.ll    23 lines
          vectorize-free-extracts-inserts.ll    9 lines
        X86/
          PR39774.ll    2 lines
          alternate-int-inseltpoison.ll    20 lines
          alternate-int.ll    20 lines
          arith-fp-inseltpoison.ll    47 lines
          arith-fp.ll    47 lines
          buildvector-nodes-dependency.ll    2 lines
          c-ray.ll    2 lines
          commutativity.ll    38 lines
          crash_clear_undefs.ll    28 lines
          crash_exceed_scheduling.ll    2 lines
          crash_lencod.ll    8 lines
          crash_netbsd_decompress.ll    2 lines
          crash_smallpt.ll    10 lines
          cse.ll    12 lines
          gather-extractelements-different-bbs.ll    17 lines
          jumbled-load-multiuse.ll    9 lines
          lookahead.ll    73 lines
          matched-shuffled-entries.ll    16 lines
          memory-runtime-checks.ll    12 lines
          (file name missing)    220 lines
          (file name missing)    69 lines
          (file name missing)    30 lines
          (file name missing)    27 lines
          reduced-gathered-vectorized.ll    24 lines
          reduction-logical.ll    20 lines
          reduction2.ll    4 lines
          redux-feed-buildvector.ll    20 lines
          remark_extract_broadcast.ll    2 lines
          reorder-clustered-node.ll    21 lines
          reorder-reused-masked-gather.ll    2 lines
          root-trunc-extract-reuse.ll    5 lines
          scatter-vectorize-reorder.ll    14 lines
          shrink_after_reorder.ll    13 lines
          vect-gather-same-nodes.ll    19 lines

Diff 504816

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[... 295 lines not shown ...]
static std::optional<unsigned> getInsertIndex(const Value *InsertInst,		static std::optional<unsigned> getInsertIndex(const Value *InsertInst,
unsigned Offset = 0) {		unsigned Offset = 0) {
int Index = Offset;		int Index = Offset;
if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {		if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
const auto *VT = dyn_cast<FixedVectorType>(IE->getType());		const auto *VT = dyn_cast<FixedVectorType>(IE->getType());
if (!VT)		if (!VT)
return std::nullopt;		return std::nullopt;
const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2));		const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
if (!CI)		if (!CI)
return std::nullopt;		return std::nullopt;
if (CI->getValue().uge(VT->getNumElements()))		if (CI->getValue().uge(VT->getNumElements()))
return std::nullopt;		return std::nullopt;
Index *= VT->getNumElements();		Index *= VT->getNumElements();
Index += CI->getZExtValue();		Index += CI->getZExtValue();
return Index;		return Index;
}		}

[... 112 lines not shown ...]
/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
/// ret <4 x i8> %ins4		/// ret <4 x i8> %ins4
/// can be transformed into:		/// can be transformed into:
/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,		/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,
/// i32 6>		/// i32 6>
/// %2 = mul <4 x i8> %1, %1		/// %2 = mul <4 x i8> %1, %1
/// ret <4 x i8> %2		/// ret <4 x i8> %2
/// We convert this initially to something like:
/// %x0 = extractelement <4 x i8> %x, i32 0
/// %x3 = extractelement <4 x i8> %x, i32 3
/// %y1 = extractelement <4 x i8> %y, i32 1
/// %y2 = extractelement <4 x i8> %y, i32 2
/// %1 = insertelement <4 x i8> poison, i8 %x0, i32 0
/// %2 = insertelement <4 x i8> %1, i8 %x3, i32 1
/// %3 = insertelement <4 x i8> %2, i8 %y1, i32 2
/// %4 = insertelement <4 x i8> %3, i8 %y2, i32 3
/// %5 = mul <4 x i8> %4, %4
/// %6 = extractelement <4 x i8> %5, i32 0
/// %ins1 = insertelement <4 x i8> poison, i8 %6, i32 0
/// %7 = extractelement <4 x i8> %5, i32 1
/// %ins2 = insertelement <4 x i8> %ins1, i8 %7, i32 1
/// %8 = extractelement <4 x i8> %5, i32 2
/// %ins3 = insertelement <4 x i8> %ins2, i8 %8, i32 2
/// %9 = extractelement <4 x i8> %5, i32 3
/// %ins4 = insertelement <4 x i8> %ins3, i8 %9, i32 3
/// ret <4 x i8> %ins4
/// InstCombiner transforms this into a shuffle and vector mul
/// Mask will return the Shuffle Mask equivalent to the extracted elements.		/// Mask will return the Shuffle Mask equivalent to the extracted elements.
/// TODO: Can we split off and reuse the shuffle mask detection from		/// TODO: Can we split off and reuse the shuffle mask detection from
/// ShuffleVectorInst/getShuffleCost?		/// ShuffleVectorInst/getShuffleCost?
static std::optional<TargetTransformInfo::ShuffleKind>		static std::optional<TargetTransformInfo::ShuffleKind>
isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {		isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });		find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });
if (It == VL.end())		if (It == VL.end())
[... 148 lines not shown ...]	if (V2 && PairMax < VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() +
PairVec = std::make_pair(V1, V2);		PairVec = std::make_pair(V1, V2);
}		}
}		}
if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)		if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)
return std::nullopt;		return std::nullopt;
// Check if better to perform a shuffle of 2 vectors or just of a single		// Check if better to perform a shuffle of 2 vectors or just of a single
// vector.		// vector.
SmallVector<Value *> SavedVL(VL.begin(), VL.end());		SmallVector<Value *> SavedVL(VL.begin(), VL.end());
SmallVector<Value *> GatheredExtracts(		SmallVector<Value *> GatheredExtracts(
VL.size(), PoisonValue::get(VL.front()->getType()));		VL.size(), PoisonValue::get(VL.front()->getType()));
if (SingleMax >= PairMax && SingleMax) {		if (SingleMax >= PairMax && SingleMax) {
for (int Idx : VectorOpToIdx[SingleVec])		for (int Idx : VectorOpToIdx[SingleVec])
std::swap(GatheredExtracts[Idx], VL[Idx]);		std::swap(GatheredExtracts[Idx], VL[Idx]);
} else {		} else {
for (Value *V : {PairVec.first, PairVec.second})		for (Value *V : {PairVec.first, PairVec.second})
for (int Idx : VectorOpToIdx[V])		for (int Idx : VectorOpToIdx[V])
std::swap(GatheredExtracts[Idx], VL[Idx]);		std::swap(GatheredExtracts[Idx], VL[Idx]);
[... 449 lines not shown ...]
}		}

namespace slpvectorizer {		namespace slpvectorizer {

/// Bottom Up SLP Vectorizer.		/// Bottom Up SLP Vectorizer.
class BoUpSLP {		class BoUpSLP {
struct TreeEntry;		struct TreeEntry;
struct ScheduleData;		struct ScheduleData;
		class ShuffleCostEstimator;
class ShuffleInstructionBuilder;		class ShuffleInstructionBuilder;

public:		public:
using ValueList = SmallVector<Value *, 8>;		using ValueList = SmallVector<Value *, 8>;
using InstrList = SmallVector<Instruction *, 16>;		using InstrList = SmallVector<Instruction *, 16>;
using ValueSet = SmallPtrSet<Value *, 16>;		using ValueSet = SmallPtrSet<Value *, 16>;
using StoreList = SmallVector<StoreInst *, 8>;		using StoreList = SmallVector<StoreInst *, 8>;
using ExtraValueToDebugLocsMap =		using ExtraValueToDebugLocsMap =
[... 1,356 lines not shown ...]	private:

/// Vectorize a single entry in the tree, the \p Idx-th operand of the entry		/// Vectorize a single entry in the tree, the \p Idx-th operand of the entry
/// \p E.		/// \p E.
Value *vectorizeOperand(TreeEntry *E, unsigned NodeIdx);		Value *vectorizeOperand(TreeEntry *E, unsigned NodeIdx);

/// Create a new vector from a list of scalar values. Produces a sequence		/// Create a new vector from a list of scalar values. Produces a sequence
/// which exploits values reused across lanes, and arranges the inserts		/// which exploits values reused across lanes, and arranges the inserts
/// for ease of later optimization.		/// for ease of later optimization.
		template <typename BVTy, typename ResTy, typename... Args>
		ResTy processBuildVector(const TreeEntry *E, Args &...Params);

		/// Create a new vector from a list of scalar values. Produces a sequence
		/// which exploits values reused across lanes, and arranges the inserts
		/// for ease of later optimization.
Value *createBuildVector(const TreeEntry *E);		Value *createBuildVector(const TreeEntry *E);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars. If \p		/// context means the creation of vectors from a group of scalars. If \p
/// NeedToShuffle is true, need to add a cost of reshuffling some of the		/// NeedToShuffle is true, need to add a cost of reshuffling some of the
/// vector elements.		/// vector elements.
InstructionCost getGatherCost(FixedVectorType *Ty,		InstructionCost getGatherCost(FixedVectorType *Ty,
const APInt &ShuffledIndices,		const APInt &ShuffledIndices,
[... 22 lines not shown ...]	private:
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
InstructionCost getGatherCost(ArrayRef<Value *> VL) const;		InstructionCost getGatherCost(ArrayRef<Value *> VL) const;

/// Set the Builder insert point to one after the last instruction in		/// Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(const TreeEntry *E);		void setInsertPointAfterBundle(const TreeEntry *E);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *gather(ArrayRef<Value *> VL);		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even the tree height is tiny.		/// be beneficial even the tree height is tiny.
bool isFullyVectorizableTinyTree(bool ForReduction) const;		bool isFullyVectorizableTinyTree(bool ForReduction) const;

/// Reorder commutative or alt operands to get better probability of		/// Reorder commutative or alt operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
static void reorderInputsAccordingToOpcode(		static void reorderInputsAccordingToOpcode(
[... 41 lines not shown ...]	bool isSame(ArrayRef<Value *> VL) const {
});		});
};		};
if (!ReorderIndices.empty()) {		if (!ReorderIndices.empty()) {
// TODO: implement matching if the nodes are just reordered, still can		// TODO: implement matching if the nodes are just reordered, still can
// treat the vector as the same if the list of scalars matches VL		// treat the vector as the same if the list of scalars matches VL
// directly, without reordering.		// directly, without reordering.
SmallVector<int> Mask;		SmallVector<int> Mask;
inversePermutation(ReorderIndices, Mask);		inversePermutation(ReorderIndices, Mask);
if (VL.size() == Scalars.size())		if (VL.size() == Scalars.size() && ReuseShuffleIndices.empty())
return IsSame(Scalars, Mask);		return IsSame(Scalars, Mask);
if (VL.size() == ReuseShuffleIndices.size()) {		if (VL.size() == ReuseShuffleIndices.size()) {
::addMask(Mask, ReuseShuffleIndices);		::addMask(Mask, ReuseShuffleIndices);
return IsSame(Scalars, Mask);		return IsSame(Scalars, Mask);
}		}
return false;		return false;
}		}
return IsSame(Scalars, ReuseShuffleIndices);		return IsSame(Scalars, ReuseShuffleIndices);
[... 3,722 lines not shown ...]	if (!CI->isNoBuiltin() && VecFunc) {
// Calculate the cost of the vector library call.		// Calculate the cost of the vector library call.
// If the corresponding vector call is cheaper, return its cost.		// If the corresponding vector call is cheaper, return its cost.
LibCost = TTI->getCallInstrCost(nullptr, VecTy, VecTys,		LibCost = TTI->getCallInstrCost(nullptr, VecTy, VecTys,
TTI::TCK_RecipThroughput);		TTI::TCK_RecipThroughput);
}		}
return {IntrinsicCost, LibCost};		return {IntrinsicCost, LibCost};
}		}

/// Compute the cost of creating a vector of type \p VecTy containing the
/// extracted values from \p VL.
static InstructionCost
computeExtractCost(ArrayRef<Value *> VL, FixedVectorType *VecTy,
TargetTransformInfo::ShuffleKind ShuffleKind,
ArrayRef<int> Mask, TargetTransformInfo &TTI) {
unsigned NumOfParts = TTI.getNumberOfParts(VecTy);

if (ShuffleKind != TargetTransformInfo::SK_PermuteSingleSrc || !NumOfParts ||
VecTy->getNumElements() < NumOfParts)
return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);

bool AllConsecutive = true;
unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;
unsigned Idx = -1;
InstructionCost Cost = 0;

// Process extracts in blocks of EltsPerVector to check if the source vector
// operand can be re-used directly. If not, add the cost of creating a shuffle
// to extract the values into a vector register.
SmallVector<int> RegMask(EltsPerVector, UndefMaskElem);
for (auto *V : VL) {
++Idx;

// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {
RegMask.assign(EltsPerVector, UndefMaskElem);
AllConsecutive = true;
continue;
}

// Need to exclude undefs from analysis.
if (isa<UndefValue>(V) || Mask[Idx] == UndefMaskElem)
continue;

// Check all extracts for a vector register on the target directly
// extract values in order.
unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
if (!isa<UndefValue>(VL[Idx - 1]) && Mask[Idx - 1] != UndefMaskElem) {
unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));
AllConsecutive &= PrevIdx + 1 == CurrentIdx &&
CurrentIdx % EltsPerVector == Idx % EltsPerVector;
RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;
}

if (AllConsecutive)
continue;

// Skip all indices, except for the last index per vector block.
if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
continue;

// If we have a series of extracts which are not consecutive and hence
// cannot re-use the source vector register directly, compute the shuffle
// cost to extract the vector with EltsPerVector elements.
Cost += TTI.getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc,
FixedVectorType::get(VecTy->getElementType(), EltsPerVector), RegMask);
}
return Cost;
}

/// Build shuffle mask for shuffle graph entries and lists of main and alternate		/// Build shuffle mask for shuffle graph entries and lists of main and alternate
/// operations operands.		/// operations operands.
static void		static void
buildShuffleEntryMask(ArrayRef<Value *> VL, ArrayRef<unsigned> ReorderIndices,		buildShuffleEntryMask(ArrayRef<Value *> VL, ArrayRef<unsigned> ReorderIndices,
ArrayRef<int> ReusesIndices,		ArrayRef<int> ReusesIndices,
const function_ref<bool(Instruction *)> IsAltOp,		const function_ref<bool(Instruction *)> IsAltOp,
SmallVectorImpl<int> &Mask,		SmallVectorImpl<int> &Mask,
SmallVectorImpl<Value *> *OpScalars = nullptr,		SmallVectorImpl<Value *> *OpScalars = nullptr,
[... 231 lines not shown ...]	while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
IdentityMask.assign(Mask);		IdentityMask.assign(Mask);
}		}
int LocalVF = Mask.size();		int LocalVF = Mask.size();
if (auto *SVOpTy =		if (auto *SVOpTy =
dyn_cast<FixedVectorType>(SV->getOperand(0)->getType()))		dyn_cast<FixedVectorType>(SV->getOperand(0)->getType()))
LocalVF = SVOpTy->getNumElements();		LocalVF = SVOpTy->getNumElements();
SmallVector<int> ExtMask(Mask.size(), UndefMaskElem);		SmallVector<int> ExtMask(Mask.size(), UndefMaskElem);
for (auto [Idx, I] : enumerate(Mask)) {		for (auto [Idx, I] : enumerate(Mask)) {
if (I == UndefMaskElem)		if (I == UndefMaskElem ||
		static_cast<unsigned>(I) >= SV->getShuffleMask().size())
continue;		continue;
ExtMask[Idx] = SV->getMaskValue(I);		ExtMask[Idx] = SV->getMaskValue(I);
}		}
bool IsOp1Undef =		bool IsOp1Undef =
isUndefVector(SV->getOperand(0),		isUndefVector(SV->getOperand(0),
buildUseMask(LocalVF, ExtMask, UseMask::FirstArg))		buildUseMask(LocalVF, ExtMask, UseMask::FirstArg))
.all();		.all();
bool IsOp2Undef =		bool IsOp2Undef =
isUndefVector(SV->getOperand(1),		isUndefVector(SV->getOperand(1),
buildUseMask(LocalVF, ExtMask, UseMask::SecondArg))		buildUseMask(LocalVF, ExtMask, UseMask::SecondArg))
[42 lines not shown]	static bool peekThroughShuffles(Value *&V, SmallVectorImpl<int> &Mask,
}		}
V = Op;		V = Op;
return true;		return true;
}		}

/// Smart shuffle instruction emission, walks through shuffles trees and		/// Smart shuffle instruction emission, walks through shuffles trees and
/// tries to find the best matching vector for the actual shuffle		/// tries to find the best matching vector for the actual shuffle
/// instruction.		/// instruction.
template <typename ShuffleBuilderTy>		template <typename T, typename ShuffleBuilderTy>
static Value *createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask,		static T createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask,
ShuffleBuilderTy &Builder) {		ShuffleBuilderTy &Builder) {
assert(V1 && "Expected at least one vector value.");		assert(V1 && "Expected at least one vector value.");
		if (V2)
		Builder.resizeToMatch(V1, V2);
int VF = Mask.size();		int VF = Mask.size();
if (auto *FTy = dyn_cast<FixedVectorType>(V1->getType()))		if (auto *FTy = dyn_cast<FixedVectorType>(V1->getType()))
VF = FTy->getNumElements();		VF = FTy->getNumElements();
if (V2 &&		if (V2 &&
!isUndefVector(V2, buildUseMask(VF, Mask, UseMask::SecondArg)).all()) {		!isUndefVector(V2, buildUseMask(VF, Mask, UseMask::SecondArg)).all()) {
// Peek through shuffles.		// Peek through shuffles.
Value *Op1 = V1;		Value *Op1 = V1;
Value *Op2 = V2;		Value *Op2 = V2;
[58 lines not shown]	if (V2 &&
if (auto *FTy = dyn_cast<FixedVectorType>(Op2->getType()))		if (auto *FTy = dyn_cast<FixedVectorType>(Op2->getType()))
LocalVF = FTy->getNumElements();		LocalVF = FTy->getNumElements();
combineMasks(LocalVF, ShuffleMask2, CombinedMask2);		combineMasks(LocalVF, ShuffleMask2, CombinedMask2);
CombinedMask2.swap(ShuffleMask2);		CombinedMask2.swap(ShuffleMask2);
}		}
}		}
} while (PrevOp1 != Op1 || PrevOp2 != Op2);		} while (PrevOp1 != Op1 || PrevOp2 != Op2);
Builder.resizeToMatch(Op1, Op2);		Builder.resizeToMatch(Op1, Op2);
VF = std::max(cast<VectorType>(Op1->getType())		VF = cast<VectorType>(Op1->getType())
->getElementCount()		->getElementCount()
.getKnownMinValue(),		.getKnownMinValue();
cast<VectorType>(Op2->getType())
->getElementCount()
.getKnownMinValue());
for (int I = 0, E = Mask.size(); I < E; ++I) {		for (int I = 0, E = Mask.size(); I < E; ++I) {
if (CombinedMask2[I] != UndefMaskElem) {		if (CombinedMask2[I] != UndefMaskElem) {
assert(CombinedMask1[I] == UndefMaskElem &&		assert(CombinedMask1[I] == UndefMaskElem &&
"Expected undefined mask element");		"Expected undefined mask element");
CombinedMask1[I] = CombinedMask2[I] + (Op1 == Op2 ? 0 : VF);		CombinedMask1[I] = CombinedMask2[I] + (Op1 == Op2 ? 0 : VF);
}		}
}		}
		const int Limit = CombinedMask1.size() * 2;
		if (Op1 == Op2 && Limit == 2 * VF &&
		all_of(CombinedMask1, [=](int Idx) { return Idx < Limit; }) &&
		(ShuffleVectorInst::isIdentityMask(CombinedMask1) ||
		(ShuffleVectorInst::isZeroEltSplatMask(CombinedMask1) &&
		isa<ShuffleVectorInst>(Op1) &&
		cast<ShuffleVectorInst>(Op1)->getShuffleMask() ==
		ArrayRef(CombinedMask1))))
		return Builder.createIdentity(Op1);
return Builder.createShuffleVector(		return Builder.createShuffleVector(
Op1, Op1 == Op2 ? PoisonValue::get(Op1->getType()) : Op2,		Op1, Op1 == Op2 ? PoisonValue::get(Op1->getType()) : Op2,
CombinedMask1);		CombinedMask1);
}		}
if (isa<PoisonValue>(V1))		if (isa<PoisonValue>(V1))
return PoisonValue::get(FixedVectorType::get(		return Builder.createPoison(
cast<VectorType>(V1->getType())->getElementType(), Mask.size()));		cast<VectorType>(V1->getType())->getElementType(), Mask.size());
SmallVector<int> NewMask(Mask.begin(), Mask.end());		SmallVector<int> NewMask(Mask.begin(), Mask.end());
bool IsIdentity = peekThroughShuffles(V1, NewMask, /*SinglePermute=*/true);		bool IsIdentity = peekThroughShuffles(V1, NewMask, /*SinglePermute=*/true);
assert(V1 && "Expected non-null value after looking through shuffles.");		assert(V1 && "Expected non-null value after looking through shuffles.");

if (!IsIdentity)		if (!IsIdentity)
return Builder.createShuffleVector(V1, NewMask);		return Builder.createShuffleVector(V1, NewMask);
return V1;		return Builder.createIdentity(V1);
}		}
};		};
} // namespace		} // namespace

InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,		/// Merges shuffle masks and emits final shuffle instruction, if required. It
ArrayRef<Value *> VectorizedVals) {		/// supports shuffling of 2 input vectors. It implements lazy shuffles emission,
ArrayRef<Value *> VL = E->Scalars;		/// when the actual shuffle instruction is generated only if this is actually
		/// required. Otherwise, the shuffle instruction emission is delayed till the
		/// end of the process, to reduce the number of emitted instructions and further
		/// analysis/transformations.
		class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
		bool IsFinalized = false;
		SmallVector<int> CommonMask;
		SmallVector<Value *, 2> InVectors;
		const TargetTransformInfo &TTI;
		InstructionCost Cost = 0;
		ArrayRef<Value *> VectorizedVals;
		BoUpSLP &R;
		constexpr static TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

Type *ScalarTy = VL[0]->getType();		class ShuffleCostBuilder {
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		const TargetTransformInfo &TTI;
ScalarTy = SI->getValueOperand()->getType();
else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))
ScalarTy = CI->getOperand(0)->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))
ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

// If we have computed a smaller type for the expression, update VecTy so		static bool isEmptyOrIdentity(ArrayRef<int> Mask, unsigned VF) {
// that the costs will be accurate.		return Mask.empty() ||
if (MinBWs.count(VL[0]))		(VF == Mask.size() && all_of(enumerate(Mask), [](auto Pair) {
VecTy = FixedVectorType::get(		return Pair.value() == UndefMaskElem ||
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		Pair.index() == static_cast<unsigned>(Pair.value());
		RKSimon: Any chance that we can use ShuffleVectorInst::isIdentityMask?
		ABataev: Sure, will do it later.
unsigned EntryVF = E->getVectorFactor();		}));
auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);		}

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		public:
// FIXME: it tries to fix a problem with MSVC buildbots.		ShuffleCostBuilder(const TargetTransformInfo &TTI) : TTI(TTI) {}
TargetTransformInfo *TTI = this->TTI;		~ShuffleCostBuilder() = default;
auto AdjustExtractsCost = [=](InstructionCost &Cost,		InstructionCost createShuffleVector(Value *V1, Value *,
ArrayRef<int> Mask) -> Value * {		ArrayRef<int> Mask) const {
		// Empty mask or identity mask are free.
		unsigned VF =
		cast<VectorType>(V1->getType())->getElementCount().getKnownMinValue();
		if (isEmptyOrIdentity(Mask, VF))
		return TTI::TCC_Free;
		return TTI.getShuffleCost(
		TTI::SK_PermuteTwoSrc,
		FixedVectorType::get(
		cast<VectorType>(V1->getType())->getElementType(), Mask.size()),
		Mask);
		}
		InstructionCost createShuffleVector(Value *V1, ArrayRef<int> Mask) const {
		// Empty mask or identity mask are free.
		if (isEmptyOrIdentity(Mask, Mask.size()))
		return TTI::TCC_Free;
		return TTI.getShuffleCost(
		TTI::SK_PermuteSingleSrc,
		FixedVectorType::get(
		cast<VectorType>(V1->getType())->getElementType(), Mask.size()),
		Mask);
		}
		InstructionCost createIdentity(Value *) const { return TTI::TCC_Free; }
		InstructionCost createPoison(Type *Ty, unsigned VF) const {
		return TTI::TCC_Free;
		}
		void resizeToMatch(Value *&, Value *&) const {}
		};

		/// Smart shuffle instruction emission, walks through shuffles trees and
		/// tries to find the best matching vector for the actual shuffle
		/// instruction.
		InstructionCost createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask) {
		ShuffleCostBuilder Builder(TTI);
		return BaseShuffleAnalysis::createShuffle<InstructionCost>(V1, V2, Mask,
		Builder);
		}

		public:
		ShuffleCostEstimator(TargetTransformInfo &TTI,
		ArrayRef<Value *> VectorizedVals, BoUpSLP &R)
		: TTI(TTI), VectorizedVals(VectorizedVals), R(R) {}
		Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask) {
if (Mask.empty())		if (Mask.empty())
return nullptr;		return nullptr;
Value *VecBase = nullptr;		Value *VecBase = nullptr;
		ArrayRef<Value *> VL = E->Scalars;
		auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());
// If the resulting type is scalarized, do not adjust the cost.		// If the resulting type is scalarized, do not adjust the cost.
unsigned VecNumParts = TTI->getNumberOfParts(VecTy);		unsigned VecNumParts = TTI.getNumberOfParts(VecTy);
if (VecNumParts == VecTy->getNumElements())		if (VecNumParts == VecTy->getNumElements())
return nullptr;		return nullptr;
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
SmallPtrSet<Value *, 4> CheckedExtracts;		SmallPtrSet<Value *, 4> CheckedExtracts;
for (auto [I, V] : enumerate(VL)) {		for (auto [I, V] : enumerate(VL)) {
		// Ignore non-extractelement scalars.
if (isa<UndefValue>(V) || (!Mask.empty() && Mask[I] == UndefMaskElem))		if (isa<UndefValue>(V) || (!Mask.empty() && Mask[I] == UndefMaskElem))
continue;		continue;
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
// Also, avoid adjusting the cost for extractelements with multiple uses		// Also, avoid adjusting the cost for extractelements with multiple uses
// in different graph entries.		// in different graph entries.
const TreeEntry *VE = getTreeEntry(V);		const TreeEntry *VE = R.getTreeEntry(V);
if (!CheckedExtracts.insert(V).second ||		if (!CheckedExtracts.insert(V).second ||
!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||		!R.areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
(VE && VE != E))		(VE && VE != E))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
VecBase = EE->getVectorOperand();		VecBase = EE->getVectorOperand();
std::optional<unsigned> EEIdx = getExtractIndex(EE);		std::optional<unsigned> EEIdx = getExtractIndex(EE);
if (!EEIdx)		if (!EEIdx)
continue;		continue;
unsigned Idx = *EEIdx;		unsigned Idx = *EEIdx;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
if (VecNumParts != TTI->getNumberOfParts(EE->getVectorOperandType())) {		if (VecNumParts != TTI.getNumberOfParts(EE->getVectorOperandType())) {
auto It =		auto It =
ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;		ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;
It->getSecond() = std::min<int>(It->second, Idx);		It->getSecond() = std::min<int>(It->second, Idx);
}		}
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EE->hasOneUse()) {		if (EE->hasOneUse()) {
Instruction *Ext = EE->user_back();		Instruction *Ext = EE->user_back();
if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {		if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {
return isa<GetElementPtrInst>(U);		return isa<GetElementPtrInst>(U);
})) {		})) {
// Use getExtractWithExtendCost() to calculate the cost of		// Use getExtractWithExtendCost() to calculate the cost of
// extractelement/ext pair.		// extractelement/ext pair.
Cost -=		Cost -= TTI.getExtractWithExtendCost(Ext->getOpcode(), Ext->getType(),
TTI->getExtractWithExtendCost(Ext->getOpcode(), Ext->getType(),
EE->getVectorOperandType(), Idx);		EE->getVectorOperandType(), Idx);
// Add back the cost of s|zext which is subtracted separately.		// Add back the cost of s|zext which is subtracted separately.
Cost += TTI->getCastInstrCost(		Cost += TTI.getCastInstrCost(
Ext->getOpcode(), Ext->getType(), EE->getType(),		Ext->getOpcode(), Ext->getType(), EE->getType(),
TTI::getCastContextHint(Ext), CostKind, Ext);		TTI::getCastContextHint(Ext), CostKind, Ext);
continue;		continue;
}		}
}		}
Cost -= TTI->getVectorInstrCost(*EE, EE->getVectorOperandType(), CostKind,		Cost -= TTI.getVectorInstrCost(*EE, EE->getVectorOperandType(), CostKind,
Idx);		Idx);
}		}
// Add a cost for subvector extracts/inserts if required.		// Add a cost for subvector extracts/inserts if required.
for (const auto &Data : ExtractVectorsTys) {		for (const auto &Data : ExtractVectorsTys) {
auto *EEVTy = cast<FixedVectorType>(Data.first->getType());		auto *EEVTy = cast<FixedVectorType>(Data.first->getType());
unsigned NumElts = VecTy->getNumElements();		unsigned NumElts = VecTy->getNumElements();
if (Data.second % NumElts == 0)		if (Data.second % NumElts == 0)
continue;		continue;
if (TTI->getNumberOfParts(EEVTy) > VecNumParts) {		if (TTI.getNumberOfParts(EEVTy) > VecNumParts) {
unsigned Idx = (Data.second / NumElts) * NumElts;		unsigned Idx = (Data.second / NumElts) * NumElts;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
unsigned EENumElts = EEVTy->getNumElements();		unsigned EENumElts = EEVTy->getNumElements();
if (Idx % NumElts == 0)		if (Idx % NumElts == 0)
continue;		continue;
		RKSimon: What targets are we still missing support for?
		ABataev: AArch64, in many cases switches to the default cost bunch of extracts + bunch of inserts.
if (Idx + NumElts <= EENumElts) {		if (Idx + NumElts <= EENumElts) {
Cost +=		Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, std::nullopt, CostKind, Idx, VecTy);		EEVTy, std::nullopt, CostKind, Idx, VecTy);
} else {		} else {
// Need to round up the subvector type vectorization factor to avoid a		// Need to round up the subvector type vectorization factor to avoid a
// crash in cost model functions. Make SubVT so that Idx + VF of SubVT		// crash in cost model functions. Make SubVT so that Idx + VF of SubVT
// <= EENumElts.		// <= EENumElts.
auto *SubVT =		auto *SubVT =
FixedVectorType::get(VecTy->getElementType(), EENumElts - Idx);		FixedVectorType::get(VecTy->getElementType(), EENumElts - Idx);
Cost +=		Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, std::nullopt, CostKind, Idx, SubVT);		EEVTy, std::nullopt, CostKind, Idx, SubVT);
}		}
} else {		} else {
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_InsertSubvector,		Cost += TTI.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
VecTy, std::nullopt, CostKind, 0, EEVTy);		VecTy, std::nullopt, CostKind, 0, EEVTy);
}		}
}		}
return VecBase;		return VecBase;
};
if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))
return 0;
if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();
unsigned VF = E->getVectorFactor();
SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
E->ReuseShuffleIndices.end());
SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
// Build a mask out of the reorder indices and reorder scalars per this
// mask.
SmallVector<int> ReorderMask;
inversePermutation(E->ReorderIndices, ReorderMask);
if (!ReorderMask.empty())
reorderScalars(GatheredScalars, ReorderMask);
SmallVector<int> Mask;
SmallVector<int> ExtractMask;
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
SmallVector<const TreeEntry *> Entries;
Type *ScalarTy = GatheredScalars.front()->getType();
// Check for gathered extracts.
ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
SmallVector<Value *> IgnoredVals;
if (UserIgnoreList)
IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());

InstructionCost Cost = 0;
bool Resized = false;
if (Value *VecBase = AdjustExtractsCost(Cost, ExtractMask))
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {
Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}		}
		std::optional<InstructionCost>
// Do not try to look for reshuffled loads for gathered loads (they will be		needToDelay(const TreeEntry *, ArrayRef<const TreeEntry *>) const {
// handled later), for vectorized scalars, and cases, which are definitely		// No need to delay the cost estimation during analysis.
// not profitable (splats and small gather nodes.)		return std::nullopt;
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))
GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
if (GatherShuffle) {
assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");
if (!Resized) {
unsigned VF1 = Entries.front()->getVectorFactor();
unsigned VF2 = Entries.back()->getVectorFactor();
if ((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)
GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));
}		}
// Remove shuffled elements from list of gathers.		void add(const TreeEntry *E1, const TreeEntry *E2, ArrayRef<int> Mask) {
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {		// Use zeroinitializer instead of actual vector value here, since they are
if (Mask[I] != UndefMaskElem)		// not ready yet.
GatheredScalars[I] = PoisonValue::get(ScalarTy);		add(Constant::getNullValue(FixedVectorType::get(
		E1->Scalars.front()->getType(), E1->getVectorFactor())),
		Constant::getNullValue(FixedVectorType::get(
		E2->Scalars.front()->getType(), E2->getVectorFactor())),
		Mask);
}		}
InstructionCost GatherCost = 0;		void add(const TreeEntry *E1, ArrayRef<int> Mask) {
int Limit = Mask.size() * 2;		// Use zeroinitializer instead of actual vector value here, since they are
if (all_of(Mask, [=](int Idx) { return Idx < Limit; }) &&		// not ready yet.
ShuffleVectorInst::isIdentityMask(Mask)) {		add(Constant::getNullValue(FixedVectorType::get(
// Perfect match in the graph, will reuse the previously vectorized		E1->Scalars.front()->getType(), E1->getVectorFactor())),
// node. Cost is 0.		Mask);
LLVM_DEBUG(
dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");
if (NeedToShuffleReuses)
GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);
} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "
<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather
// than gather.
::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*GatherShuffle, FinalVecTy, Mask);
}		}
if (!all_of(GatheredScalars, UndefValue::classof))		/// Adds 2 input vectors and the mask for their shuffling.
GatherCost += getGatherCost(GatheredScalars);		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
return GatherCost;		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
		if (InVectors.empty()) {
		InVectors.push_back(V1);
		InVectors.push_back(V2);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
}		}
if (ExtractShuffle && all_of(GatheredScalars, PoisonValue::classof)) {		Value *Vec = InVectors.front();
// Check that gather of extractelements can be represented as just a		if (InVectors.size() == 2) {
// shuffle of a single/two vectors the scalars are extracted from.		Cost += createShuffle(Vec, InVectors.back(), CommonMask);
// Found the bunch of extractelement instructions that must be gathered		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
// into a vector and can be represented as a permutation elements in a		if (Mask[Idx] != UndefMaskElem)
// single input vector or of 2 input vectors.		CommonMask[Idx] = Idx;
InstructionCost Cost =		} else if (cast<FixedVectorType>(Vec->getType())->getNumElements() !=
computeExtractCost(VL, VecTy, ExtractShuffle, ExtractMask, TTI);		Mask.size()) {
AdjustExtractsCost(Cost, ExtractMask);		Cost += createShuffle(Vec, nullptr, CommonMask);
if (NeedToShuffleReuses)		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		if (Mask[Idx] != UndefMaskElem)
FinalVecTy, E->ReuseShuffleIndices);		CommonMask[Idx] = Idx;
return Cost;
}		}
if (isSplat(VL)) {		Cost += createShuffle(V1, V2, Mask);
// Found the broadcasting of the single scalar, calculate the cost as the		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
// broadcast.		if (Mask[Idx] != UndefMaskElem)
assert(VecTy == FinalVecTy &&		CommonMask[Idx] = Idx + Sz;
"No reused scalars expected for broadcast.");		InVectors.front() = Vec;
const auto *It =		if (InVectors.size() == 2)
find_if(VL, [](Value *V) { return !isa<UndefValue>(V); });		InVectors.back() = V1;
// If all values are undefs - consider cost free.		else
if (It == VL.end())		InVectors.push_back(V1);
return TTI::TCC_Free;
// Add broadcast for non-identity shuffle only.
bool NeedShuffle =
count(VL, *It) > 1 &&
(VL.front() != *It || !all_of(VL.drop_front(), UndefValue::classof));
InstructionCost InsertCost = TTI->getVectorInstrCost(
Instruction::InsertElement, VecTy, CostKind,
NeedShuffle ? 0 : std::distance(VL.begin(), It),
PoisonValue::get(VecTy), *It);
return InsertCost + (NeedShuffle
? TTI->getShuffleCost(
TargetTransformInfo::SK_Broadcast, VecTy,
/*Mask=*/std::nullopt, CostKind,
/*Index=*/0,
/*SubTp=*/nullptr, /*Args=*/*It)
: TTI::TCC_Free);
}		}
InstructionCost ReuseShuffleCost = 0;		/// Adds another one input vector and the mask for the shuffling.
if (NeedToShuffleReuses)		void add(Value *V1, ArrayRef<int> Mask) {
ReuseShuffleCost = TTI->getShuffleCost(		if (InVectors.empty()) {
TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);		if (!isa<FixedVectorType>(V1->getType())) {
		Cost += createShuffle(V1, nullptr, CommonMask);
		CommonMask.assign(Mask.size(), UndefMaskElem);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		InVectors.push_back(V1);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		const auto *It = find(InVectors, V1);
		if (It == InVectors.end()) {
		if (InVectors.size() == 2 ||
		InVectors.front()->getType() != V1->getType() ||
		!isa<FixedVectorType>(V1->getType())) {
		Value *V = InVectors.front();
		if (InVectors.size() == 2) {
		Cost +=
		createShuffle(InVectors.front(), InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		} else if (cast<FixedVectorType>(V->getType())->getNumElements() !=
		CommonMask.size()) {
		Cost += createShuffle(InVectors.front(), nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] == UndefMaskElem && Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] =
		V->getType() != V1->getType()
		? Idx + Sz
		: Mask[Idx] + cast<FixedVectorType>(V1->getType())
		->getNumElements();
		if (V->getType() != V1->getType())
		Cost += createShuffle(V1, nullptr, Mask);
		InVectors.front() = V;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		return;
		}
		// Check if second vector is required if the used elements are already
		// used from the first one.
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem) {
		InVectors.push_back(V1);
		break;
		}
		}
		int VF = CommonMask.size();
		if (auto *FTy = dyn_cast<FixedVectorType>(V1->getType()))
		VF = FTy->getNumElements();
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem)
		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);
		}
		/// Adds another one input vector and the mask for the shuffling.
		void addOrdered(Value *V1, ArrayRef<unsigned> Order) {
		SmallVector<int, 4> NewMask;
		inversePermutation(Order, NewMask);
		add(V1, NewMask);
		}
		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {
		auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());
		auto BuildVectorCost = [&](ArrayRef<Value *> VL,
		Value *Root) -> InstructionCost {
		InstructionCost GatherCost = 0;
		SmallVector<Value *> Gathers(VL.begin(), VL.end());
		BoUpSLP::ValueSet VectorizedLoads;
// Improve gather cost for gather of loads, if we can group some of the		// Improve gather cost for gather of loads, if we can group some of the
// loads into vector loads.		// loads into vector loads.
if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&		InstructionsState S = getSameOpcode(VL, *R.TLI);
!E->isAltShuffle()) {		if (VL.size() > 2 && S.getOpcode() == Instruction::Load &&
BoUpSLP::ValueSet VectorizedLoads;		!S.isAltShuffle() &&
		!all_of(Gathers, [&](Value *V) { return R.getTreeEntry(V); }) &&
		!isSplat(Gathers)) {
unsigned StartIdx = 0;		unsigned StartIdx = 0;
unsigned VF = VL.size() / 2;		unsigned VF = VL.size() / 2;
unsigned VectorizedCnt = 0;		unsigned VectorizedCnt = 0;
unsigned ScatterVectorizeCnt = 0;		unsigned ScatterVectorizeCnt = 0;
const unsigned Sz = DL->getTypeSizeInBits(E->getMainOp()->getType());		const unsigned Sz = R.DL->getTypeSizeInBits(S.MainOp->getType());
for (unsigned MinVF = getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {		for (unsigned MinVF = R.getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {
for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;		for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;
Cnt += VF) {		Cnt += VF) {
ArrayRef<Value *> Slice = VL.slice(Cnt, VF);		ArrayRef<Value *> Slice = VL.slice(Cnt, VF);
if (!VectorizedLoads.count(Slice.front()) &&		if (!VectorizedLoads.count(Slice.front()) &&
!VectorizedLoads.count(Slice.back()) && allSameBlock(Slice)) {		!VectorizedLoads.count(Slice.back()) && allSameBlock(Slice)) {
SmallVector<Value *> PointerOps;		SmallVector<Value *> PointerOps;
OrdersType CurrentOrder;		OrdersType CurrentOrder;
LoadsState LS =		LoadsState LS =
canVectorizeLoads(Slice, Slice.front(), TTI, DL, SE, LI,		canVectorizeLoads(Slice, Slice.front(), TTI, R.DL, R.SE,
*TLI, CurrentOrder, PointerOps);		R.LI, R.TLI, CurrentOrder, PointerOps);
switch (LS) {		switch (LS) {
case LoadsState::Vectorize:		case LoadsState::Vectorize:
case LoadsState::ScatterVectorize:		case LoadsState::ScatterVectorize:
// Mark the vectorized loads so that we don't vectorize them		// Mark the vectorized loads so that we don't vectorize them
// again.		// again.
if (LS == LoadsState::Vectorize)		if (LS == LoadsState::Vectorize)
++VectorizedCnt;		++VectorizedCnt;
else		else
++ScatterVectorizeCnt;		++ScatterVectorizeCnt;
VectorizedLoads.insert(Slice.begin(), Slice.end());		VectorizedLoads.insert(Slice.begin(), Slice.end());
// If we vectorized initial block, no need to try to vectorize it		// If we vectorized initial block, no need to try to vectorize
// again.		// it again.
if (Cnt == StartIdx)		if (Cnt == StartIdx)
StartIdx += VF;		StartIdx += VF;
break;		break;
case LoadsState::Gather:		case LoadsState::Gather:
break;		break;
}		}
}		}
}		}
// Check if the whole array was vectorized already - exit.		// Check if the whole array was vectorized already - exit.
if (StartIdx >= VL.size())		if (StartIdx >= VL.size())
break;		break;
// Found vectorizable parts - exit.		// Found vectorizable parts - exit.
if (!VectorizedLoads.empty())		if (!VectorizedLoads.empty())
break;		break;
}		}
if (!VectorizedLoads.empty()) {		if (!VectorizedLoads.empty()) {
InstructionCost GatherCost = 0;
unsigned NumParts = TTI->getNumberOfParts(VecTy);
bool NeedInsertSubvectorAnalysis =
!NumParts || (VL.size() / VF) > NumParts;
// Get the cost for gathered loads.		// Get the cost for gathered loads.
for (unsigned I = 0, End = VL.size(); I < End; I += VF) {		for (unsigned I = 0, End = VL.size(); I < End; I += VF) {
if (VectorizedLoads.contains(VL[I]))		if (!VectorizedLoads.contains(VL[I]))
continue;		continue;
GatherCost += getGatherCost(VL.slice(I, VF));		// Exclude potentially vectorized loads from list of gathered
		// scalars.
		for (unsigned K = I, End = I + VF; K < End; ++K)
		Gathers[K] = PoisonValue::get(Gathers[K]->getType());
}		}
// The cost for vectorized loads.		// The cost for vectorized loads.
InstructionCost ScalarsCost = 0;		InstructionCost ScalarsCost = 0;
for (Value *V : VectorizedLoads) {		for (Value *V : VectorizedLoads) {
auto *LI = cast<LoadInst>(V);		auto *LI = cast<LoadInst>(V);
ScalarsCost +=		ScalarsCost += TTI.getMemoryOpCost(
TTI->getMemoryOpCost(Instruction::Load, LI->getType(),		Instruction::Load, LI->getType(), LI->getAlign(),
LI->getAlign(), LI->getPointerAddressSpace(),		LI->getPointerAddressSpace(), CostKind, TTI::OperandValueInfo(),
CostKind, TTI::OperandValueInfo(), LI);		LI);
}		}
auto *LI = cast<LoadInst>(E->getMainOp());		auto *LI = cast<LoadInst>(S.MainOp);
auto *LoadTy = FixedVectorType::get(LI->getType(), VF);		auto *LoadTy = FixedVectorType::get(LI->getType(), VF);
Align Alignment = LI->getAlign();		Align Alignment = LI->getAlign();
GatherCost +=		GatherCost +=
VectorizedCnt *		VectorizedCnt *
TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,		TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment,
LI->getPointerAddressSpace(), CostKind,		LI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo(), LI);		TTI::OperandValueInfo(), LI);
GatherCost += ScatterVectorizeCnt *		GatherCost += ScatterVectorizeCnt *
TTI->getGatherScatterOpCost(		TTI.getGatherScatterOpCost(
Instruction::Load, LoadTy, LI->getPointerOperand(),		Instruction::Load, LoadTy, LI->getPointerOperand(),
/*VariableMask=*/false, Alignment, CostKind, LI);		/*VariableMask=*/false, Alignment, CostKind, LI);
if (NeedInsertSubvectorAnalysis) {		// Add the cost for the subvectors shuffling.
// Add the cost for the subvectors insert.		GatherCost += (VectorizedCnt + ScatterVectorizeCnt - 1) *
for (int I = VF, E = VL.size(); I < E; I += VF)		TTI.getShuffleCost(TTI::SK_Select, VecTy);
GatherCost +=		GatherCost -= ScalarsCost;
TTI->getShuffleCost(TTI::SK_InsertSubvector, VecTy,		}
std::nullopt, CostKind, I, LoadTy);		} else if (!Root && !allConstant(VL) && isSplat(VL)) {
		// Found the broadcasting of the single scalar, calculate the cost as
		// the broadcast.
		const auto *It =
		find_if(VL, [](Value *V) { return !isa<UndefValue>(V); });
		assert(It != VL.end() && "Expected at least one non-undef value.");
		// Add broadcast for non-identity shuffle only.
		bool NeedShuffle =
		count(VL, *It) > 1 &&
		(VL.front() != *It || !all_of(VL.drop_front(), UndefValue::classof));
		InstructionCost InsertCost = TTI.getVectorInstrCost(
		Instruction::InsertElement, VecTy, CostKind,
		NeedShuffle ? 0 : std::distance(VL.begin(), It),
		PoisonValue::get(VecTy), *It);
		return InsertCost +
		(NeedShuffle ? TTI.getShuffleCost(
		TargetTransformInfo::SK_Broadcast, VecTy,
		/*Mask=*/std::nullopt, CostKind,
		/*Index=*/0, /*SubTp=*/nullptr, /*Args=*/*It)
		: TTI::TCC_Free);
		}
		return GatherCost + R.getGatherCost(Gathers);
		};
		Cost += BuildVectorCost(VL, Root);
		if (!Root) {
		SmallVector<Constant *> Vals;
		for (Value *V : VL) {
		if (isa<UndefValue>(V)) {
		Vals.push_back(cast<Constant>(V));
		continue;
		}
		Vals.push_back(Constant::getNullValue(V->getType()));
		}
		return ConstantVector::get(Vals);
		}
		return ConstantVector::getSplat(
		ElementCount::getFixed(VL.size()),
		Constant::getNullValue(VL.front()->getType()));
		}
		InstructionCost createFreeze(InstructionCost Cost) { return Cost; }
		/// Finalize emission of the shuffles.
		InstructionCost
		finalize(ArrayRef<int> ExtMask,
		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
		IsFinalized = true;
		if (Action) {
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Cost += createShuffle(Vec, InVectors.back(), CommonMask);
		InVectors.pop_back();
		} else {
		Cost += createShuffle(Vec, nullptr, CommonMask);
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		Action(Vec, CommonMask);
		InVectors.front() = Vec;
		}
		if (!ExtMask.empty()) {
		if (CommonMask.empty()) {
		CommonMask.assign(ExtMask.begin(), ExtMask.end());
		} else {
		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
		if (ExtMask[I] == UndefMaskElem)
		continue;
		NewMask[I] = CommonMask[ExtMask[I]];
		}
		CommonMask.swap(NewMask);
		}
}		}
return ReuseShuffleCost + GatherCost - ScalarsCost;		if (CommonMask.empty()) {
		assert(InVectors.size() == 1 && "Expected only one vector with no mask");
		return Cost;
}		}
		if (InVectors.size() == 2)
		return Cost +
		createShuffle(InVectors.front(), InVectors.back(), CommonMask);
		return Cost + createShuffle(InVectors.front(), nullptr, CommonMask);
}		}
return ReuseShuffleCost + getGatherCost(VL);
		~ShuffleCostEstimator() {
		assert((IsFinalized || CommonMask.empty()) &&
		"Shuffle construction must be finalized.");
		}
		};

		InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
		ArrayRef<Value *> VectorizedVals) {
		ArrayRef<Value *> VL = E->Scalars;

		Type *ScalarTy = VL[0]->getType();
		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
		RKSimon (inline comment, Not Done): auto *
		ScalarTy = SI->getValueOperand()->getType();
		else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))
		RKSimon (inline comment, Not Done): auto *
		ABataev (author, Done): Both these cases are the existing code, just the diff is not quite correct because of the big differences.
		ScalarTy = CI->getOperand(0)->getType();
		else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))
		ScalarTy = IE->getOperand(1)->getType();
		auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

		// If we have computed a smaller type for the expression, update VecTy so
		// that the costs will be accurate.
		if (MinBWs.count(VL[0]))
		VecTy = FixedVectorType::get(
		IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());
		unsigned EntryVF = E->getVectorFactor();
		auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);

		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
		if (E->State == TreeEntry::NeedToGather) {
		if (allConstant(VL))
		return 0;
		if (isa<InsertElementInst>(VL[0]))
		return InstructionCost::getInvalid();
		return processBuildVector<ShuffleCostEstimator, InstructionCost>(
		E, TTI, VectorizedVals, this);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
NewMask.resize(E->ReorderIndices.size());		NewMask.resize(E->ReorderIndices.size());
[... 1,742 lines not shown ...]	if (IsPHI || (E->State != TreeEntry::NeedToGather &&
// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
Builder.SetInsertPoint(LastInst->getParent(),		Builder.SetInsertPoint(LastInst->getParent(),
std::next(LastInst->getIterator()));		std::next(LastInst->getIterator()));
}		}
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}

Value *BoUpSLP::gather(ArrayRef<Value *> VL) {		Value *BoUpSLP::gather(ArrayRef<Value *> VL, Value *Root) {
// List of instructions/lanes from current block and/or the blocks which are		// List of instructions/lanes from current block and/or the blocks which are
// part of the current loop. These instructions will be inserted at the end to		// part of the current loop. These instructions will be inserted at the end to
// make it possible to optimize loops and hoist invariant instructions out of		// make it possible to optimize loops and hoist invariant instructions out of
// the loops body with better chances for success.		// the loops body with better chances for success.
SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;		SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;
SmallSet<int, 4> PostponedIndices;		SmallSet<int, 4> PostponedIndices;
Loop *L = LI->getLoopFor(Builder.GetInsertBlock());		Loop *L = LI->getLoopFor(Builder.GetInsertBlock());
auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {		auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {
SmallPtrSet<BasicBlock *, 4> Visited;		SmallPtrSet<BasicBlock *, 4> Visited;
while (InsertBB && InsertBB != InstBB && Visited.insert(InsertBB).second)		while (InsertBB && InsertBB != InstBB && Visited.insert(InsertBB).second)
InsertBB = InsertBB->getSinglePredecessor();		InsertBB = InsertBB->getSinglePredecessor();
return InsertBB && InsertBB == InstBB;		return InsertBB && InsertBB == InstBB;
};		};
for (int I = 0, E = VL.size(); I < E; ++I) {		for (int I = 0, E = VL.size(); I < E; ++I) {
if (auto *Inst = dyn_cast<Instruction>(VL[I]))		if (auto *Inst = dyn_cast<Instruction>(VL[I]))
if ((CheckPredecessor(Inst->getParent(), Builder.GetInsertBlock()) ||		if ((CheckPredecessor(Inst->getParent(), Builder.GetInsertBlock()) ||
getTreeEntry(Inst) || (L && (L->contains(Inst)))) &&		getTreeEntry(Inst) ||
		(L && (!Root || L->isLoopInvariant(Root)) && L->contains(Inst))) &&
PostponedIndices.insert(I).second)		PostponedIndices.insert(I).second)
PostponedInsts.emplace_back(Inst, I);		PostponedInsts.emplace_back(Inst, I);
}		}

auto &&CreateInsertElement = [this](Value *Vec, Value *V, unsigned Pos) {		auto &&CreateInsertElement = [this](Value *Vec, Value *V, unsigned Pos) {
Vec = Builder.CreateInsertElement(Vec, V, Builder.getInt32(Pos));		Vec = Builder.CreateInsertElement(Vec, V, Builder.getInt32(Pos));
auto *InsElt = dyn_cast<InsertElementInst>(Vec);		auto *InsElt = dyn_cast<InsertElementInst>(Vec);
if (!InsElt)		if (!InsElt)
return Vec;		return Vec;
GatherShuffleExtractSeq.insert(InsElt);		GatherShuffleExtractSeq.insert(InsElt);
CSEBlocks.insert(InsElt->getParent());		CSEBlocks.insert(InsElt->getParent());
// Add to our 'need-to-extract' list.		// Add to our 'need-to-extract' list.
if (TreeEntry *Entry = getTreeEntry(V)) {		if (TreeEntry *Entry = getTreeEntry(V)) {
// Find which lane we need to extract.		// Find which lane we need to extract.
unsigned FoundLane = Entry->findLaneForValue(V);		unsigned FoundLane = Entry->findLaneForValue(V);
ExternalUses.emplace_back(V, InsElt, FoundLane);		ExternalUses.emplace_back(V, InsElt, FoundLane);
}		}
return Vec;		return Vec;
};		};
Value *Val0 =		Value *Val0 =
isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];		isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];
FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());		FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());
Value *Vec = PoisonValue::get(VecTy);		Value *Vec = Root ? Root : PoisonValue::get(VecTy);
SmallVector<int> NonConsts;		SmallVector<int> NonConsts;
// Insert constant values at first.		// Insert constant values at first.
for (int I = 0, E = VL.size(); I < E; ++I) {		for (int I = 0, E = VL.size(); I < E; ++I) {
if (PostponedIndices.contains(I))		if (PostponedIndices.contains(I))
continue;		continue;
if (!isConstant(VL[I])) {		if (!isConstant(VL[I])) {
NonConsts.push_back(I);		NonConsts.push_back(I);
continue;		continue;
}		}
		if (Root) {
		if (!isa<UndefValue>(VL[I])) {
		NonConsts.push_back(I);
		continue;
		}
		if (isa<PoisonValue>(VL[I]))
		continue;
		if (auto *SV = dyn_cast<ShuffleVectorInst>(Root)) {
		if (SV->getMaskValue(I) == UndefMaskElem)
		continue;
		}
		}
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
}		}
// Insert non-constant values.		// Insert non-constant values.
for (int I : NonConsts)		for (int I : NonConsts)
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
// Append instructions, which are/may be part of the loop, in the end to make		// Append instructions, which are/may be part of the loop, in the end to make
// it possible to hoist non-loop-based instructions.		// it possible to hoist non-loop-based instructions.
for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)		for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)
[... 82 lines not shown ...]	Value *createShuffleVector(Value *V1, ArrayRef<int> Mask) {
return V1;		return V1;
Value *Vec = Builder.CreateShuffleVector(V1, Mask);		Value *Vec = Builder.CreateShuffleVector(V1, Mask);
if (auto *I = dyn_cast<Instruction>(Vec)) {		if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherShuffleExtractSeq.insert(I);		GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
return Vec;		return Vec;
}		}
		Value *createIdentity(Value *V) { return V; }
		Value *createPoison(Type *Ty, unsigned VF) {
		return PoisonValue::get(FixedVectorType::get(Ty, VF));
		}
/// Resizes 2 input vector to match the sizes, if the they are not equal		/// Resizes 2 input vector to match the sizes, if the they are not equal
/// yet. The smallest vector is resized to the size of the larger vector.		/// yet. The smallest vector is resized to the size of the larger vector.
void resizeToMatch(Value *&V1, Value *&V2) {		void resizeToMatch(Value *&V1, Value *&V2) {
if (V1->getType() == V2->getType())		if (V1->getType() == V2->getType())
return;		return;
int V1VF = cast<FixedVectorType>(V1->getType())->getNumElements();		int V1VF = cast<FixedVectorType>(V1->getType())->getNumElements();
int V2VF = cast<FixedVectorType>(V2->getType())->getNumElements();		int V2VF = cast<FixedVectorType>(V2->getType())->getNumElements();
int VF = std::max(V1VF, V2VF);		int VF = std::max(V1VF, V2VF);
[... 16 lines not shown ...]	class BoUpSLP::ShuffleInstructionBuilder final : public BaseShuffleAnalysis {

/// Smart shuffle instruction emission, walks through shuffles trees and		/// Smart shuffle instruction emission, walks through shuffles trees and
/// tries to find the best matching vector for the actual shuffle		/// tries to find the best matching vector for the actual shuffle
/// instruction.		/// instruction.
Value *createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask) {		Value *createShuffle(Value *V1, Value *V2, ArrayRef<int> Mask) {
assert(V1 && "Expected at least one vector value.");		assert(V1 && "Expected at least one vector value.");
ShuffleIRBuilder ShuffleBuilder(Builder, R.GatherShuffleExtractSeq,		ShuffleIRBuilder ShuffleBuilder(Builder, R.GatherShuffleExtractSeq,
R.CSEBlocks);		R.CSEBlocks);
return BaseShuffleAnalysis::createShuffle(V1, V2, Mask, ShuffleBuilder);		return BaseShuffleAnalysis::createShuffle<Value *>(V1, V2, Mask,
		ShuffleBuilder);
}		}

/// Transforms mask \p CommonMask per given \p Mask to make proper set after		/// Transforms mask \p CommonMask per given \p Mask to make proper set after
/// shuffle emission.		/// shuffle emission.
static void transformMaskAfterShuffle(MutableArrayRef<int> CommonMask,		static void transformMaskAfterShuffle(MutableArrayRef<int> CommonMask,
ArrayRef<int> Mask) {		ArrayRef<int> Mask) {
for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
if (Mask[Idx] != UndefMaskElem)		if (Mask[Idx] != UndefMaskElem)
CommonMask[Idx] = Idx;		CommonMask[Idx] = Idx;
}		}

public:		public:
ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)		ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)
: Builder(Builder), R(R) {}		: Builder(Builder), R(R) {}

		Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask) {
		Value *VecBase = nullptr;
		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
		int Idx = Mask[I];
		if (Idx == UndefMaskElem)
		continue;
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		VecBase = EI->getVectorOperand();
		// If all users are vectorized - can delete the extractelement itself.
		if (any_of(EI->users(),
		[&](User *U) { return !R.ScalarToTreeEntry.count(U); }))
		continue;
		R.eraseInstruction(EI);
		}
		return VecBase;
		}
		std::optional<Value *> needToDelay(const TreeEntry *E,
		ArrayRef<const TreeEntry *> Deps) const {
		// No need to delay emission if all deps are ready.
		if (all_of(Deps, [](const TreeEntry *TE) { return TE->VectorizedValue; }))
		return std::nullopt;
		// Postpone gather emission, will be emitted after the end of the
		// process to keep correct order.
		auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),
		E->getVectorFactor());
		Value *Vec = Builder.CreateAlignedLoad(
		VecTy, PoisonValue::get(VecTy->getPointerTo()), MaybeAlign());
		return Vec;
		}
		void add(const TreeEntry *E1, const TreeEntry *E2, ArrayRef<int> Mask) {
		add(E1->VectorizedValue, E2->VectorizedValue, Mask);
		}
		void add(const TreeEntry *E1, ArrayRef<int> Mask) {
		add(E1->VectorizedValue, Mask);
		}
/// Adds 2 input vectors and the mask for their shuffling.		/// Adds 2 input vectors and the mask for their shuffling.
void add(Value *V1, Value *V2, ArrayRef<int> Mask) {		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
if (InVectors.empty()) {		if (InVectors.empty()) {
InVectors.push_back(V1);		InVectors.push_back(V1);
InVectors.push_back(V2);		InVectors.push_back(V2);
CommonMask.assign(Mask.begin(), Mask.end());		CommonMask.assign(Mask.begin(), Mask.end());
return;		return;
[... 75 lines not shown ...]	for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : VF);
}		}
/// Adds another one input vector and the mask for the shuffling.		/// Adds another one input vector and the mask for the shuffling.
void addOrdered(Value *V1, ArrayRef<unsigned> Order) {		void addOrdered(Value *V1, ArrayRef<unsigned> Order) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(Order, NewMask);		inversePermutation(Order, NewMask);
add(V1, NewMask);		add(V1, NewMask);
}		}
		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr) {
		return R.gather(VL, Root);
		}
		Value *createFreeze(Value *V) { return Builder.CreateFreeze(V); }
/// Finalize emission of the shuffles.		/// Finalize emission of the shuffles.
Value *		Value *
finalize(ArrayRef<int> ExtMask = std::nullopt) {		finalize(ArrayRef<int> ExtMask,
		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
IsFinalized = true;		IsFinalized = true;
		if (Action) {
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Vec = createShuffle(Vec, InVectors.back(), CommonMask);
		InVectors.pop_back();
		} else {
		Vec = createShuffle(Vec, nullptr, CommonMask);
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		Action(Vec, CommonMask);
		InVectors.front() = Vec;
		}
if (!ExtMask.empty()) {		if (!ExtMask.empty()) {
if (CommonMask.empty()) {		if (CommonMask.empty()) {
CommonMask.assign(ExtMask.begin(), ExtMask.end());		CommonMask.assign(ExtMask.begin(), ExtMask.end());
} else {		} else {
SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
if (ExtMask[I] == UndefMaskElem)		if (ExtMask[I] == UndefMaskElem)
continue;		continue;
[... 108 lines not shown ...]	if (E->getOpcode() != Instruction::InsertElement &&
E->getOpcode() != Instruction::PHI) {		E->getOpcode() != Instruction::PHI) {
Instruction *LastInst = EntryToLastInstruction.lookup(E);		Instruction *LastInst = EntryToLastInstruction.lookup(E);
assert(LastInst && "Failed to find last instruction in bundle");		assert(LastInst && "Failed to find last instruction in bundle");
Builder.SetInsertPoint(LastInst);		Builder.SetInsertPoint(LastInst);
}		}
return vectorizeTree(I->get());		return vectorizeTree(I->get());
}		}

Value *BoUpSLP::createBuildVector(const TreeEntry *E) {		template <typename BVTy, typename ResTy, typename... Args>
		ResTy BoUpSLP::processBuildVector(const TreeEntry *E, Args &...Params) {
assert(E->State == TreeEntry::NeedToGather && "Expected gather node.");		assert(E->State == TreeEntry::NeedToGather && "Expected gather node.");
unsigned VF = E->getVectorFactor();		unsigned VF = E->getVectorFactor();

auto AdjustExtracts = [&](const TreeEntry *E, ArrayRef<int> Mask) {
Value *VecBase = nullptr;
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
int Idx = Mask[I];
if (Idx == UndefMaskElem)
continue;
auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
VecBase = EI->getVectorOperand();
// TODO: EI can be erased, if all its users are vectorized. But need to
// emit shuffles for such extractelement instructions.
}
return VecBase;
};
auto CreateShuffle = [&](Value *V1, Value *V2, ArrayRef<int> Mask) {
unsigned VF1 = cast<FixedVectorType>(V1->getType())->getNumElements();
unsigned VF2 = cast<FixedVectorType>(V2->getType())->getNumElements();
unsigned VF = std::max(VF1, VF2);
if (VF1 != VF2) {
SmallVector<int> ExtMask(VF, UndefMaskElem);
std::iota(ExtMask.begin(), std::next(ExtMask.begin(), std::min(VF1, VF2)),
0);
if (VF1 < VF2) {
V1 = Builder.CreateShuffleVector(V1, ExtMask);
if (auto *I = dyn_cast<Instruction>(V1)) {
GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());
}
} else {
V2 = Builder.CreateShuffleVector(V2, ExtMask);
if (auto *I = dyn_cast<Instruction>(V2)) {
GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());
}
}
}
const int Limit = Mask.size() * 2;
if (V1 == V2 && Mask.size() == VF &&
all_of(Mask, [=](int Idx) { return Idx < Limit; }) &&
(ShuffleVectorInst::isIdentityMask(Mask) ||
(ShuffleVectorInst::isZeroEltSplatMask(Mask) &&
isa<ShuffleVectorInst>(V1) &&
cast<ShuffleVectorInst>(V1)->getShuffleMask() == Mask)))
return V1;
Value *Vec = V1 == V2 ? Builder.CreateShuffleVector(V1, Mask)
: Builder.CreateShuffleVector(V1, V2, Mask);
if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());
}
return Vec;
};
auto NeedToDelay = [=](const TreeEntry *E,
ArrayRef<const TreeEntry *> Deps) -> Value * {
// No need to delay emission if all deps are ready.
if (all_of(Deps, [](const TreeEntry *TE) { return TE->VectorizedValue; }))
return nullptr;
// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.
auto *VecTy = FixedVectorType::get(E->Scalars.front()->getType(),
E->getVectorFactor());
Value *Vec = Builder.CreateAlignedLoad(
VecTy, PoisonValue::get(VecTy->getPointerTo()), MaybeAlign());
return Vec;
};

bool NeedFreeze = false;		bool NeedFreeze = false;
		bool RebuiltVector = false;
SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),		SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
E->ReuseShuffleIndices.end());		E->ReuseShuffleIndices.end());
SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());		SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
// Build a mask out of the reorder indices and reorder scalars per this		// Build a mask out of the redorder indices and reorder scalars per this mask.
// mask.
SmallVector<int> ReorderMask;		SmallVector<int> ReorderMask;
inversePermutation(E->ReorderIndices, ReorderMask);		inversePermutation(E->ReorderIndices, ReorderMask);
if (!ReorderMask.empty())		if (!ReorderMask.empty())
reorderScalars(GatheredScalars, ReorderMask);		reorderScalars(GatheredScalars, ReorderMask);
		auto FindReusedSplat = [&](SmallVectorImpl<int> &Mask) {
ShuffleInstructionBuilder ShuffleBuilder(Builder, *this);		if (!isSplat(E->Scalars) || none_of(E->Scalars, [](Value *V) {
Value *Vec = nullptr;		return isa<UndefValue>(V) && !isa<PoisonValue>(V);
		}))
		return false;
		TreeEntry *UserTE = E->UserTreeIndices.back().UserTE;
		unsigned EdgeIdx = E->UserTreeIndices.back().EdgeIdx;
		if (UserTE->getNumOperands() != 2)
		return false;
		auto *It =
		find_if(VectorizableTree, [=](const std::unique_ptr<TreeEntry> &TE) {
		return find_if(TE->UserTreeIndices, [=](const EdgeInfo &EI) {
		return EI.UserTE == UserTE && EI.EdgeIdx != EdgeIdx;
		}) != TE->UserTreeIndices.end();
		});
		if (It == VectorizableTree.end())
		return false;
		unsigned I =
		*find_if_not(Mask, [](int Idx) { return Idx == UndefMaskElem; });
		int Sz = Mask.size();
		if (all_of(Mask, [Sz](int Idx) { return Idx < 2 * Sz; }) &&
		ShuffleVectorInst::isIdentityMask(Mask))
		std::iota(Mask.begin(), Mask.end(), 0);
		else
		std::fill(Mask.begin(), Mask.end(), I);
		return true;
		};
		BVTy GatherBuilder(Params...);
		ResTy Res = ResTy();
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<int> ExtractMask;		SmallVector<int> ExtractMask;
SmallVector<int> ReuseMask;		SmallVector<int> ReuseMask;
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;		std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;		std::optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Type *ScalarTy = GatheredScalars.front()->getType();		Type *ScalarTy = GatheredScalars.front()->getType();
		bool IsNonPoisoned = true;
		bool IsUsedInExpr = false;
		SmallVector<const TreeEntry *, 2> ReusedEntries;
if (!all_of(GatheredScalars, UndefValue::classof)) {		if (!all_of(GatheredScalars, UndefValue::classof)) {
// Check for gathered extracts.		// Check for gathered extracts.
ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);		ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
SmallVector<Value *> IgnoredVals;		SmallVector<Value *> IgnoredVals;
if (UserIgnoreList)		if (UserIgnoreList)
IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());		IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());
		// Need to remove vectorized extracelement instructions.
		Value *VecBase = GatherBuilder.adjustExtracts(E, ExtractMask);
bool Resized = false;		bool Resized = false;
if (Value *VecBase = AdjustExtracts(E, ExtractMask))		if (VecBase)
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))		if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {		if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {
Resized = true;		Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
// Gather extracts after we check for full matched gathers only.		// Gather extracts after we check for full matched gathers only.
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||		if (E->getOpcode() != Instruction::Load || E->isAltShuffle() ||
E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||		all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||		isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {		(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {
GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);		GatherShuffle = isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
}		}
if (GatherShuffle) {		if (GatherShuffle) {
if (Value *Delayed = NeedToDelay(E, Entries)) {		if (std::optional<ResTy> Delayed =
		GatherBuilder.needToDelay(E, Entries)) {
// Delay emission of gathers which are not ready yet.		// Delay emission of gathers which are not ready yet.
PostponedGathers.insert(E);		PostponedGathers.insert(E);
// Postpone gather emission, will be emitted after the end of the		// Postpone gather emission, will be emitted after the end of the
// process to keep correct order.		// process to keep correct order.
return Delayed;		return *Delayed;
}		}
assert((Entries.size() == 1 || Entries.size() == 2) &&		assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");		"Expected shuffle of 1 or 2 entries.");
if (!Resized) {		if (!Resized) {
unsigned VF1 = Entries.front()->getVectorFactor();		unsigned VF1 = Entries.front()->getVectorFactor();
unsigned VF2 = Entries.back()->getVectorFactor();		unsigned VF2 = Entries.back()->getVectorFactor();
if ((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)		if ((VF == VF1 && GatheredScalars.size() != VF1) ||
		(VF == VF2 && GatheredScalars.size() != VF2))
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
		if (*GatherShuffle == TTI::SK_PermuteSingleSrc)
		IsUsedInExpr = FindReusedSplat(Mask);
// Remove shuffled elements from list of gathers.		// Remove shuffled elements from list of gathers.
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
}		}
		if (Entries.front()->VectorizedValue)
		IsNonPoisoned &=
		isGuaranteedNotToBePoison(Entries.front()->VectorizedValue);
		ReusedEntries.push_back(Entries.front());
		if (Entries.size() == 1) {
		GatherBuilder.add(Entries.front(), Mask);
		} else {
		if (Entries.back()->VectorizedValue)
		IsNonPoisoned &=
		isGuaranteedNotToBePoison(Entries.back()->VectorizedValue);
		GatherBuilder.add(Entries.front(), Entries.back(), Mask);
		ReusedEntries.push_back(Entries.back());
}		}
}		} else if (!allConstant(GatheredScalars)) {
if ((ExtractShuffle || GatherShuffle) &&		// For splats we can emit broadcasts instead of gathers, so try to find
all_of(GatheredScalars, PoisonValue::classof)) {
Value *Vec1 = nullptr;
if (ExtractShuffle) {
// Gather of extractelements can be represented as just a shuffle of
// a single/two vectors the scalars are extracted from.
// Find input vectors.
Value *Vec2 = nullptr;
for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
if (ExtractMask[I] == UndefMaskElem ||
(!Mask.empty() && Mask[I] != UndefMaskElem)) {
ExtractMask[I] = UndefMaskElem;
continue;
}
if (isa<UndefValue>(E->Scalars[I]))
continue;
auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
if (!Vec1) {
Vec1 = EI->getVectorOperand();
} else if (Vec1 != EI->getVectorOperand()) {
assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&
"Expected only 1 or 2 vectors shuffle.");
Vec2 = EI->getVectorOperand();
}
}
if (Vec2)
Vec1 = CreateShuffle(Vec1, Vec2, ExtractMask);
else if (Vec1)
Vec1 = CreateShuffle(Vec1, Vec1, ExtractMask);
else
Vec1 = PoisonValue::get(
FixedVectorType::get(ScalarTy, GatheredScalars.size()));
}
if (GatherShuffle) {
Vec = CreateShuffle(Entries.front()->VectorizedValue,
Entries.back()->VectorizedValue, Mask);
if (Vec1) {
// Build final mask.
for (auto [I, Idx] : enumerate(Mask)) {
if (ExtractMask[I] != UndefMaskElem)
Idx = I;
else if (Idx != UndefMaskElem)
Idx = I + VF;
}
Vec = CreateShuffle(Vec1, Vec, Mask);
}
} else {
Vec = Vec1;
}
} else if (!allConstant(E->Scalars)) {
// TODO: remove this code once able to combine shuffled vectors and build
// vector elements.
copy(E->Scalars, GatheredScalars.begin());
// For splats with can emit broadcasts instead of gathers, so try to find
// such sequences.		// such sequences.
bool IsSplat = isSplat(GatheredScalars) &&		bool IsSplat = isSplat(GatheredScalars) &&
(GatheredScalars.size() > 2 ||		(GatheredScalars.size() > 2 ||
GatheredScalars.front() == GatheredScalars.back());		GatheredScalars.front() == GatheredScalars.back());
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
ReuseMask.assign(VF, UndefMaskElem);		ReuseMask.assign(VF, UndefMaskElem);
SmallVector<int> UndefPos;		SmallVector<int> UndefPos;
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
// Gather unique non-const values and all constant values.		// Gather unique non-const values and all constant values.
// For repeated values, just shuffle them.		// For repeated values, just shuffle them.
int NumNonConsts = 0;		int NumNonConsts = 0;
int SinglePos = 0;		int SinglePos = 0;
for (auto [I, V] : enumerate(GatheredScalars)) {		for (auto [I, V] : enumerate(GatheredScalars)) {
if (isa<UndefValue>(V)) {		if (isa<UndefValue>(V)) {
if (!isa<PoisonValue>(V)) {		if (!isa<PoisonValue>(V)) {
ReuseMask[I] = I;		ReuseMask[I] = I;
UndefPos.push_back(I);		UndefPos.push_back(I);
}		}
continue;		continue;
}		}
if (isConstant(V)) {		if (isConstant(V)) {
ReuseMask[I] = I;		ReuseMask[I] = I;
continue;		continue;
}		}
++NumNonConsts;		++NumNonConsts;
SinglePos = I;		SinglePos = I;
Value *OrigV = V;		Value *OrigV = V;
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
if (IsSplat) {		if (IsSplat) {
		RebuiltVector |= I != 0;
GatheredScalars.front() = OrigV;		GatheredScalars.front() = OrigV;
ReuseMask[I] = 0;		ReuseMask[I] = 0;
} else {		} else {
const auto Res = UniquePositions.try_emplace(OrigV, I);		const auto Res = UniquePositions.try_emplace(OrigV, I);
		RebuiltVector |= Res.first->second != I;
GatheredScalars[Res.first->second] = OrigV;		GatheredScalars[Res.first->second] = OrigV;
ReuseMask[I] = Res.first->second;		ReuseMask[I] = Res.first->second;
}		}
}		}
if (NumNonConsts == 1) {		if (NumNonConsts == 1) {
// Restore single insert element.		// Restore single insert element.
		RebuiltVector = false;
if (IsSplat) {		if (IsSplat) {
ReuseMask.assign(VF, UndefMaskElem);		ReuseMask.assign(VF, UndefMaskElem);
std::swap(GatheredScalars.front(), GatheredScalars[SinglePos]);		std::swap(GatheredScalars.front(), GatheredScalars[SinglePos]);
if (!UndefPos.empty() && UndefPos.front() == 0)		if (!UndefPos.empty() && UndefPos.front() == 0)
GatheredScalars.front() = UndefValue::get(ScalarTy);		GatheredScalars.front() = UndefValue::get(ScalarTy);
}		}
ReuseMask[SinglePos] = SinglePos;		ReuseMask[SinglePos] = SinglePos;
} else if (!UndefPos.empty() && IsSplat) {		} else if (!UndefPos.empty() && IsSplat) {
// For undef values, try to replace them with the simple broadcast.		// For undef values, try to replace them with the simple broadcast.
// We can do it if the broadcasted value is guaranteed to be		// We can do it if the broadcasted value is guaranteed to be
// non-poisonous, or by freezing the incoming scalar value first.		// non-poisonous, or by freezing the incoming scalar value first.
auto It = find_if(GatheredScalars, [this, E](Value *V) {		auto It = find_if(GatheredScalars, [this, E](Value *V) {
return !isa<UndefValue>(V) &&		return !isa<UndefValue>(V) &&
(getTreeEntry(V) || isGuaranteedNotToBePoison(V) ||		(getTreeEntry(V) || isGuaranteedNotToBePoison(V) ||
(E->UserTreeIndices.size() == 1 &&		(E->UserTreeIndices.size() == 1 &&
any_of(V->uses(), [E](const Use &U) {		any_of(V->uses(), [E](const Use &U) {
// Check if the value already used in the same operation in		// Check if the value already used in the same operation in
// one of the nodes already.		// one of the nodes already.
return E->UserTreeIndices.front().EdgeIdx !=		return E->UserTreeIndices.front().EdgeIdx !=
U.getOperandNo() &&		U.getOperandNo() &&
is_contained(		is_contained(
E->UserTreeIndices.front().UserTE->Scalars,		E->UserTreeIndices.front().UserTE->Scalars,
U.getUser());		U.getUser());
})));		})));
});		});
if (It != GatheredScalars.end()) {		if (It != GatheredScalars.end()) {
// Replace undefs by the non-poisoned scalars and emit broadcast.		// Replace undefs by the non-poisoned scalars and emit broadcast.
int Pos = std::distance(GatheredScalars.begin(), It);		int Pos = std::distance(GatheredScalars.begin(), It);
for_each(UndefPos, [&](int I) {		for_each(UndefPos, [&](int I) {
// Set the undef position to the non-poisoned scalar.		// Set the undef position to the non-poisoned scalar.
ReuseMask[I] = Pos;		ReuseMask[I] = Pos;
// Replace the undef by the poison, in the mask it is replaced by		// Replace the undef by the poison, in the mask it is replaced by
// non-poisoned scalar already.		// non-poisoned scalar already.
if (I != Pos)		if (I != Pos)
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
});		});
} else {		} else {
// Replace undefs by the poisons, emit broadcast and then emit		// Replace undefs by the poisons, emit broadcast and then emit
// freeze.		// freeze.
for_each(UndefPos, [&](int I) {		for_each(UndefPos, [&](int I) {
ReuseMask[I] = UndefMaskElem;		ReuseMask[I] = UndefMaskElem;
if (isa<UndefValue>(GatheredScalars[I]))		if (isa<UndefValue>(GatheredScalars[I]))
GatheredScalars[I] = PoisonValue::get(ScalarTy);		GatheredScalars[I] = PoisonValue::get(ScalarTy);
});		});
NeedFreeze = true;		NeedFreeze = true;
}		}
}		}
// Gather unique scalars and all constants.		}
Vec = gather(GatheredScalars);		}
		// Combine generated extracts mask and reused scalars masks and
		// corresponding input vectors.
		if (ExtractShuffle) {
		// Gather of extractelements can be represented as just a shuffle of
		// a single/two vectors the scalars are extracted from.
		// Find input vectors.
		Value *Vec1 = nullptr;
		Value *Vec2 = nullptr;
		if (*ExtractShuffle == TTI::SK_PermuteSingleSrc)
		IsUsedInExpr = FindReusedSplat(ExtractMask);
		for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		if (ExtractMask[I] == UndefMaskElem ||
		(!Mask.empty() && Mask[I] != UndefMaskElem)) {
		ExtractMask[I] = UndefMaskElem;
		continue;
		}
		if (isa<UndefValue>(E->Scalars[I]))
		continue;
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		if (!Vec1) {
		Vec1 = EI->getVectorOperand();
		} else if (Vec1 != EI->getVectorOperand()) {
		assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&
		"Expected only 1 or 2 vectors shuffle.");
		Vec2 = EI->getVectorOperand();
		}
		}
		if (Vec2) {
		IsNonPoisoned &=
		isGuaranteedNotToBePoison(Vec1) && isGuaranteedNotToBePoison(Vec2);
		GatherBuilder.add(Vec1, Vec2, ExtractMask);
		} else if (Vec1) {
		IsNonPoisoned &= isGuaranteedNotToBePoison(Vec1);
		GatherBuilder.add(Vec1, ExtractMask);
		} else {
		GatherBuilder.add(PoisonValue::get(FixedVectorType::get(
		ScalarTy, GatheredScalars.size())),
		ExtractMask);
		}
		}
		if (ExtractShuffle || GatherShuffle) {
		// Insert non-constant scalars.
		SmallVector<Value *> NonConstants(GatheredScalars);
		int EMSz = ExtractMask.size();
		int MSz = Mask.size();
		bool EnoughConsts =
		!RebuiltVector && (!ExtractShuffle || !GatherShuffle) &&
		((ExtractShuffle &&
		(*ExtractShuffle != TTI::SK_PermuteSingleSrc ||
		any_of(ExtractMask, [&](int I) { return I >= EMSz; }) ||
		!ShuffleVectorInst::isIdentityMask(ExtractMask))) ||
		(GatherShuffle && (*GatherShuffle != TTI::SK_PermuteSingleSrc ||
		any_of(Mask, [&](int I) { return I >= MSz; }) ||
		!ShuffleVectorInst::isIdentityMask(Mask))) ||
		count_if(GatheredScalars, [](Value *V) {
		return isa<Constant>(V) && !isa<PoisonValue>(V);
		}) > 1);
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		if (EnoughConsts && isa<Constant>(GatheredScalars[I]))
		NonConstants[I] = PoisonValue::get(ScalarTy);
		else
		GatheredScalars[I] = PoisonValue::get(ScalarTy);
		}
		// Generate constants for final shuffle.
		if (!all_of(GatheredScalars, UndefValue::classof)) {
		Mask.assign(GatheredScalars.size(), UndefMaskElem);
		Value *VecVal = GatherBuilder.gather(GatheredScalars);
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		if (!isa<UndefValue>(GatheredScalars[I]))
		Mask[I] = I;
		}
		GatherBuilder.add(VecVal, Mask);
		IsNonPoisoned &= isGuaranteedNotToBePoison(VecVal);
		}
		NeedFreeze = !IsNonPoisoned && !IsUsedInExpr &&
		any_of(GatheredScalars, [](Value *V) {
		return isa<UndefValue>(V) && !isa<PoisonValue>(V);
		});
		// Emit final insertelement instructions for defined values.
		if (!RebuiltVector && !all_of(NonConstants, UndefValue::classof)) {
		Res = GatherBuilder.finalize(
		ReuseShuffleIndicies, [&](Value *&Vec, SmallVectorImpl<int> &Mask) {
		Vec = GatherBuilder.gather(NonConstants, Vec);
		for (unsigned I = 0, Sz = Mask.size(); I < Sz; ++I)
		if ((!EnoughConsts && !isa<PoisonValue>(NonConstants[I])) ||
		!isa<Constant>(NonConstants[I]))
		Mask[I] = I;
		});
} else {		} else {
// Gather all constants.		if (RebuiltVector && !all_of(NonConstants, UndefValue::classof)) {
Vec = gather(E->Scalars);		// Just generate simple gather, no reused scalars/extracts.
		Value *BV = GatherBuilder.gather(NonConstants);
		GatherBuilder.add(BV, ReuseMask);
		}
		Res = GatherBuilder.finalize(ReuseShuffleIndicies);
		}
		} else {
		// Just generate simple gather, no reused scalars/extracts.
		Value *BV = GatherBuilder.gather(GatheredScalars);
		GatherBuilder.add(BV, ReuseMask);
		Res = GatherBuilder.finalize(ReuseShuffleIndicies);
}		}

ShuffleBuilder.add(Vec, ReuseMask);
Vec = ShuffleBuilder.finalize(E->ReuseShuffleIndices);
if (NeedFreeze)		if (NeedFreeze)
Vec = Builder.CreateFreeze(Vec);		Res = GatherBuilder.createFreeze(Res);
return Vec;		return Res;
		}

		Value *BoUpSLP::createBuildVector(const TreeEntry *E) {
		return processBuildVector<ShuffleInstructionBuilder, Value *>(E, Builder,
		*this);
}		}

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
Show All 14 Lines	if (E->getOpcode() == Instruction::Store) {
ArrayRef(reinterpret_cast<const int *>(E->ReorderIndices.begin()),		ArrayRef(reinterpret_cast<const int *>(E->ReorderIndices.begin()),
E->ReorderIndices.size());		E->ReorderIndices.size());
ShuffleBuilder.add(V, Mask);		ShuffleBuilder.add(V, Mask);
} else {		} else {
ShuffleBuilder.addOrdered(V, E->ReorderIndices);		ShuffleBuilder.addOrdered(V, E->ReorderIndices);
}		}
return ShuffleBuilder.finalize(E->ReuseShuffleIndices);		return ShuffleBuilder.finalize(E->ReuseShuffleIndices);
};		};

assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();
if (auto *Store = dyn_cast<StoreInst>(VL0))		if (auto *Store = dyn_cast<StoreInst>(VL0))
ScalarTy = Store->getValueOperand()->getType();		ScalarTy = Store->getValueOperand()->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL0))		else if (auto *IE = dyn_cast<InsertElementInst>(VL0))
ScalarTy = IE->getOperand(1)->getType();		ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());		auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
assert((E->ReorderIndices.empty() ||		assert((E->ReorderIndices.empty() ||
E != VectorizableTree.front().get() ||		E != VectorizableTree.front().get() ||
		nlopes: Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!
		ABataev: Sure, thanks!
!E->UserTreeIndices.empty()) &&		!E->UserTreeIndices.empty()) &&
"PHI reordering is free.");		"PHI reordering is free.");
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);
Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());		Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());		PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());
Value *V = NewPhi;		Value *V = NewPhi;

▲ Show 20 Lines • Show All 4,934 Lines • Show Last 20 Lines
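
For readers following the UniquePositions / ReuseMask loop in the SLPVectorizer.cpp hunk above, here is a minimal standalone sketch of the deduplication idea in its non-splat branch: each distinct scalar is kept once at the position where it first appears, and repeats are expressed through the reuse mask. It is not the LLVM code. Scalars are modelled as plain ints, and the names `kPoison`, `kUndefMaskElem`, `DedupResult` and `dedupScalars` are hypothetical stand-ins introduced only for illustration. Handling of constants, undefs and the splat case is omitted.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a poison scalar and an undefined mask element.
constexpr int kPoison = -1;
constexpr int kUndefMaskElem = -1;

struct DedupResult {
  std::vector<int> Gathered;  // scalars left to gather; repeats become poison
  std::vector<int> ReuseMask; // for each lane, the lane holding its value
  bool Rebuilt = false;       // true if any lane now reads from another lane
};

// Keep the first occurrence of every scalar in place and redirect later
// occurrences to that position through the mask.
DedupResult dedupScalars(const std::vector<int> &Scalars) {
  DedupResult R;
  R.Gathered.assign(Scalars.size(), kPoison);
  R.ReuseMask.assign(Scalars.size(), kUndefMaskElem);
  std::unordered_map<int, int> UniquePositions;
  for (std::size_t I = 0; I < Scalars.size(); ++I) {
    if (Scalars[I] == kPoison)
      continue; // poison lanes stay undefined in the mask
    // try_emplace inserts only when the scalar has not been seen yet.
    auto Res = UniquePositions.try_emplace(Scalars[I], static_cast<int>(I));
    const int Pos = Res.first->second;
    R.Gathered[Pos] = Scalars[I];
    R.ReuseMask[I] = Pos;
    R.Rebuilt |= Pos != static_cast<int>(I);
  }
  return R;
}
```

For an input such as `{10, 20, 10, 30}`, the third lane is not gathered again. Its mask entry points back at lane 0, so a single shuffle can rebuild the full vector from the three unique scalars.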

llvm/test/DebugInfo/Generic/assignment-tracking/slp-vectorizer/merge-scalars.ll

	Show All 17 Lines
	;;			;;
	;; Generated by grabbingthe IR before SLP in:			;; Generated by grabbingthe IR before SLP in:
	;; $ clang++ -O2 -g test.cpp -Xclang -fexperimental-assignment-tracking			;; $ clang++ -O2 -g test.cpp -Xclang -fexperimental-assignment-tracking

	;; Test that dbg.assigns linked to the the scalar stores to quad get linked to			;; Test that dbg.assigns linked to the the scalar stores to quad get linked to
	;; the vector store that replaces them.			;; the vector store that replaces them.

	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR:[0-9]+]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata ![[ID:[0-9]+]], metadata ptr %arrayidx, metadata !DIExpression())			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR:[0-9]+]], metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32), metadata ![[ID:[0-9]+]], metadata ptr %arrayidx, metadata !DIExpression())
				; CHECK: store <2 x float> {{.*}} !DIAssignID ![[ID]]
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 4))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 4))
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 8))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 64, 32), metadata ![[ID1:[0-9]+]], metadata ptr %arrayidx7, metadata !DIExpression())
	; CHECK: store <4 x float> {{.*}} !DIAssignID ![[ID]]			; CHECK: store <2 x float> {{.*}} !DIAssignID ![[ID1]]
	; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 96, 32), metadata ![[ID]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 12))			; CHECK: call void @llvm.dbg.assign(metadata float undef, metadata ![[VAR]], metadata !DIExpression(DW_OP_LLVM_fragment, 96, 32), metadata ![[ID1]], metadata ptr %quad, metadata !DIExpression(DW_OP_plus_uconst, 12))

	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	define dso_local void @_Z3funffff(float %k1, float %k2, float %k3, float %k4) local_unnamed_addr #0 !dbg !7 {			define dso_local void @_Z3funffff(float %k1, float %k2, float %k3, float %k4) local_unnamed_addr #0 !dbg !7 {
	entry:			entry:
	%quad = alloca [4 x float], align 16, !DIAssignID !27			%quad = alloca [4 x float], align 16, !DIAssignID !27
	call void @llvm.dbg.assign(metadata i1 undef, metadata !16, metadata !DIExpression(), metadata !27, metadata ptr %quad, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.assign(metadata i1 undef, metadata !16, metadata !DIExpression(), metadata !27, metadata ptr %quad, metadata !DIExpression()), !dbg !23
	call void @llvm.dbg.assign(metadata float %k1, metadata !12, metadata !DIExpression(), metadata !30, metadata ptr undef, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.assign(metadata float %k1, metadata !12, metadata !DIExpression(), metadata !30, metadata ptr undef, metadata !DIExpression()), !dbg !23
	▲ Show 20 Lines • Show All 106 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=aarch64 -aarch64-insert-extract-base-cost=3 | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=aarch64 -aarch64-insert-extract-base-cost=3 | FileCheck %s

	define void @test(<2 x i64> %0, <2 x i64> %1, <2 x i64> %2) {			define void @test(<2 x i64> %0, <2 x i64> %1, <2 x i64> %2) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1:%.*]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP2:%.*]], i64 0
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP4]], 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP1:%.*]], <2 x i64> [[TMP0:%.*]], <4 x i32> <i32 0, i32 2, i32 undef, i32 2>
	; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i64> [[TMP5]], i64 [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[TMP0:%.*]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[TMP7]], <4 x i64> <i64 0, i64 0, i64 poison, i64 0>, <4 x i32> <i32 4, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP9:%.*]] = trunc i64 [[TMP8]] to i32			; CHECK-NEXT: [[TMP9:%.*]] = or <4 x i64> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP2:%.*]], i64 0			; CHECK-NEXT: [[TMP10:%.*]] = trunc <4 x i64> [[TMP9]] to <4 x i32>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP2]], i64 1			; CHECK-NEXT: br label [[TMP11:%.*]]
	; CHECK-NEXT: [[TMP12:%.*]] = or i64 [[TMP10]], [[TMP11]]			; CHECK: 11:
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32			; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x i32> [ [[TMP16:%.*]], [[TMP11]] ], [ [[TMP10]], [[TMP3:%.*]] ]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP0]], i64 0			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP12]], <4 x i32> <i32 poison, i32 0, i32 0, i32 0>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP15:%.*]] = or i64 [[TMP14]], 0			; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> zeroinitializer, [[TMP13]]
	; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[TMP15]] to i32			; CHECK-NEXT: [[TMP15:%.*]] = add <4 x i32> zeroinitializer, [[TMP13]]
	; CHECK-NEXT: br label [[TMP17:%.*]]			; CHECK-NEXT: [[TMP16]] = shufflevector <4 x i32> [[TMP14]], <4 x i32> [[TMP15]], <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	; CHECK: 17:			; CHECK-NEXT: br label [[TMP11]]
	; CHECK-NEXT: [[TMP18:%.*]] = phi i32 [ [[TMP22:%.*]], [[TMP17]] ], [ [[TMP6]], [[TMP3:%.*]] ]
	; CHECK-NEXT: [[TMP19:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP9]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP20:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP13]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP21:%.*]] = phi i32 [ 0, [[TMP17]] ], [ [[TMP16]], [[TMP3]] ]
	; CHECK-NEXT: [[TMP22]] = or i32 [[TMP18]], 0
	; CHECK-NEXT: br label [[TMP17]]
	;			;
	%4 = extractelement <2 x i64> %1, i64 0			%4 = extractelement <2 x i64> %1, i64 0
	%5 = or i64 %4, 0			%5 = or i64 %4, 0
	%6 = trunc i64 %5 to i32			%6 = trunc i64 %5 to i32
	%7 = extractelement <2 x i64> %0, i64 0			%7 = extractelement <2 x i64> %0, i64 0
	%8 = or i64 %7, 0			%8 = or i64 %7, 0
	%9 = trunc i64 %8 to i32			%9 = trunc i64 %8 to i32
	%10 = extractelement <2 x i64> %2, i64 0			%10 = extractelement <2 x i64> %2, i64 0
	Show All 19 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll

Show First 20 Lines • Show All 337 Lines • ▼ Show 20 Lines
}		}

define i16 @reduce_blockstrided4(ptr nocapture noundef readonly %x, ptr nocapture noundef readonly %y, i32 noundef %stride) {		define i16 @reduce_blockstrided4(ptr nocapture noundef readonly %x, ptr nocapture noundef readonly %y, i32 noundef %stride) {
; CHECK-LABEL: @reduce_blockstrided4(		; CHECK-LABEL: @reduce_blockstrided4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64		; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, ptr [[Y:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, ptr [[Y:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[X]], align 2		; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i16>, ptr [[X]], align 2
		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[Y]], align 2
		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i16> [[TMP1]], [[TMP0]]
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, ptr [[ARRAYIDX4]], align 2		; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, ptr [[ARRAYIDX4]], align 2
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i16>, ptr [[Y]], align 2		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i16>, ptr [[ARRAYIDX20]], align 2
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i16>, ptr [[ARRAYIDX20]], align 2		; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i16> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i16> [[TMP5]], [[TMP1]]		; CHECK-NEXT: [[TMP6:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[TMP2]])
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i16> [[TMP7]], [[TMP3]]		; CHECK-NEXT: [[TMP7:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[TMP5]])
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP11:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP10]])		; CHECK-NEXT: ret i16 [[OP_RDX]]
; CHECK-NEXT: ret i16 [[TMP11]]
;		;
entry:		entry:
%0 = load i16, ptr %x, align 2		%0 = load i16, ptr %x, align 2
%arrayidx1 = getelementptr inbounds i16, ptr %x, i64 1		%arrayidx1 = getelementptr inbounds i16, ptr %x, i64 1
%1 = load i16, ptr %arrayidx1, align 2		%1 = load i16, ptr %arrayidx1, align 2
%arrayidx2 = getelementptr inbounds i16, ptr %x, i64 2		%arrayidx2 = getelementptr inbounds i16, ptr %x, i64 2
%2 = load i16, ptr %arrayidx2, align 2		%2 = load i16, ptr %arrayidx2, align 2
%arrayidx3 = getelementptr inbounds i16, ptr %x, i64 3		%arrayidx3 = getelementptr inbounds i16, ptr %x, i64 3
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[OFF1:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[OFF1:%.*]] to i64
; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[OFF2:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[OFF2:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[P1:%.*]], i64 4		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[P1:%.*]], i64 4
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2:%.*]], i64 4		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2:%.*]], i64 4
; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4
; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, ptr [[P1]], align 1		; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i8>, ptr [[P1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, ptr [[P2]], align 1		; CHECK-NEXT: [[TMP1:%.*]] = zext <4 x i8> [[TMP0]] to <4 x i32>
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3]], align 1		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3]], align 1
		; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i32>
		; CHECK-NEXT: [[TMP4:%.*]] = mul nuw nsw <4 x i32> [[TMP1]], [[TMP3]]
		; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, ptr [[P2]], align 1
		; CHECK-NEXT: [[TMP6:%.*]] = zext <4 x i8> [[TMP5]] to <4 x i32>
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5]], align 1		; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5]], align 1
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, ptr [[ADD_PTR]], align 1		; CHECK-NEXT: [[TMP8:%.*]] = zext <4 x i8> [[TMP7]] to <4 x i32>
; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i8>, ptr [[ADD_PTR64]], align 1		; CHECK-NEXT: [[TMP9:%.*]] = mul nuw nsw <4 x i32> [[TMP6]], [[TMP8]]
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> [[TMP3]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i8>, ptr [[ADD_PTR]], align 1
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP11:%.*]] = zext <4 x i8> [[TMP10]] to <4 x i32>
; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i8> [[TMP12]], <16 x i8> [[TMP13]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_1]], align 1
; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <4 x i8> [[TMP11]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = zext <4 x i8> [[TMP12]] to <4 x i32>
; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <16 x i8> [[TMP14]], <16 x i8> [[TMP15]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP14:%.*]] = mul nuw nsw <4 x i32> [[TMP11]], [[TMP13]]
; CHECK-NEXT: [[TMP17:%.*]] = zext <16 x i8> [[TMP16]] to <16 x i32>		; CHECK-NEXT: [[TMP15:%.*]] = load <4 x i8>, ptr [[ADD_PTR64]], align 1
; CHECK-NEXT: [[TMP19:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_1]], align 1		; CHECK-NEXT: [[TMP16:%.*]] = zext <4 x i8> [[TMP15]] to <4 x i32>
; CHECK-NEXT: [[TMP21:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_1]], align 1		; CHECK-NEXT: [[TMP17:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_1]], align 1
; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <4 x i8> [[TMP5]], <4 x i8> [[TMP7]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP18:%.*]] = zext <4 x i8> [[TMP17]] to <4 x i32>
; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <4 x i8> [[TMP19]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP19:%.*]] = mul nuw nsw <4 x i32> [[TMP16]], [[TMP18]]
; CHECK-NEXT: [[TMP24:%.*]] = shufflevector <16 x i8> [[TMP22]], <16 x i8> [[TMP23]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <4 x i8> [[TMP21]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP21:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP9]])
; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <16 x i8> [[TMP24]], <16 x i8> [[TMP25]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP20]], [[TMP21]]
; CHECK-NEXT: [[TMP27:%.*]] = zext <16 x i8> [[TMP26]] to <16 x i32>		; CHECK-NEXT: [[TMP22:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])
; CHECK-NEXT: [[TMP28:%.*]] = mul nuw nsw <16 x i32> [[TMP17]], [[TMP27]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], [[TMP22]]
; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP28]])		; CHECK-NEXT: [[TMP23:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP19]])
; CHECK-NEXT: ret i32 [[TMP29]]		; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX1]], [[TMP23]]
		; CHECK-NEXT: ret i32 [[OP_RDX2]]
;		;
entry:		entry:
%idx.ext = sext i32 %off1 to i64		%idx.ext = sext i32 %off1 to i64
%idx.ext63 = sext i32 %off2 to i64		%idx.ext63 = sext i32 %off2 to i64

%0 = load i8, ptr %p1, align 1		%0 = load i8, ptr %p1, align 1
%conv = zext i8 %0 to i32		%conv = zext i8 %0 to i32
%1 = load i8, ptr %p2, align 1		%1 = load i8, ptr %p2, align 1
▲ Show 20 Lines • Show All 249 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[ARRAYIDX56:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM19]]		; CHECK-NEXT: [[ARRAYIDX56:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM19]]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX56]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX56]], align 4
; CHECK-NEXT: [[ARRAYIDX60:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM23]]		; CHECK-NEXT: [[ARRAYIDX60:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM23]]
; CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX60]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX60]], align 4
; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM27]]		; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[IDXPROM27]]
; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i32, ptr [[Z:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i32, ptr [[Z:%.*]], i64 1
; CHECK-NEXT: [[MUL73:%.*]] = mul nsw i32 [[TMP3]], [[TMP0]]		; CHECK-NEXT: [[MUL73:%.*]] = mul nsw i32 [[TMP3]], [[TMP0]]
; CHECK-NEXT: [[ARRAYIDX76:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 6		; CHECK-NEXT: [[ARRAYIDX76:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 6
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, ptr [[X]], align 4		; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i32>, ptr [[X]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i32>, ptr [[ARRAYIDX6]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, ptr [[ARRAYIDX6]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = load <2 x i32>, ptr [[Y]], align 4		; CHECK-NEXT: [[TMP8:%.*]] = load <2 x i32>, ptr [[Y]], align 4
; CHECK-NEXT: [[TMP13:%.*]] = load <2 x i32>, ptr [[ARRAYIDX41]], align 4		; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i32>, ptr [[ARRAYIDX41]], align 4
; CHECK-NEXT: [[TMP14:%.*]] = mul nsw <2 x i32> [[TMP11]], [[TMP7]]		; CHECK-NEXT: [[TMP10:%.*]] = mul nsw <2 x i32> [[TMP8]], [[TMP6]]
; CHECK-NEXT: [[TMP15:%.*]] = mul nsw <2 x i32> [[TMP13]], [[TMP9]]		; CHECK-NEXT: [[TMP11:%.*]] = mul nsw <2 x i32> [[TMP9]], [[TMP7]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP14]], <2 x i32> [[TMP15]], <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <4 x i32> <i32 1, i32 0, i32 3, i32 2>
; CHECK-NEXT: [[ARRAYIDX84:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 7		; CHECK-NEXT: [[ARRAYIDX84:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 7
; CHECK-NEXT: [[MUL85:%.*]] = mul nsw i32 [[TMP4]], [[TMP1]]		; CHECK-NEXT: [[MUL85:%.*]] = mul nsw i32 [[TMP4]], [[TMP1]]
; CHECK-NEXT: [[MUL87:%.*]] = mul nsw i32 [[TMP5]], [[TMP2]]		; CHECK-NEXT: [[MUL87:%.*]] = mul nsw i32 [[TMP5]], [[TMP2]]
; CHECK-NEXT: [[ARRAYIDX88:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 11		; CHECK-NEXT: [[ARRAYIDX88:%.*]] = getelementptr inbounds i32, ptr [[Z]], i64 11
; CHECK-NEXT: [[TMP18:%.*]] = load <2 x i32>, ptr [[ARRAYIDX12]], align 4		; CHECK-NEXT: [[TMP13:%.*]] = load <2 x i32>, ptr [[ARRAYIDX12]], align 4
; CHECK-NEXT: [[TMP20:%.*]] = load <2 x i32>, ptr [[ARRAYIDX28]], align 4		; CHECK-NEXT: [[TMP14:%.*]] = load <2 x i32>, ptr [[ARRAYIDX28]], align 4
; CHECK-NEXT: [[TMP22:%.*]] = load <2 x i32>, ptr [[ARRAYIDX48]], align 4		; CHECK-NEXT: [[TMP15:%.*]] = load <2 x i32>, ptr [[ARRAYIDX48]], align 4
; CHECK-NEXT: [[TMP24:%.*]] = load <2 x i32>, ptr [[ARRAYIDX64]], align 4		; CHECK-NEXT: [[TMP16:%.*]] = load <2 x i32>, ptr [[ARRAYIDX64]], align 4
; CHECK-NEXT: store i32 [[MUL73]], ptr [[Z]], align 4		; CHECK-NEXT: store i32 [[MUL73]], ptr [[Z]], align 4
; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], ptr [[ARRAYIDX72]], align 4		; CHECK-NEXT: store <4 x i32> [[TMP12]], ptr [[ARRAYIDX72]], align 4
; CHECK-NEXT: store i32 [[MUL85]], ptr [[ARRAYIDX76]], align 4		; CHECK-NEXT: store i32 [[MUL85]], ptr [[ARRAYIDX76]], align 4
; CHECK-NEXT: store i32 [[MUL87]], ptr [[ARRAYIDX88]], align 4		; CHECK-NEXT: store i32 [[MUL87]], ptr [[ARRAYIDX88]], align 4
; CHECK-NEXT: [[TMP25:%.*]] = mul nsw <2 x i32> [[TMP22]], [[TMP18]]		; CHECK-NEXT: [[TMP17:%.*]] = mul nsw <2 x i32> [[TMP15]], [[TMP13]]
; CHECK-NEXT: [[TMP26:%.*]] = mul nsw <2 x i32> [[TMP24]], [[TMP20]]		; CHECK-NEXT: [[TMP18:%.*]] = mul nsw <2 x i32> [[TMP16]], [[TMP14]]
; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP25]], <2 x i32> [[TMP26]], <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <2 x i32> [[TMP17]], <2 x i32> [[TMP18]], <4 x i32> <i32 1, i32 0, i32 3, i32 2>
; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], ptr [[ARRAYIDX84]], align 4		; CHECK-NEXT: store <4 x i32> [[TMP19]], ptr [[ARRAYIDX84]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, ptr %x, align 4		%0 = load i32, ptr %x, align 4
%arrayidx1 = getelementptr inbounds i32, ptr %x, i64 1		%arrayidx1 = getelementptr inbounds i32, ptr %x, i64 1
%1 = load i32, ptr %arrayidx1, align 4		%1 = load i32, ptr %arrayidx1, align 4
%arrayidx2 = getelementptr inbounds i32, ptr %x, i64 2		%arrayidx2 = getelementptr inbounds i32, ptr %x, i64 2
%2 = load i32, ptr %arrayidx2, align 4		%2 = load i32, ptr %arrayidx2, align 4
; ... (86 lines not shown) ...
}		}

define void @store_blockstrided4(ptr nocapture noundef readonly %x, ptr nocapture noundef readonly %y, i32 noundef %stride, ptr %dst0) {		define void @store_blockstrided4(ptr nocapture noundef readonly %x, ptr nocapture noundef readonly %y, i32 noundef %stride, ptr %dst0) {
; CHECK-LABEL: @store_blockstrided4(		; CHECK-LABEL: @store_blockstrided4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64		; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, ptr [[Y:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, ptr [[Y:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[X]], align 2		; CHECK-NEXT: [[DST4:%.*]] = getelementptr inbounds i16, ptr [[DST0:%.*]], i64 4
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, ptr [[ARRAYIDX4]], align 2		; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i16>, ptr [[X]], align 2
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i16>, ptr [[Y]], align 2		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[Y]], align 2
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i16>, ptr [[ARRAYIDX20]], align 2		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i16> [[TMP1]], [[TMP0]]
; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i16> [[TMP5]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[TMP2]], <4 x i16> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i16> [[TMP7]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i16>, ptr [[ARRAYIDX4]], align 2
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>		; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i16>, ptr [[ARRAYIDX20]], align 2
; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], ptr [[DST0:%.*]], align 2		; CHECK-NEXT: [[TMP6:%.*]] = mul <4 x i16> [[TMP5]], [[TMP4]]
		; CHECK-NEXT: store <4 x i16> [[TMP3]], ptr [[DST0]], align 2
		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i16> [[TMP6]], <4 x i16> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
		; CHECK-NEXT: store <4 x i16> [[TMP7]], ptr [[DST4]], align 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%0 = load i16, ptr %x, align 2		%0 = load i16, ptr %x, align 2
%arrayidx1 = getelementptr inbounds i16, ptr %x, i64 1		%arrayidx1 = getelementptr inbounds i16, ptr %x, i64 1
%1 = load i16, ptr %arrayidx1, align 2		%1 = load i16, ptr %arrayidx1, align 2
%arrayidx2 = getelementptr inbounds i16, ptr %x, i64 2		%arrayidx2 = getelementptr inbounds i16, ptr %x, i64 2
%2 = load i16, ptr %arrayidx2, align 2		%2 = load i16, ptr %arrayidx2, align 2
; ... (64 lines not shown) ...
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2:%.*]], i64 4		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2:%.*]], i64 4
; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4
; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4
; CHECK-NEXT: [[DST4:%.*]] = getelementptr inbounds i32, ptr [[DST0:%.*]], i64 4		; CHECK-NEXT: [[DST4:%.*]] = getelementptr inbounds i32, ptr [[DST0:%.*]], i64 4
; CHECK-NEXT: [[DST8:%.*]] = getelementptr inbounds i32, ptr [[DST0]], i64 8		; CHECK-NEXT: [[DST8:%.*]] = getelementptr inbounds i32, ptr [[DST0]], i64 8
; CHECK-NEXT: [[DST12:%.*]] = getelementptr inbounds i32, ptr [[DST0]], i64 12		; CHECK-NEXT: [[DST12:%.*]] = getelementptr inbounds i32, ptr [[DST0]], i64 12
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, ptr [[P1]], align 1		; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i8>, ptr [[P1]], align 1
; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>		; CHECK-NEXT: [[TMP1:%.*]] = zext <4 x i8> [[TMP0]] to <4 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3]], align 1		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3]], align 1
; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>		; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i32>
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw nsw <4 x i32> [[TMP2]], [[TMP5]]		; CHECK-NEXT: [[TMP4:%.*]] = mul nuw nsw <4 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, ptr [[P2]], align 1		; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, ptr [[P2]], align 1
; CHECK-NEXT: [[TMP10:%.*]] = zext <4 x i8> [[TMP9]] to <4 x i32>		; CHECK-NEXT: [[TMP6:%.*]] = zext <4 x i8> [[TMP5]] to <4 x i32>
; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5]], align 1		; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5]], align 1
		; CHECK-NEXT: [[TMP8:%.*]] = zext <4 x i8> [[TMP7]] to <4 x i32>
		; CHECK-NEXT: [[TMP9:%.*]] = mul nuw nsw <4 x i32> [[TMP6]], [[TMP8]]
		; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i8>, ptr [[ADD_PTR]], align 1
		; CHECK-NEXT: [[TMP11:%.*]] = zext <4 x i8> [[TMP10]] to <4 x i32>
		; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_1]], align 1
; CHECK-NEXT: [[TMP13:%.*]] = zext <4 x i8> [[TMP12]] to <4 x i32>		; CHECK-NEXT: [[TMP13:%.*]] = zext <4 x i8> [[TMP12]] to <4 x i32>
; CHECK-NEXT: [[TMP14:%.*]] = mul nuw nsw <4 x i32> [[TMP10]], [[TMP13]]		; CHECK-NEXT: [[TMP14:%.*]] = mul nuw nsw <4 x i32> [[TMP11]], [[TMP13]]
; CHECK-NEXT: [[TMP17:%.*]] = load <4 x i8>, ptr [[ADD_PTR]], align 1		; CHECK-NEXT: [[TMP15:%.*]] = load <4 x i8>, ptr [[ADD_PTR64]], align 1
		; CHECK-NEXT: [[TMP16:%.*]] = zext <4 x i8> [[TMP15]] to <4 x i32>
		; CHECK-NEXT: [[TMP17:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_1]], align 1
; CHECK-NEXT: [[TMP18:%.*]] = zext <4 x i8> [[TMP17]] to <4 x i32>		; CHECK-NEXT: [[TMP18:%.*]] = zext <4 x i8> [[TMP17]] to <4 x i32>
; CHECK-NEXT: [[TMP20:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_1]], align 1		; CHECK-NEXT: [[TMP19:%.*]] = mul nuw nsw <4 x i32> [[TMP16]], [[TMP18]]
; CHECK-NEXT: [[TMP21:%.*]] = zext <4 x i8> [[TMP20]] to <4 x i32>		; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr [[DST0]], align 4
; CHECK-NEXT: [[TMP22:%.*]] = mul nuw nsw <4 x i32> [[TMP18]], [[TMP21]]		; CHECK-NEXT: store <4 x i32> [[TMP9]], ptr [[DST4]], align 4
; CHECK-NEXT: [[TMP25:%.*]] = load <4 x i8>, ptr [[ADD_PTR64]], align 1		; CHECK-NEXT: store <4 x i32> [[TMP14]], ptr [[DST8]], align 4
; CHECK-NEXT: [[TMP26:%.*]] = zext <4 x i8> [[TMP25]] to <4 x i32>		; CHECK-NEXT: store <4 x i32> [[TMP19]], ptr [[DST12]], align 4
; CHECK-NEXT: [[TMP28:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_1]], align 1
; CHECK-NEXT: [[TMP29:%.*]] = zext <4 x i8> [[TMP28]] to <4 x i32>
; CHECK-NEXT: [[TMP30:%.*]] = mul nuw nsw <4 x i32> [[TMP26]], [[TMP29]]
; CHECK-NEXT: store <4 x i32> [[TMP6]], ptr [[DST0]], align 4
; CHECK-NEXT: store <4 x i32> [[TMP14]], ptr [[DST4]], align 4
; CHECK-NEXT: store <4 x i32> [[TMP22]], ptr [[DST8]], align 4
; CHECK-NEXT: store <4 x i32> [[TMP30]], ptr [[DST12]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%idx.ext = sext i32 %off1 to i64		%idx.ext = sext i32 %off1 to i64
%idx.ext63 = sext i32 %off2 to i64		%idx.ext63 = sext i32 %off2 to i64

%0 = load i8, ptr %p1, align 1		%0 = load i8, ptr %p1, align 1
%conv = zext i8 %0 to i32		%conv = zext i8 %0 to i32
; ... (237 lines not shown) ...
ret void		ret void
}		}

define dso_local i32 @full(ptr nocapture noundef readonly %p1, i32 noundef %st1, ptr nocapture noundef readonly %p2, i32 noundef %st2) {		define dso_local i32 @full(ptr nocapture noundef readonly %p1, i32 noundef %st1, ptr nocapture noundef readonly %p2, i32 noundef %st2) {
; CHECK-LABEL: @full(		; CHECK-LABEL: @full(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST1:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST1:%.*]] to i64
; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[ST2:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[ST2:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[P1:%.*]], i64 4		; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[P1:%.*]], align 1
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2:%.*]], i64 4		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP0]] to i32
		; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[P2:%.*]], align 1
		; CHECK-NEXT: [[CONV2:%.*]] = zext i8 [[TMP1]] to i32
		; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 4
		; CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX3]], align 1
		; CHECK-NEXT: [[CONV4:%.*]] = zext i8 [[TMP2]] to i32
		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 4
		; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX5]], align 1
		; CHECK-NEXT: [[CONV6:%.*]] = zext i8 [[TMP3]] to i32
		; CHECK-NEXT: [[SUB7:%.*]] = sub nsw i32 [[CONV4]], [[CONV6]]
		; CHECK-NEXT: [[SHL:%.*]] = shl nsw i32 [[SUB7]], 16
		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[SHL]], [[SUB]]
		; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 1
		; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX8]], align 1
		; CHECK-NEXT: [[CONV9:%.*]] = zext i8 [[TMP4]] to i32
		; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 1
		; CHECK-NEXT: [[TMP5:%.*]] = load i8, ptr [[ARRAYIDX10]], align 1
		; CHECK-NEXT: [[CONV11:%.*]] = zext i8 [[TMP5]] to i32
		; CHECK-NEXT: [[SUB12:%.*]] = sub nsw i32 [[CONV9]], [[CONV11]]
		; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 5
		; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX13]], align 1
		; CHECK-NEXT: [[CONV14:%.*]] = zext i8 [[TMP6]] to i32
		; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 5
		; CHECK-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX15]], align 1
		; CHECK-NEXT: [[CONV16:%.*]] = zext i8 [[TMP7]] to i32
		; CHECK-NEXT: [[SUB17:%.*]] = sub nsw i32 [[CONV14]], [[CONV16]]
		; CHECK-NEXT: [[SHL18:%.*]] = shl nsw i32 [[SUB17]], 16
		; CHECK-NEXT: [[ADD19:%.*]] = add nsw i32 [[SHL18]], [[SUB12]]
		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 2
		; CHECK-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX20]], align 1
		; CHECK-NEXT: [[CONV21:%.*]] = zext i8 [[TMP8]] to i32
		; CHECK-NEXT: [[ARRAYIDX22:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 2
		; CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX22]], align 1
		; CHECK-NEXT: [[CONV23:%.*]] = zext i8 [[TMP9]] to i32
		; CHECK-NEXT: [[SUB24:%.*]] = sub nsw i32 [[CONV21]], [[CONV23]]
		; CHECK-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 6
		; CHECK-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX25]], align 1
		; CHECK-NEXT: [[CONV26:%.*]] = zext i8 [[TMP10]] to i32
		; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 6
		; CHECK-NEXT: [[TMP11:%.*]] = load i8, ptr [[ARRAYIDX27]], align 1
		; CHECK-NEXT: [[CONV28:%.*]] = zext i8 [[TMP11]] to i32
		; CHECK-NEXT: [[SUB29:%.*]] = sub nsw i32 [[CONV26]], [[CONV28]]
		; CHECK-NEXT: [[SHL30:%.*]] = shl nsw i32 [[SUB29]], 16
		; CHECK-NEXT: [[ADD31:%.*]] = add nsw i32 [[SHL30]], [[SUB24]]
		; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 3
		; CHECK-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX32]], align 1
		; CHECK-NEXT: [[CONV33:%.*]] = zext i8 [[TMP12]] to i32
		; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 3
		; CHECK-NEXT: [[TMP13:%.*]] = load i8, ptr [[ARRAYIDX34]], align 1
		; CHECK-NEXT: [[CONV35:%.*]] = zext i8 [[TMP13]] to i32
		; CHECK-NEXT: [[SUB36:%.*]] = sub nsw i32 [[CONV33]], [[CONV35]]
		; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 7
		; CHECK-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX37]], align 1
		; CHECK-NEXT: [[CONV38:%.*]] = zext i8 [[TMP14]] to i32
		; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 7
		; CHECK-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX39]], align 1
		; CHECK-NEXT: [[CONV40:%.*]] = zext i8 [[TMP15]] to i32
		; CHECK-NEXT: [[SUB41:%.*]] = sub nsw i32 [[CONV38]], [[CONV40]]
		; CHECK-NEXT: [[SHL42:%.*]] = shl nsw i32 [[SUB41]], 16
		; CHECK-NEXT: [[ADD43:%.*]] = add nsw i32 [[SHL42]], [[SUB36]]
		; CHECK-NEXT: [[ADD44:%.*]] = add nsw i32 [[ADD19]], [[ADD]]
		; CHECK-NEXT: [[SUB45:%.*]] = sub nsw i32 [[ADD]], [[ADD19]]
		; CHECK-NEXT: [[ADD46:%.*]] = add nsw i32 [[ADD43]], [[ADD31]]
		; CHECK-NEXT: [[SUB47:%.*]] = sub nsw i32 [[ADD31]], [[ADD43]]
		; CHECK-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD46]], [[ADD44]]
		; CHECK-NEXT: [[SUB51:%.*]] = sub nsw i32 [[ADD44]], [[ADD46]]
		; CHECK-NEXT: [[ADD55:%.*]] = add nsw i32 [[SUB47]], [[SUB45]]
		; CHECK-NEXT: [[SUB59:%.*]] = sub nsw i32 [[SUB45]], [[SUB47]]
; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP16:%.*]] = load i8, ptr [[ADD_PTR]], align 1
		; CHECK-NEXT: [[CONV_1:%.*]] = zext i8 [[TMP16]] to i32
		; CHECK-NEXT: [[TMP17:%.*]] = load i8, ptr [[ADD_PTR64]], align 1
		; CHECK-NEXT: [[CONV2_1:%.*]] = zext i8 [[TMP17]] to i32
		; CHECK-NEXT: [[SUB_1:%.*]] = sub nsw i32 [[CONV_1]], [[CONV2_1]]
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 4
		; CHECK-NEXT: [[TMP18:%.*]] = load i8, ptr [[ARRAYIDX3_1]], align 1
		; CHECK-NEXT: [[CONV4_1:%.*]] = zext i8 [[TMP18]] to i32
; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 4
		; CHECK-NEXT: [[TMP19:%.*]] = load i8, ptr [[ARRAYIDX5_1]], align 1
		; CHECK-NEXT: [[CONV6_1:%.*]] = zext i8 [[TMP19]] to i32
		; CHECK-NEXT: [[SUB7_1:%.*]] = sub nsw i32 [[CONV4_1]], [[CONV6_1]]
		; CHECK-NEXT: [[SHL_1:%.*]] = shl nsw i32 [[SUB7_1]], 16
		; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[SHL_1]], [[SUB_1]]
		; CHECK-NEXT: [[ARRAYIDX8_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 1
		; CHECK-NEXT: [[TMP20:%.*]] = load i8, ptr [[ARRAYIDX8_1]], align 1
		; CHECK-NEXT: [[CONV9_1:%.*]] = zext i8 [[TMP20]] to i32
		; CHECK-NEXT: [[ARRAYIDX10_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 1
		; CHECK-NEXT: [[TMP21:%.*]] = load i8, ptr [[ARRAYIDX10_1]], align 1
		; CHECK-NEXT: [[CONV11_1:%.*]] = zext i8 [[TMP21]] to i32
		; CHECK-NEXT: [[SUB12_1:%.*]] = sub nsw i32 [[CONV9_1]], [[CONV11_1]]
		; CHECK-NEXT: [[ARRAYIDX13_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 5
		; CHECK-NEXT: [[TMP22:%.*]] = load i8, ptr [[ARRAYIDX13_1]], align 1
		; CHECK-NEXT: [[CONV14_1:%.*]] = zext i8 [[TMP22]] to i32
		; CHECK-NEXT: [[ARRAYIDX15_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 5
		; CHECK-NEXT: [[TMP23:%.*]] = load i8, ptr [[ARRAYIDX15_1]], align 1
		; CHECK-NEXT: [[CONV16_1:%.*]] = zext i8 [[TMP23]] to i32
		; CHECK-NEXT: [[SUB17_1:%.*]] = sub nsw i32 [[CONV14_1]], [[CONV16_1]]
		; CHECK-NEXT: [[SHL18_1:%.*]] = shl nsw i32 [[SUB17_1]], 16
		; CHECK-NEXT: [[ADD19_1:%.*]] = add nsw i32 [[SHL18_1]], [[SUB12_1]]
		; CHECK-NEXT: [[ARRAYIDX20_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 2
		; CHECK-NEXT: [[TMP24:%.*]] = load i8, ptr [[ARRAYIDX20_1]], align 1
		; CHECK-NEXT: [[CONV21_1:%.*]] = zext i8 [[TMP24]] to i32
		; CHECK-NEXT: [[ARRAYIDX22_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 2
		; CHECK-NEXT: [[TMP25:%.*]] = load i8, ptr [[ARRAYIDX22_1]], align 1
		; CHECK-NEXT: [[CONV23_1:%.*]] = zext i8 [[TMP25]] to i32
		; CHECK-NEXT: [[SUB24_1:%.*]] = sub nsw i32 [[CONV21_1]], [[CONV23_1]]
		; CHECK-NEXT: [[ARRAYIDX25_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 6
		; CHECK-NEXT: [[TMP26:%.*]] = load i8, ptr [[ARRAYIDX25_1]], align 1
		; CHECK-NEXT: [[CONV26_1:%.*]] = zext i8 [[TMP26]] to i32
		; CHECK-NEXT: [[ARRAYIDX27_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 6
		; CHECK-NEXT: [[TMP27:%.*]] = load i8, ptr [[ARRAYIDX27_1]], align 1
		; CHECK-NEXT: [[CONV28_1:%.*]] = zext i8 [[TMP27]] to i32
		; CHECK-NEXT: [[SUB29_1:%.*]] = sub nsw i32 [[CONV26_1]], [[CONV28_1]]
		; CHECK-NEXT: [[SHL30_1:%.*]] = shl nsw i32 [[SUB29_1]], 16
		; CHECK-NEXT: [[ADD31_1:%.*]] = add nsw i32 [[SHL30_1]], [[SUB24_1]]
		; CHECK-NEXT: [[ARRAYIDX32_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 3
		; CHECK-NEXT: [[TMP28:%.*]] = load i8, ptr [[ARRAYIDX32_1]], align 1
		; CHECK-NEXT: [[CONV33_1:%.*]] = zext i8 [[TMP28]] to i32
		; CHECK-NEXT: [[ARRAYIDX34_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 3
		; CHECK-NEXT: [[TMP29:%.*]] = load i8, ptr [[ARRAYIDX34_1]], align 1
		; CHECK-NEXT: [[CONV35_1:%.*]] = zext i8 [[TMP29]] to i32
		; CHECK-NEXT: [[SUB36_1:%.*]] = sub nsw i32 [[CONV33_1]], [[CONV35_1]]
		; CHECK-NEXT: [[ARRAYIDX37_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 7
		; CHECK-NEXT: [[TMP30:%.*]] = load i8, ptr [[ARRAYIDX37_1]], align 1
		; CHECK-NEXT: [[CONV38_1:%.*]] = zext i8 [[TMP30]] to i32
		; CHECK-NEXT: [[ARRAYIDX39_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 7
		; CHECK-NEXT: [[TMP31:%.*]] = load i8, ptr [[ARRAYIDX39_1]], align 1
		; CHECK-NEXT: [[CONV40_1:%.*]] = zext i8 [[TMP31]] to i32
		; CHECK-NEXT: [[SUB41_1:%.*]] = sub nsw i32 [[CONV38_1]], [[CONV40_1]]
		; CHECK-NEXT: [[SHL42_1:%.*]] = shl nsw i32 [[SUB41_1]], 16
		; CHECK-NEXT: [[ADD43_1:%.*]] = add nsw i32 [[SHL42_1]], [[SUB36_1]]
		; CHECK-NEXT: [[ADD44_1:%.*]] = add nsw i32 [[ADD19_1]], [[ADD_1]]
		; CHECK-NEXT: [[SUB45_1:%.*]] = sub nsw i32 [[ADD_1]], [[ADD19_1]]
		; CHECK-NEXT: [[ADD46_1:%.*]] = add nsw i32 [[ADD43_1]], [[ADD31_1]]
		; CHECK-NEXT: [[SUB47_1:%.*]] = sub nsw i32 [[ADD31_1]], [[ADD43_1]]
		; CHECK-NEXT: [[ADD48_1:%.*]] = add nsw i32 [[ADD46_1]], [[ADD44_1]]
		; CHECK-NEXT: [[SUB51_1:%.*]] = sub nsw i32 [[ADD44_1]], [[ADD46_1]]
		; CHECK-NEXT: [[ADD55_1:%.*]] = add nsw i32 [[SUB47_1]], [[SUB45_1]]
		; CHECK-NEXT: [[SUB59_1:%.*]] = sub nsw i32 [[SUB45_1]], [[SUB47_1]]
; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64_1:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP32:%.*]] = load i8, ptr [[ADD_PTR_1]], align 1
		; CHECK-NEXT: [[CONV_2:%.*]] = zext i8 [[TMP32]] to i32
		; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[ADD_PTR64_1]], align 1
		; CHECK-NEXT: [[CONV2_2:%.*]] = zext i8 [[TMP33]] to i32
		; CHECK-NEXT: [[SUB_2:%.*]] = sub nsw i32 [[CONV_2]], [[CONV2_2]]
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 4
		; CHECK-NEXT: [[TMP34:%.*]] = load i8, ptr [[ARRAYIDX3_2]], align 1
		; CHECK-NEXT: [[CONV4_2:%.*]] = zext i8 [[TMP34]] to i32
; CHECK-NEXT: [[ARRAYIDX5_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 4
		; CHECK-NEXT: [[TMP35:%.*]] = load i8, ptr [[ARRAYIDX5_2]], align 1
		; CHECK-NEXT: [[CONV6_2:%.*]] = zext i8 [[TMP35]] to i32
		; CHECK-NEXT: [[SUB7_2:%.*]] = sub nsw i32 [[CONV4_2]], [[CONV6_2]]
		; CHECK-NEXT: [[SHL_2:%.*]] = shl nsw i32 [[SUB7_2]], 16
		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[SHL_2]], [[SUB_2]]
		; CHECK-NEXT: [[ARRAYIDX8_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 1
		; CHECK-NEXT: [[TMP36:%.*]] = load i8, ptr [[ARRAYIDX8_2]], align 1
		; CHECK-NEXT: [[CONV9_2:%.*]] = zext i8 [[TMP36]] to i32
		; CHECK-NEXT: [[ARRAYIDX10_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 1
		; CHECK-NEXT: [[TMP37:%.*]] = load i8, ptr [[ARRAYIDX10_2]], align 1
		; CHECK-NEXT: [[CONV11_2:%.*]] = zext i8 [[TMP37]] to i32
		; CHECK-NEXT: [[SUB12_2:%.*]] = sub nsw i32 [[CONV9_2]], [[CONV11_2]]
		; CHECK-NEXT: [[ARRAYIDX13_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 5
		; CHECK-NEXT: [[TMP38:%.*]] = load i8, ptr [[ARRAYIDX13_2]], align 1
		; CHECK-NEXT: [[CONV14_2:%.*]] = zext i8 [[TMP38]] to i32
		; CHECK-NEXT: [[ARRAYIDX15_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 5
		; CHECK-NEXT: [[TMP39:%.*]] = load i8, ptr [[ARRAYIDX15_2]], align 1
		; CHECK-NEXT: [[CONV16_2:%.*]] = zext i8 [[TMP39]] to i32
		; CHECK-NEXT: [[SUB17_2:%.*]] = sub nsw i32 [[CONV14_2]], [[CONV16_2]]
		; CHECK-NEXT: [[SHL18_2:%.*]] = shl nsw i32 [[SUB17_2]], 16
		; CHECK-NEXT: [[ADD19_2:%.*]] = add nsw i32 [[SHL18_2]], [[SUB12_2]]
		; CHECK-NEXT: [[ARRAYIDX20_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 2
		; CHECK-NEXT: [[TMP40:%.*]] = load i8, ptr [[ARRAYIDX20_2]], align 1
		; CHECK-NEXT: [[CONV21_2:%.*]] = zext i8 [[TMP40]] to i32
		; CHECK-NEXT: [[ARRAYIDX22_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 2
		; CHECK-NEXT: [[TMP41:%.*]] = load i8, ptr [[ARRAYIDX22_2]], align 1
		; CHECK-NEXT: [[CONV23_2:%.*]] = zext i8 [[TMP41]] to i32
		; CHECK-NEXT: [[SUB24_2:%.*]] = sub nsw i32 [[CONV21_2]], [[CONV23_2]]
		; CHECK-NEXT: [[ARRAYIDX25_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 6
		; CHECK-NEXT: [[TMP42:%.*]] = load i8, ptr [[ARRAYIDX25_2]], align 1
		; CHECK-NEXT: [[CONV26_2:%.*]] = zext i8 [[TMP42]] to i32
		; CHECK-NEXT: [[ARRAYIDX27_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 6
		; CHECK-NEXT: [[TMP43:%.*]] = load i8, ptr [[ARRAYIDX27_2]], align 1
		; CHECK-NEXT: [[CONV28_2:%.*]] = zext i8 [[TMP43]] to i32
		; CHECK-NEXT: [[SUB29_2:%.*]] = sub nsw i32 [[CONV26_2]], [[CONV28_2]]
		; CHECK-NEXT: [[SHL30_2:%.*]] = shl nsw i32 [[SUB29_2]], 16
		; CHECK-NEXT: [[ADD31_2:%.*]] = add nsw i32 [[SHL30_2]], [[SUB24_2]]
		; CHECK-NEXT: [[ARRAYIDX32_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 3
		; CHECK-NEXT: [[TMP44:%.*]] = load i8, ptr [[ARRAYIDX32_2]], align 1
		; CHECK-NEXT: [[CONV33_2:%.*]] = zext i8 [[TMP44]] to i32
		; CHECK-NEXT: [[ARRAYIDX34_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 3
		; CHECK-NEXT: [[TMP45:%.*]] = load i8, ptr [[ARRAYIDX34_2]], align 1
		; CHECK-NEXT: [[CONV35_2:%.*]] = zext i8 [[TMP45]] to i32
		; CHECK-NEXT: [[SUB36_2:%.*]] = sub nsw i32 [[CONV33_2]], [[CONV35_2]]
		; CHECK-NEXT: [[ARRAYIDX37_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 7
		; CHECK-NEXT: [[TMP46:%.*]] = load i8, ptr [[ARRAYIDX37_2]], align 1
		; CHECK-NEXT: [[CONV38_2:%.*]] = zext i8 [[TMP46]] to i32
		; CHECK-NEXT: [[ARRAYIDX39_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 7
		; CHECK-NEXT: [[TMP47:%.*]] = load i8, ptr [[ARRAYIDX39_2]], align 1
		; CHECK-NEXT: [[CONV40_2:%.*]] = zext i8 [[TMP47]] to i32
		; CHECK-NEXT: [[SUB41_2:%.*]] = sub nsw i32 [[CONV38_2]], [[CONV40_2]]
		; CHECK-NEXT: [[SHL42_2:%.*]] = shl nsw i32 [[SUB41_2]], 16
		; CHECK-NEXT: [[ADD43_2:%.*]] = add nsw i32 [[SHL42_2]], [[SUB36_2]]
		; CHECK-NEXT: [[ADD44_2:%.*]] = add nsw i32 [[ADD19_2]], [[ADD_2]]
		; CHECK-NEXT: [[SUB45_2:%.*]] = sub nsw i32 [[ADD_2]], [[ADD19_2]]
		; CHECK-NEXT: [[ADD46_2:%.*]] = add nsw i32 [[ADD43_2]], [[ADD31_2]]
		; CHECK-NEXT: [[SUB47_2:%.*]] = sub nsw i32 [[ADD31_2]], [[ADD43_2]]
		; CHECK-NEXT: [[ADD48_2:%.*]] = add nsw i32 [[ADD46_2]], [[ADD44_2]]
		; CHECK-NEXT: [[SUB51_2:%.*]] = sub nsw i32 [[ADD44_2]], [[ADD46_2]]
		; CHECK-NEXT: [[ADD55_2:%.*]] = add nsw i32 [[SUB47_2]], [[SUB45_2]]
		; CHECK-NEXT: [[SUB59_2:%.*]] = sub nsw i32 [[SUB45_2]], [[SUB47_2]]
; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64_2:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_1]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP48:%.*]] = load i8, ptr [[ADD_PTR_2]], align 1
		; CHECK-NEXT: [[CONV_3:%.*]] = zext i8 [[TMP48]] to i32
		; CHECK-NEXT: [[TMP49:%.*]] = load i8, ptr [[ADD_PTR64_2]], align 1
		; CHECK-NEXT: [[CONV2_3:%.*]] = zext i8 [[TMP49]] to i32
		; CHECK-NEXT: [[SUB_3:%.*]] = sub nsw i32 [[CONV_3]], [[CONV2_3]]
; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 4
		; CHECK-NEXT: [[TMP50:%.*]] = load i8, ptr [[ARRAYIDX3_3]], align 1
		; CHECK-NEXT: [[CONV4_3:%.*]] = zext i8 [[TMP50]] to i32
; CHECK-NEXT: [[ARRAYIDX5_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 4
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, ptr [[P1]], align 1		; CHECK-NEXT: [[TMP51:%.*]] = load i8, ptr [[ARRAYIDX5_3]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, ptr [[P2]], align 1		; CHECK-NEXT: [[CONV6_3:%.*]] = zext i8 [[TMP51]] to i32
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3]], align 1		; CHECK-NEXT: [[SUB7_3:%.*]] = sub nsw i32 [[CONV4_3]], [[CONV6_3]]
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5]], align 1		; CHECK-NEXT: [[SHL_3:%.*]] = shl nsw i32 [[SUB7_3]], 16
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, ptr [[ADD_PTR]], align 1		; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 [[SHL_3]], [[SUB_3]]
; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i8>, ptr [[ADD_PTR64]], align 1		; CHECK-NEXT: [[ARRAYIDX8_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 1
; CHECK-NEXT: [[TMP13:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_1]], align 1		; CHECK-NEXT: [[TMP52:%.*]] = load i8, ptr [[ARRAYIDX8_3]], align 1
; CHECK-NEXT: [[TMP15:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_1]], align 1		; CHECK-NEXT: [[CONV9_3:%.*]] = zext i8 [[TMP52]] to i32
; CHECK-NEXT: [[TMP17:%.*]] = load <4 x i8>, ptr [[ADD_PTR_1]], align 1		; CHECK-NEXT: [[ARRAYIDX10_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 1
; CHECK-NEXT: [[TMP19:%.*]] = load <4 x i8>, ptr [[ADD_PTR64_1]], align 1		; CHECK-NEXT: [[TMP53:%.*]] = load i8, ptr [[ARRAYIDX10_3]], align 1
; CHECK-NEXT: [[TMP21:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_2]], align 1		; CHECK-NEXT: [[CONV11_3:%.*]] = zext i8 [[TMP53]] to i32
; CHECK-NEXT: [[TMP23:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_2]], align 1		; CHECK-NEXT: [[SUB12_3:%.*]] = sub nsw i32 [[CONV9_3]], [[CONV11_3]]
; CHECK-NEXT: [[TMP25:%.*]] = load <4 x i8>, ptr [[ADD_PTR_2]], align 1		; CHECK-NEXT: [[ARRAYIDX13_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 5
; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <4 x i8> [[TMP25]], <4 x i8> [[TMP17]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP54:%.*]] = load i8, ptr [[ARRAYIDX13_3]], align 1
; CHECK-NEXT: [[TMP27:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[CONV14_3:%.*]] = zext i8 [[TMP54]] to i32
; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <16 x i8> [[TMP26]], <16 x i8> [[TMP27]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX15_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 5
; CHECK-NEXT: [[TMP29:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP55:%.*]] = load i8, ptr [[ARRAYIDX15_3]], align 1
; CHECK-NEXT: [[TMP30:%.*]] = shufflevector <16 x i8> [[TMP28]], <16 x i8> [[TMP29]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[CONV16_3:%.*]] = zext i8 [[TMP55]] to i32
; CHECK-NEXT: [[TMP31:%.*]] = zext <16 x i8> [[TMP30]] to <16 x i32>		; CHECK-NEXT: [[SUB17_3:%.*]] = sub nsw i32 [[CONV14_3]], [[CONV16_3]]
; CHECK-NEXT: [[TMP33:%.*]] = load <4 x i8>, ptr [[ADD_PTR64_2]], align 1		; CHECK-NEXT: [[SHL18_3:%.*]] = shl nsw i32 [[SUB17_3]], 16
; CHECK-NEXT: [[TMP34:%.*]] = shufflevector <4 x i8> [[TMP33]], <4 x i8> [[TMP19]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ADD19_3:%.*]] = add nsw i32 [[SHL18_3]], [[SUB12_3]]
; CHECK-NEXT: [[TMP35:%.*]] = shufflevector <4 x i8> [[TMP11]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX20_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 2
; CHECK-NEXT: [[TMP36:%.*]] = shufflevector <16 x i8> [[TMP34]], <16 x i8> [[TMP35]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP56:%.*]] = load i8, ptr [[ARRAYIDX20_3]], align 1
; CHECK-NEXT: [[TMP37:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[CONV21_3:%.*]] = zext i8 [[TMP56]] to i32
; CHECK-NEXT: [[TMP38:%.*]] = shufflevector <16 x i8> [[TMP36]], <16 x i8> [[TMP37]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[ARRAYIDX22_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 2
; CHECK-NEXT: [[TMP39:%.*]] = zext <16 x i8> [[TMP38]] to <16 x i32>		; CHECK-NEXT: [[TMP57:%.*]] = load i8, ptr [[ARRAYIDX22_3]], align 1
; CHECK-NEXT: [[TMP40:%.*]] = sub nsw <16 x i32> [[TMP31]], [[TMP39]]		; CHECK-NEXT: [[CONV23_3:%.*]] = zext i8 [[TMP57]] to i32
; CHECK-NEXT: [[TMP42:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_3]], align 1		; CHECK-NEXT: [[SUB24_3:%.*]] = sub nsw i32 [[CONV21_3]], [[CONV23_3]]
; CHECK-NEXT: [[TMP43:%.*]] = shufflevector <4 x i8> [[TMP42]], <4 x i8> [[TMP21]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX25_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 6
; CHECK-NEXT: [[TMP44:%.*]] = shufflevector <4 x i8> [[TMP13]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP58:%.*]] = load i8, ptr [[ARRAYIDX25_3]], align 1
; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <16 x i8> [[TMP43]], <16 x i8> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[CONV26_3:%.*]] = zext i8 [[TMP58]] to i32
; CHECK-NEXT: [[TMP46:%.*]] = shufflevector <4 x i8> [[TMP5]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX27_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 6
; CHECK-NEXT: [[TMP47:%.*]] = shufflevector <16 x i8> [[TMP45]], <16 x i8> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP59:%.*]] = load i8, ptr [[ARRAYIDX27_3]], align 1
; CHECK-NEXT: [[TMP48:%.*]] = zext <16 x i8> [[TMP47]] to <16 x i32>		; CHECK-NEXT: [[CONV28_3:%.*]] = zext i8 [[TMP59]] to i32
; CHECK-NEXT: [[TMP50:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_3]], align 1		; CHECK-NEXT: [[SUB29_3:%.*]] = sub nsw i32 [[CONV26_3]], [[CONV28_3]]
; CHECK-NEXT: [[TMP51:%.*]] = shufflevector <4 x i8> [[TMP50]], <4 x i8> [[TMP23]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SHL30_3:%.*]] = shl nsw i32 [[SUB29_3]], 16
; CHECK-NEXT: [[TMP52:%.*]] = shufflevector <4 x i8> [[TMP15]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ADD31_3:%.*]] = add nsw i32 [[SHL30_3]], [[SUB24_3]]
; CHECK-NEXT: [[TMP53:%.*]] = shufflevector <16 x i8> [[TMP51]], <16 x i8> [[TMP52]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX32_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 3
; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP60:%.*]] = load i8, ptr [[ARRAYIDX32_3]], align 1
; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i8> [[TMP53]], <16 x i8> [[TMP54]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[CONV33_3:%.*]] = zext i8 [[TMP60]] to i32
; CHECK-NEXT: [[TMP56:%.*]] = zext <16 x i8> [[TMP55]] to <16 x i32>		; CHECK-NEXT: [[ARRAYIDX34_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 3
; CHECK-NEXT: [[TMP57:%.*]] = sub nsw <16 x i32> [[TMP48]], [[TMP56]]		; CHECK-NEXT: [[TMP61:%.*]] = load i8, ptr [[ARRAYIDX34_3]], align 1
; CHECK-NEXT: [[TMP58:%.*]] = shl nsw <16 x i32> [[TMP57]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: [[CONV35_3:%.*]] = zext i8 [[TMP61]] to i32
; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i32> [[TMP58]], [[TMP40]]		; CHECK-NEXT: [[SUB36_3:%.*]] = sub nsw i32 [[CONV33_3]], [[CONV35_3]]
; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[ARRAYIDX37_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR_2]], i64 7
; CHECK-NEXT: [[TMP61:%.*]] = add nsw <16 x i32> [[TMP59]], [[TMP60]]		; CHECK-NEXT: [[TMP62:%.*]] = load i8, ptr [[ARRAYIDX37_3]], align 1
; CHECK-NEXT: [[TMP62:%.*]] = sub nsw <16 x i32> [[TMP59]], [[TMP60]]		; CHECK-NEXT: [[CONV38_3:%.*]] = zext i8 [[TMP62]] to i32
; CHECK-NEXT: [[TMP63:%.*]] = shufflevector <16 x i32> [[TMP61]], <16 x i32> [[TMP62]], <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 22, i32 18, i32 26, i32 30, i32 5, i32 1, i32 9, i32 13, i32 20, i32 16, i32 24, i32 28>		; CHECK-NEXT: [[ARRAYIDX39_3:%.*]] = getelementptr inbounds i8, ptr [[ADD_PTR64_2]], i64 7
; CHECK-NEXT: [[TMP64:%.*]] = shufflevector <16 x i32> [[TMP63]], <16 x i32> poison, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 1, i32 0, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[TMP63:%.*]] = load i8, ptr [[ARRAYIDX39_3]], align 1
; CHECK-NEXT: [[TMP65:%.*]] = add nsw <16 x i32> [[TMP63]], [[TMP64]]		; CHECK-NEXT: [[CONV40_3:%.*]] = zext i8 [[TMP63]] to i32
; CHECK-NEXT: [[TMP66:%.*]] = sub nsw <16 x i32> [[TMP63]], [[TMP64]]		; CHECK-NEXT: [[SUB41_3:%.*]] = sub nsw i32 [[CONV38_3]], [[CONV40_3]]
; CHECK-NEXT: [[TMP67:%.*]] = shufflevector <16 x i32> [[TMP65]], <16 x i32> [[TMP66]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; CHECK-NEXT: [[SHL42_3:%.*]] = shl nsw i32 [[SUB41_3]], 16
; CHECK-NEXT: [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP67]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[ADD43_3:%.*]] = add nsw i32 [[SHL42_3]], [[SUB36_3]]
; CHECK-NEXT: [[TMP69:%.*]] = add nsw <16 x i32> [[TMP67]], [[TMP68]]		; CHECK-NEXT: [[ADD44_3:%.*]] = add nsw i32 [[ADD19_3]], [[ADD_3]]
; CHECK-NEXT: [[TMP70:%.*]] = sub nsw <16 x i32> [[TMP67]], [[TMP68]]		; CHECK-NEXT: [[SUB45_3:%.*]] = sub nsw i32 [[ADD_3]], [[ADD19_3]]
; CHECK-NEXT: [[TMP71:%.*]] = shufflevector <16 x i32> [[TMP69]], <16 x i32> [[TMP70]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 20, i32 5, i32 6, i32 23, i32 24, i32 9, i32 10, i32 27, i32 28, i32 13, i32 14, i32 31>		; CHECK-NEXT: [[ADD46_3:%.*]] = add nsw i32 [[ADD43_3]], [[ADD31_3]]
; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP71]], <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>		; CHECK-NEXT: [[SUB47_3:%.*]] = sub nsw i32 [[ADD31_3]], [[ADD43_3]]
; CHECK-NEXT: [[TMP73:%.*]] = add nsw <16 x i32> [[TMP71]], [[TMP72]]		; CHECK-NEXT: [[ADD48_3:%.*]] = add nsw i32 [[ADD46_3]], [[ADD44_3]]
; CHECK-NEXT: [[TMP74:%.*]] = sub nsw <16 x i32> [[TMP71]], [[TMP72]]		; CHECK-NEXT: [[SUB51_3:%.*]] = sub nsw i32 [[ADD44_3]], [[ADD46_3]]
; CHECK-NEXT: [[TMP75:%.*]] = shufflevector <16 x i32> [[TMP73]], <16 x i32> [[TMP74]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>		; CHECK-NEXT: [[ADD55_3:%.*]] = add nsw i32 [[SUB47_3]], [[SUB45_3]]
; CHECK-NEXT: [[TMP76:%.*]] = lshr <16 x i32> [[TMP75]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[SUB59_3:%.*]] = sub nsw i32 [[SUB45_3]], [[SUB47_3]]
; CHECK-NEXT: [[TMP77:%.*]] = and <16 x i32> [[TMP76]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[ADD78:%.*]] = add nsw i32 [[ADD48_1]], [[ADD48]]
; CHECK-NEXT: [[TMP78:%.*]] = mul nuw <16 x i32> [[TMP77]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[SUB86:%.*]] = sub nsw i32 [[ADD48]], [[ADD48_1]]
; CHECK-NEXT: [[TMP79:%.*]] = add <16 x i32> [[TMP78]], [[TMP75]]		; CHECK-NEXT: [[ADD94:%.*]] = add nsw i32 [[ADD48_3]], [[ADD48_2]]
; CHECK-NEXT: [[TMP80:%.*]] = xor <16 x i32> [[TMP79]], [[TMP78]]		; CHECK-NEXT: [[SUB102:%.*]] = sub nsw i32 [[ADD48_2]], [[ADD48_3]]
; CHECK-NEXT: [[TMP81:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP80]])		; CHECK-NEXT: [[ADD103:%.*]] = add nsw i32 [[ADD94]], [[ADD78]]
; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP81]], 65535		; CHECK-NEXT: [[SUB104:%.*]] = sub nsw i32 [[ADD78]], [[ADD94]]
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP81]], 16		; CHECK-NEXT: [[ADD105:%.*]] = add nsw i32 [[SUB102]], [[SUB86]]
		; CHECK-NEXT: [[SUB106:%.*]] = sub nsw i32 [[SUB86]], [[SUB102]]
		; CHECK-NEXT: [[SHR_I:%.*]] = lshr i32 [[ADD103]], 15
		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[SHR_I]], 65537
		; CHECK-NEXT: [[MUL_I:%.*]] = mul nuw i32 [[AND_I]], 65535
		; CHECK-NEXT: [[ADD_I:%.*]] = add i32 [[MUL_I]], [[ADD103]]
		; CHECK-NEXT: [[XOR_I:%.*]] = xor i32 [[ADD_I]], [[MUL_I]]
		; CHECK-NEXT: [[SHR_I184:%.*]] = lshr i32 [[ADD105]], 15
		; CHECK-NEXT: [[AND_I185:%.*]] = and i32 [[SHR_I184]], 65537
		; CHECK-NEXT: [[MUL_I186:%.*]] = mul nuw i32 [[AND_I185]], 65535
		; CHECK-NEXT: [[ADD_I187:%.*]] = add i32 [[MUL_I186]], [[ADD105]]
		; CHECK-NEXT: [[XOR_I188:%.*]] = xor i32 [[ADD_I187]], [[MUL_I186]]
		; CHECK-NEXT: [[SHR_I189:%.*]] = lshr i32 [[SUB104]], 15
		; CHECK-NEXT: [[AND_I190:%.*]] = and i32 [[SHR_I189]], 65537
		; CHECK-NEXT: [[MUL_I191:%.*]] = mul nuw i32 [[AND_I190]], 65535
		; CHECK-NEXT: [[ADD_I192:%.*]] = add i32 [[MUL_I191]], [[SUB104]]
		; CHECK-NEXT: [[XOR_I193:%.*]] = xor i32 [[ADD_I192]], [[MUL_I191]]
		; CHECK-NEXT: [[SHR_I194:%.*]] = lshr i32 [[SUB106]], 15
		; CHECK-NEXT: [[AND_I195:%.*]] = and i32 [[SHR_I194]], 65537
		; CHECK-NEXT: [[MUL_I196:%.*]] = mul nuw i32 [[AND_I195]], 65535
		; CHECK-NEXT: [[ADD_I197:%.*]] = add i32 [[MUL_I196]], [[SUB106]]
		; CHECK-NEXT: [[XOR_I198:%.*]] = xor i32 [[ADD_I197]], [[MUL_I196]]
		; CHECK-NEXT: [[ADD110:%.*]] = add i32 [[XOR_I188]], [[XOR_I]]
		; CHECK-NEXT: [[ADD112:%.*]] = add i32 [[ADD110]], [[XOR_I193]]
		; CHECK-NEXT: [[ADD113:%.*]] = add i32 [[ADD112]], [[XOR_I198]]
		; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 [[ADD55_1]], [[ADD55]]
		; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 [[ADD55]], [[ADD55_1]]
		; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 [[ADD55_3]], [[ADD55_2]]
		; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 [[ADD55_2]], [[ADD55_3]]
		; CHECK-NEXT: [[ADD103_1:%.*]] = add nsw i32 [[ADD94_1]], [[ADD78_1]]
		; CHECK-NEXT: [[SUB104_1:%.*]] = sub nsw i32 [[ADD78_1]], [[ADD94_1]]
		; CHECK-NEXT: [[ADD105_1:%.*]] = add nsw i32 [[SUB102_1]], [[SUB86_1]]
		; CHECK-NEXT: [[SUB106_1:%.*]] = sub nsw i32 [[SUB86_1]], [[SUB102_1]]
		; CHECK-NEXT: [[SHR_I_1:%.*]] = lshr i32 [[ADD103_1]], 15
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[SHR_I_1]], 65537
		; CHECK-NEXT: [[MUL_I_1:%.*]] = mul nuw i32 [[AND_I_1]], 65535
		; CHECK-NEXT: [[ADD_I_1:%.*]] = add i32 [[MUL_I_1]], [[ADD103_1]]
		; CHECK-NEXT: [[XOR_I_1:%.*]] = xor i32 [[ADD_I_1]], [[MUL_I_1]]
		; CHECK-NEXT: [[SHR_I184_1:%.*]] = lshr i32 [[ADD105_1]], 15
		; CHECK-NEXT: [[AND_I185_1:%.*]] = and i32 [[SHR_I184_1]], 65537
		; CHECK-NEXT: [[MUL_I186_1:%.*]] = mul nuw i32 [[AND_I185_1]], 65535
		; CHECK-NEXT: [[ADD_I187_1:%.*]] = add i32 [[MUL_I186_1]], [[ADD105_1]]
		; CHECK-NEXT: [[XOR_I188_1:%.*]] = xor i32 [[ADD_I187_1]], [[MUL_I186_1]]
		; CHECK-NEXT: [[SHR_I189_1:%.*]] = lshr i32 [[SUB104_1]], 15
		; CHECK-NEXT: [[AND_I190_1:%.*]] = and i32 [[SHR_I189_1]], 65537
		; CHECK-NEXT: [[MUL_I191_1:%.*]] = mul nuw i32 [[AND_I190_1]], 65535
		; CHECK-NEXT: [[ADD_I192_1:%.*]] = add i32 [[MUL_I191_1]], [[SUB104_1]]
		; CHECK-NEXT: [[XOR_I193_1:%.*]] = xor i32 [[ADD_I192_1]], [[MUL_I191_1]]
		; CHECK-NEXT: [[SHR_I194_1:%.*]] = lshr i32 [[SUB106_1]], 15
		; CHECK-NEXT: [[AND_I195_1:%.*]] = and i32 [[SHR_I194_1]], 65537
		; CHECK-NEXT: [[MUL_I196_1:%.*]] = mul nuw i32 [[AND_I195_1]], 65535
		; CHECK-NEXT: [[ADD_I197_1:%.*]] = add i32 [[MUL_I196_1]], [[SUB106_1]]
		; CHECK-NEXT: [[XOR_I198_1:%.*]] = xor i32 [[ADD_I197_1]], [[MUL_I196_1]]
		; CHECK-NEXT: [[ADD108_1:%.*]] = add i32 [[XOR_I188_1]], [[ADD113]]
		; CHECK-NEXT: [[ADD110_1:%.*]] = add i32 [[ADD108_1]], [[XOR_I_1]]
		; CHECK-NEXT: [[ADD112_1:%.*]] = add i32 [[ADD110_1]], [[XOR_I193_1]]
		; CHECK-NEXT: [[ADD113_1:%.*]] = add i32 [[ADD112_1]], [[XOR_I198_1]]
		; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 [[SUB51_1]], [[SUB51]]
		; CHECK-NEXT: [[SUB86_2:%.*]] = sub nsw i32 [[SUB51]], [[SUB51_1]]
		; CHECK-NEXT: [[ADD94_2:%.*]] = add nsw i32 [[SUB51_3]], [[SUB51_2]]
		; CHECK-NEXT: [[SUB102_2:%.*]] = sub nsw i32 [[SUB51_2]], [[SUB51_3]]
		; CHECK-NEXT: [[ADD103_2:%.*]] = add nsw i32 [[ADD94_2]], [[ADD78_2]]
		; CHECK-NEXT: [[SUB104_2:%.*]] = sub nsw i32 [[ADD78_2]], [[ADD94_2]]
		; CHECK-NEXT: [[ADD105_2:%.*]] = add nsw i32 [[SUB102_2]], [[SUB86_2]]
		; CHECK-NEXT: [[SUB106_2:%.*]] = sub nsw i32 [[SUB86_2]], [[SUB102_2]]
		; CHECK-NEXT: [[SHR_I_2:%.*]] = lshr i32 [[ADD103_2]], 15
		; CHECK-NEXT: [[AND_I_2:%.*]] = and i32 [[SHR_I_2]], 65537
		; CHECK-NEXT: [[MUL_I_2:%.*]] = mul nuw i32 [[AND_I_2]], 65535
		; CHECK-NEXT: [[ADD_I_2:%.*]] = add i32 [[MUL_I_2]], [[ADD103_2]]
		; CHECK-NEXT: [[XOR_I_2:%.*]] = xor i32 [[ADD_I_2]], [[MUL_I_2]]
		; CHECK-NEXT: [[SHR_I184_2:%.*]] = lshr i32 [[ADD105_2]], 15
		; CHECK-NEXT: [[AND_I185_2:%.*]] = and i32 [[SHR_I184_2]], 65537
		; CHECK-NEXT: [[MUL_I186_2:%.*]] = mul nuw i32 [[AND_I185_2]], 65535
		; CHECK-NEXT: [[ADD_I187_2:%.*]] = add i32 [[MUL_I186_2]], [[ADD105_2]]
		; CHECK-NEXT: [[XOR_I188_2:%.*]] = xor i32 [[ADD_I187_2]], [[MUL_I186_2]]
		; CHECK-NEXT: [[SHR_I189_2:%.*]] = lshr i32 [[SUB104_2]], 15
		; CHECK-NEXT: [[AND_I190_2:%.*]] = and i32 [[SHR_I189_2]], 65537
		; CHECK-NEXT: [[MUL_I191_2:%.*]] = mul nuw i32 [[AND_I190_2]], 65535
		; CHECK-NEXT: [[ADD_I192_2:%.*]] = add i32 [[MUL_I191_2]], [[SUB104_2]]
		; CHECK-NEXT: [[XOR_I193_2:%.*]] = xor i32 [[ADD_I192_2]], [[MUL_I191_2]]
		; CHECK-NEXT: [[SHR_I194_2:%.*]] = lshr i32 [[SUB106_2]], 15
		; CHECK-NEXT: [[AND_I195_2:%.*]] = and i32 [[SHR_I194_2]], 65537
		; CHECK-NEXT: [[MUL_I196_2:%.*]] = mul nuw i32 [[AND_I195_2]], 65535
		; CHECK-NEXT: [[ADD_I197_2:%.*]] = add i32 [[MUL_I196_2]], [[SUB106_2]]
		; CHECK-NEXT: [[XOR_I198_2:%.*]] = xor i32 [[ADD_I197_2]], [[MUL_I196_2]]
		; CHECK-NEXT: [[ADD108_2:%.*]] = add i32 [[XOR_I188_2]], [[ADD113_1]]
		; CHECK-NEXT: [[ADD110_2:%.*]] = add i32 [[ADD108_2]], [[XOR_I_2]]
		; CHECK-NEXT: [[ADD112_2:%.*]] = add i32 [[ADD110_2]], [[XOR_I193_2]]
		; CHECK-NEXT: [[ADD113_2:%.*]] = add i32 [[ADD112_2]], [[XOR_I198_2]]
		; CHECK-NEXT: [[ADD78_3:%.*]] = add nsw i32 [[SUB59_1]], [[SUB59]]
		; CHECK-NEXT: [[SUB86_3:%.*]] = sub nsw i32 [[SUB59]], [[SUB59_1]]
		; CHECK-NEXT: [[ADD94_3:%.*]] = add nsw i32 [[SUB59_3]], [[SUB59_2]]
		; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 [[SUB59_2]], [[SUB59_3]]
		; CHECK-NEXT: [[ADD103_3:%.*]] = add nsw i32 [[ADD94_3]], [[ADD78_3]]
		; CHECK-NEXT: [[SUB104_3:%.*]] = sub nsw i32 [[ADD78_3]], [[ADD94_3]]
		; CHECK-NEXT: [[ADD105_3:%.*]] = add nsw i32 [[SUB102_3]], [[SUB86_3]]
		; CHECK-NEXT: [[SUB106_3:%.*]] = sub nsw i32 [[SUB86_3]], [[SUB102_3]]
		; CHECK-NEXT: [[SHR_I_3:%.*]] = lshr i32 [[ADD103_3]], 15
		; CHECK-NEXT: [[AND_I_3:%.*]] = and i32 [[SHR_I_3]], 65537
		; CHECK-NEXT: [[MUL_I_3:%.*]] = mul nuw i32 [[AND_I_3]], 65535
		; CHECK-NEXT: [[ADD_I_3:%.*]] = add i32 [[MUL_I_3]], [[ADD103_3]]
		; CHECK-NEXT: [[XOR_I_3:%.*]] = xor i32 [[ADD_I_3]], [[MUL_I_3]]
		; CHECK-NEXT: [[SHR_I184_3:%.*]] = lshr i32 [[ADD105_3]], 15
		; CHECK-NEXT: [[AND_I185_3:%.*]] = and i32 [[SHR_I184_3]], 65537
		; CHECK-NEXT: [[MUL_I186_3:%.*]] = mul nuw i32 [[AND_I185_3]], 65535
		; CHECK-NEXT: [[ADD_I187_3:%.*]] = add i32 [[MUL_I186_3]], [[ADD105_3]]
		; CHECK-NEXT: [[XOR_I188_3:%.*]] = xor i32 [[ADD_I187_3]], [[MUL_I186_3]]
		; CHECK-NEXT: [[SHR_I189_3:%.*]] = lshr i32 [[SUB104_3]], 15
		; CHECK-NEXT: [[AND_I190_3:%.*]] = and i32 [[SHR_I189_3]], 65537
		; CHECK-NEXT: [[MUL_I191_3:%.*]] = mul nuw i32 [[AND_I190_3]], 65535
		; CHECK-NEXT: [[ADD_I192_3:%.*]] = add i32 [[MUL_I191_3]], [[SUB104_3]]
		; CHECK-NEXT: [[XOR_I193_3:%.*]] = xor i32 [[ADD_I192_3]], [[MUL_I191_3]]
		; CHECK-NEXT: [[SHR_I194_3:%.*]] = lshr i32 [[SUB106_3]], 15
		; CHECK-NEXT: [[AND_I195_3:%.*]] = and i32 [[SHR_I194_3]], 65537
		; CHECK-NEXT: [[MUL_I196_3:%.*]] = mul nuw i32 [[AND_I195_3]], 65535
		; CHECK-NEXT: [[ADD_I197_3:%.*]] = add i32 [[MUL_I196_3]], [[SUB106_3]]
		; CHECK-NEXT: [[XOR_I198_3:%.*]] = xor i32 [[ADD_I197_3]], [[MUL_I196_3]]
		; CHECK-NEXT: [[ADD108_3:%.*]] = add i32 [[XOR_I188_3]], [[ADD113_2]]
		; CHECK-NEXT: [[ADD110_3:%.*]] = add i32 [[ADD108_3]], [[XOR_I_3]]
		; CHECK-NEXT: [[ADD112_3:%.*]] = add i32 [[ADD110_3]], [[XOR_I193_3]]
		; CHECK-NEXT: [[ADD113_3:%.*]] = add i32 [[ADD112_3]], [[XOR_I198_3]]
		; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[ADD113_3]], 65535
		; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD113_3]], 16
; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]		; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]
; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1		; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
; CHECK-NEXT: ret i32 [[SHR120]]		; CHECK-NEXT: ret i32 [[SHR120]]
;		;
entry:		entry:
%idx.ext = sext i32 %st1 to i64		%idx.ext = sext i32 %st1 to i64
%idx.ext63 = sext i32 %st2 to i64		%idx.ext63 = sext i32 %st2 to i64
%0 = load i8, ptr %p1, align 1		%0 = load i8, ptr %p1, align 1

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

	; These operands are coming from 4 loads which are not			; These operands are coming from 4 loads which are not
	; contiguous. The score estimation needs to be corrected, so that these 4 loads			; contiguous. The score estimation needs to be corrected, so that these 4 loads
	; are not selected for vectorization. Instead we should vectorize with			; are not selected for vectorization. Instead we should vectorize with
	; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.			; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.

	define void @s116_modified(ptr %a) {			define void @s116_modified(ptr %a) {
	; CHECK-LABEL: @s116_modified(			; CHECK-LABEL: @s116_modified(
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, ptr [[A]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, ptr [[A]], i64 3
	; CHECK-NEXT: [[LD0:%.*]] = load float, ptr [[A]], align 4			; CHECK-NEXT: [[LD0:%.*]] = load float, ptr [[A]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LD1:%.*]] = load float, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LD2:%.*]] = load float, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0			; CHECK-NEXT: [[MUL0:%.*]] = fmul fast float [[LD0]], [[LD1]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>			; CHECK-NEXT: [[MUL1:%.*]] = fmul fast float [[LD2]], [[LD1]]
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>			; CHECK-NEXT: store float [[MUL0]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: store float [[MUL1]], ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP3]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 undef, i32 2, i32 4>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[LD2]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP10]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 2, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP1]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: store <4 x float> [[TMP12]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP4]], ptr [[GEP2]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep1 = getelementptr inbounds float, ptr %a, i64 1			%gep1 = getelementptr inbounds float, ptr %a, i64 1
	%gep2 = getelementptr inbounds float, ptr %a, i64 2			%gep2 = getelementptr inbounds float, ptr %a, i64 2
	%gep3 = getelementptr inbounds float, ptr %a, i64 3			%gep3 = getelementptr inbounds float, ptr %a, i64 3
	%gep4 = getelementptr inbounds float, ptr %a, i64 4			%gep4 = getelementptr inbounds float, ptr %a, i64 4
	%ld0 = load float, ptr %a			%ld0 = load float, ptr %a
	%ld1 = load float, ptr %gep1			%ld1 = load float, ptr %gep1

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, ptr [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, ptr [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, ptr [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, ptr [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 0>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x double> [[TMP0]], [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP4]], ptr [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP3]], ptr [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, ptr %ptr.1, align 8			%v.1 = load <9 x double>, ptr %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 -slp-min-tree-size=5 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 -slp-min-tree-size=5 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP9:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP9:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP2]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP2]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP0:%.*]], [[TMP4]]			; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP0:%.*]], [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP7:%.*]] = and <2 x i32> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = and <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP9]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> [[TMP8]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

		;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6		; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 4
		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i64 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6		; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 4
		; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i64 5
		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i64 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[AB4]], i64 4
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i64 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
▲ Show 20 Lines • Show All 278 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

Show First 20 Lines • Show All 226 Lines • ▼ Show 20 Lines	;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6		; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 4
		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i64 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6		; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 4
		; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i64 5
		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i64 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[AB4]], i64 4
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i64 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
▲ Show 20 Lines • Show All 278 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

	Show First 20 Lines • Show All 601 Lines • ▼ Show 20 Lines
	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A:%.*]], i32 2
	; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
				; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
				; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
				; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
				; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
				; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B:%.*]], i32 2
				; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
				; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
				; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
				; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
				; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
				; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
				; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>			; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>			; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
	; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]			; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
	; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]			; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
	; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[TMP4]], double [[C2]], i32 2
	; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
	; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
	; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
	; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R7:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>			; SLM-NEXT: ret <8 x double> [[R7]]
	; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX512-NEXT: ret <8 x double> [[TMP1]]			; AVX512-NEXT: ret <8 x double> [[TMP1]]
	▲ Show 20 Lines • Show All 323 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

	Show First 20 Lines • Show All 601 Lines • ▼ Show 20 Lines
	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A:%.*]], i32 2
	; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
				; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
				; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
				; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
				; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
				; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B:%.*]], i32 2
				; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
				; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
				; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
				; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
				; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
				; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
				; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>			; SLM-NEXT: [[C2:%.*]] = fdiv double [[A2]], [[B2]]
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>			; SLM-NEXT: [[C3:%.*]] = fdiv double [[A3]], [[B3]]
	; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]			; SLM-NEXT: [[C4:%.*]] = fdiv double [[A4]], [[B4]]
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[C5:%.*]] = fdiv double [[A5]], [[B5]]
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[C6:%.*]] = fdiv double [[A6]], [[B6]]
	; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]			; SLM-NEXT: [[C7:%.*]] = fdiv double [[A7]], [[B7]]
	; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[R2:%.*]] = insertelement <8 x double> [[TMP4]], double [[C2]], i32 2
	; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x double> [[R2]], double [[C3]], i32 3
	; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R4:%.*]] = insertelement <8 x double> [[R3]], double [[C4]], i32 4
	; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R5:%.*]] = insertelement <8 x double> [[R4]], double [[C5]], i32 5
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R6:%.*]] = insertelement <8 x double> [[R5]], double [[C6]], i32 6
	; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R7:%.*]] = insertelement <8 x double> [[R6]], double [[C7]], i32 7
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>			; SLM-NEXT: ret <8 x double> [[R7]]
	; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX512-NEXT: ret <8 x double> [[TMP1]]			; AVX512-NEXT: ret <8 x double> [[TMP1]]
	▲ Show 20 Lines • Show All 323 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/buildvector-nodes-dependency.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64 < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64 < %s | FileCheck %s

	define double @test() {			define double @test() {
	; CHECK-LABEL: define double @test() {			; CHECK-LABEL: define double @test() {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr null, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr null, align 8
	; CHECK-NEXT: br label [[COND_TRUE:%.*]]			; CHECK-NEXT: br label [[COND_TRUE:%.*]]
	; CHECK: cond.true:			; CHECK: cond.true:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[TMP0]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> zeroinitializer, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> zeroinitializer, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP3]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP3]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP3]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> [[TMP6]], zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x double> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x double> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> zeroinitializer, [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> zeroinitializer, [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> zeroinitializer, [[TMP10]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> zeroinitializer, [[TMP10]]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = fsub <2 x double> [[TMP13]], [[TMP2]]			; CHECK-NEXT: [[TMP14:%.*]] = fsub <2 x double> [[TMP13]], [[TMP2]]
	▲ Show 20 Lines • Show All 53 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

	Show First 20 Lines • Show All 61 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP25]], 0.000000e+00			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP25]], 0.000000e+00
	; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[CALL:%.*]] = tail call double @sqrt(double noundef [[TMP25]])			; CHECK-NEXT: [[CALL:%.*]] = tail call double @sqrt(double noundef [[TMP25]])
	; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]			; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]
	; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00			; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00
	; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0			; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <2 x double> [[TMP27]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]			; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0			; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP31]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP31]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP32:%.*]] = fdiv <2 x double> [[TMP30]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP32:%.*]] = fdiv <2 x double> [[TMP30]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP33:%.*]] = extractelement <2 x double> [[TMP32]], i32 1			; CHECK-NEXT: [[TMP33:%.*]] = extractelement <2 x double> [[TMP32]], i32 1
	; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP33]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP33]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP32]], i32 0			; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP32]], i32 0
	▲ Show 20 Lines • Show All 92 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	Show All 10 Lines
	@cle32 = external unnamed_addr global [32 x i32], align 16			@cle32 = external unnamed_addr global [32 x i32], align 16


	; Check that we correctly detect a splat/broadcast by leveraging the			; Check that we correctly detect a splat/broadcast by leveraging the
	; commutativity property of `xor`.			; commutativity property of `xor`.

	define void @splat(i8 %a, i8 %b, i8 %c) {			define void @splat(i8 %a, i8 %b, i8 %c) {
	; SSE-LABEL: @splat(			; SSE-LABEL: @splat(
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = xor i8 [[C:%.*]], [[A:%.*]]
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1			; SSE-NEXT: store i8 [[TMP1]], ptr @cle, align 16
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; SSE-NEXT: [[TMP2:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; SSE-NEXT: store i8 [[TMP2]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 1), align 1
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer			; SSE-NEXT: [[TMP3:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP3]], [[TMP5]]			; SSE-NEXT: store i8 [[TMP3]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 2), align 1
	; SSE-NEXT: store <16 x i8> [[TMP6]], ptr @cle, align 16			; SSE-NEXT: [[TMP4:%.*]] = xor i8 [[A]], [[C]]
				; SSE-NEXT: store i8 [[TMP4]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 3), align 1
				; SSE-NEXT: [[TMP5:%.*]] = xor i8 [[C]], [[A]]
				; SSE-NEXT: store i8 [[TMP5]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 4), align 1
				; SSE-NEXT: [[TMP6:%.*]] = xor i8 [[C]], [[B:%.*]]
				; SSE-NEXT: store i8 [[TMP6]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 5), align 1
				; SSE-NEXT: [[TMP7:%.*]] = xor i8 [[C]], [[A]]
				; SSE-NEXT: store i8 [[TMP7]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 6), align 1
				; SSE-NEXT: [[TMP8:%.*]] = xor i8 [[C]], [[B]]
				; SSE-NEXT: store i8 [[TMP8]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 7), align 1
				; SSE-NEXT: [[TMP9:%.*]] = insertelement <8 x i8> poison, i8 [[A]], i32 0
				; SSE-NEXT: [[TMP10:%.*]] = shufflevector <8 x i8> [[TMP9]], <8 x i8> poison, <8 x i32> zeroinitializer
				; SSE-NEXT: [[TMP11:%.*]] = insertelement <8 x i8> poison, i8 [[C]], i32 0
				; SSE-NEXT: [[TMP12:%.*]] = shufflevector <8 x i8> [[TMP11]], <8 x i8> poison, <8 x i32> zeroinitializer
				; SSE-NEXT: [[TMP13:%.*]] = xor <8 x i8> [[TMP10]], [[TMP12]]
				; SSE-NEXT: store <8 x i8> [[TMP13]], ptr getelementptr inbounds ([32 x i8], ptr @cle, i64 0, i64 8), align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @splat(			; AVX-LABEL: @splat(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @same_opcode_on_one_side(			; AVX-LABEL: @same_opcode_on_one_side(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]			; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 undef, i32 4, i32 0>
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[C]], i32 2			; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>			; AVX-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP5]], [[TMP7]]
	; AVX-NEXT: [[TMP9:%.*]] = xor <4 x i32> [[TMP5]], [[TMP8]]			; AVX-NEXT: store <4 x i32> [[TMP8]], ptr @cle32, align 16
	; AVX-NEXT: store <4 x i32> [[TMP9]], ptr @cle32, align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%add1 = add i32 %c, %a			%add1 = add i32 %c, %a
	%add2 = add i32 %c, %a			%add2 = add i32 %c, %a
	%add3 = add i32 %a, %c			%add3 = add i32 %a, %c
	%add4 = add i32 %c, %a			%add4 = add i32 %c, %a
	%1 = xor i32 %add1, %a			%1 = xor i32 %add1, %a
	store i32 %1, ptr @cle32, align 16			store i32 %1, ptr @cle32, align 16
	%2 = xor i32 %b, %add2			%2 = xor i32 %b, %add2
	store i32 %2, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 1)			store i32 %2, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 1)
	%3 = xor i32 %c, %add3			%3 = xor i32 %c, %add3
	store i32 %3, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 2)			store i32 %3, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 2)
	%4 = xor i32 %a, %add4			%4 = xor i32 %a, %add4
	store i32 %4, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 3)			store i32 %4, ptr getelementptr inbounds ([32 x i32], ptr @cle32, i64 0, i64 3)
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_clear_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-- -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-- -mcpu=corei7 | FileCheck %s
	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

	define i1 @foo() {			define i1 @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr null, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr null, align 4
	; CHECK-NEXT: br i1 false, label [[TMP15:%.*]], label [[TMP2:%.*]]			; CHECK-NEXT: br i1 false, label [[TMP11:%.*]], label [[TMP2:%.*]]
	; CHECK: 2:			; CHECK: 2:
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> zeroinitializer, i64 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> <float undef, float 0.000000e+00>, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[TMP3]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> zeroinitializer, <4 x float> [[TMP5]], <4 x float> [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = fsub <4 x float> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[TMP11:%.*]] = select <4 x i1> zeroinitializer, <4 x float> [[TMP7]], <4 x float> [[TMP10]]			; CHECK-NEXT: br label [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = fsub <4 x float> [[TMP11]], zeroinitializer			; CHECK: 11:
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <4 x float> [[TMP11]], zeroinitializer			; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x float> [ [[TMP10]], [[TMP2]] ], [ zeroinitializer, [[TMP0:%.*]] ]
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> [[TMP13]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: br label [[TMP15]]
	; CHECK: 15:
	; CHECK-NEXT: [[TMP16:%.*]] = phi <4 x float> [ [[TMP14]], [[TMP2]] ], [ zeroinitializer, [[TMP0:%.*]] ]
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%1 = load float, ptr null, align 4			%1 = load float, ptr null, align 4
	br i1 false, label %14, label %2			br i1 false, label %14, label %2

	2: ; preds = %0			2: ; preds = %0
	%3 = extractelement <4 x float> zeroinitializer, i64 0			%3 = extractelement <4 x float> zeroinitializer, i64 0
	%4 = fadd float %1, 0.000000e+00			%4 = fadd float %1, 0.000000e+00

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP5]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP5]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP9]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP9]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> [[TMP5]], <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> [[TMP5]], <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <2 x double> [[TMP13]], undef			; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <2 x double> [[TMP13]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

store i16 %conv153, ptr %arrayidx156, align 4		store i16 %conv153, ptr %arrayidx156, align 4
ret void		ret void
}		}

define fastcc void @dct36(ptr %inbuf) {		define fastcc void @dct36(ptr %inbuf) {
; CHECK-LABEL: @dct36(		; CHECK-LABEL: @dct36(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, ptr [[INBUF:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, ptr [[INBUF:%.*]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[INBUF]], align 8		; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[INBUF]], align 8
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP0]], [[TMP1]]
; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[ARRAYIDX44]], align 8		; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[ARRAYIDX44]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%arrayidx41 = getelementptr inbounds double, ptr %inbuf, i64 2		%arrayidx41 = getelementptr inbounds double, ptr %inbuf, i64 2
%arrayidx44 = getelementptr inbounds double, ptr %inbuf, i64 1		%arrayidx44 = getelementptr inbounds double, ptr %inbuf, i64 1
%0 = load double, ptr %arrayidx44, align 8		%0 = load double, ptr %arrayidx44, align 8
%add46 = fadd double %0, undef		%add46 = fadd double %0, undef
store double %add46, ptr %arrayidx41, align 8		store double %add46, ptr %arrayidx41, align 8
%1 = load double, ptr %inbuf, align 8		%1 = load double, ptr %inbuf, align 8
%add49 = fadd double %1, %0		%add49 = fadd double %1, %0
store double %add49, ptr %arrayidx44, align 8		store double %add49, ptr %arrayidx44, align 8
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/crash_netbsd_decompress.ll

	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr @b, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr @b, align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr @d, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr @d, align 4
	; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 [[TMP1]], 0			; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 [[TMP1]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[SW_BB:%.*]], label [[SAVE_STATE_AND_RETURN:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[SW_BB:%.*]], label [[SAVE_STATE_AND_RETURN:%.*]]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr @c, align 4			; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr @c, align 4
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[TMP2]], 7			; CHECK-NEXT: [[AND:%.*]] = and i32 [[TMP2]], 7
	; CHECK-NEXT: store i32 [[AND]], ptr @a, align 4			; CHECK-NEXT: store i32 [[AND]], ptr @a, align 4
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> <i32 poison, i32 0>, <2 x i32> [[TMP0]], <2 x i32> <i32 2, i32 1>			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> [[TMP0]], i32 0, i32 1
	; CHECK-NEXT: switch i32 [[AND]], label [[IF_END:%.*]] [			; CHECK-NEXT: switch i32 [[AND]], label [[IF_END:%.*]] [
	; CHECK-NEXT: i32 7, label [[SAVE_STATE_AND_RETURN]]			; CHECK-NEXT: i32 7, label [[SAVE_STATE_AND_RETURN]]
	; CHECK-NEXT: i32 0, label [[SAVE_STATE_AND_RETURN]]			; CHECK-NEXT: i32 0, label [[SAVE_STATE_AND_RETURN]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: br label [[SAVE_STATE_AND_RETURN]]			; CHECK-NEXT: br label [[SAVE_STATE_AND_RETURN]]
	; CHECK: save_state_and_return:			; CHECK: save_state_and_return:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x i32> [ zeroinitializer, [[IF_END]] ], [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP3]], [[SW_BB]] ], [ [[TMP3]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x i32> [ zeroinitializer, [[IF_END]] ], [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP3]], [[SW_BB]] ], [ [[TMP3]], [[SW_BB]] ]

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	}			}

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY:%.*]], ptr undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY:%.*]], ptr undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double 6.000000e-01, double poison>, double 6.000000e-02, i32 1			; CHECK-NEXT: store <2 x double> <double 0x3FFA356C1D8A7F76, double 0x3FFDC4F38B38BEF4>, ptr [[AGG_TMP74663_SROA_0_0_IDX]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> <double 5.000000e-01, double 8.000000e-01>, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> <double 2.400000e-02, double 0.000000e+00>, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> <double 9.000000e-01, double 9.100000e-01>, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> <double 9.200000e-01, double 9.300000e-01>, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> <double 0x3FEE147AE147AE14, double 0x3FEE666666666666>, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> <double 0x3FEEB851EB851EB8, double 0x3FEF0A3D70A3D70A>, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> <double 0x3FEF5C28F5C28F5C, double 0x3FEFAE147AE147AE>, [[TMP6]]
	; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[AGG_TMP74663_SROA_0_0_IDX]], align 8
	; CHECK-NEXT: br label [[IF_THEN78]]			; CHECK-NEXT: br label [[IF_THEN78]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.then78, label %if.then38			br i1 undef, label %if.then78, label %if.then38

	if.then38:			if.then38:

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

	define i32 @test(ptr nocapture %G) {			define i32 @test(ptr nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[G:%.*]], i64 5
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[TMP0]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[TMP0]], <double 4.000000e+00, double 3.000000e+00>
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[G]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[G]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP1]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, ptr [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, ptr [[G]], i64 2
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP0]], i32 1
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP4]], 4.000000e+00			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP3]], 4.000000e+00
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP1]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL11]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[ARRAYIDX9]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, ptr %G, i64 5			%arrayidx = getelementptr inbounds double, ptr %G, i64 5
	%0 = load double, ptr %arrayidx, align 8			%0 = load double, ptr %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, ptr %G, align 8			store double %add, ptr %G, align 8

llvm/test/Transforms/SLPVectorizer/X86/gather-extractelements-different-bbs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux -mattr="-avx512pf,+avx512f,+avx512bw" -slp-threshold=-100 -slp-min-tree-size=0 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux -mattr="-avx512pf,+avx512f,+avx512bw" -slp-threshold=-100 -slp-min-tree-size=0 < %s | FileCheck %s

	define i32 @foo(i32 %a) {			define i32 @foo(i32 %a) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 0, i32 poison>, i32 [[A:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 0, i32 poison>, i32 [[A:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <2 x i32> zeroinitializer, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <2 x i32> zeroinitializer, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: br i1 false, label [[BB5:%.*]], label [[BB1:%.*]]			; CHECK-NEXT: br i1 false, label [[BB5:%.*]], label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = mul <2 x i32> [[TMP1]], <i32 3, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[TMP4]], <i32 3, i32 1>			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1			; CHECK-NEXT: [[OP_RDX10:%.*]] = add i32 [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[OP_RDX10:%.*]] = add i32 [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[OP_RDX11:%.*]] = add i32 [[OP_RDX10]], 0			; CHECK-NEXT: [[OP_RDX11:%.*]] = add i32 [[OP_RDX10]], 0
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[P1:%.*]] = phi i32 [ [[OP_RDX11]], [[BB1]] ], [ 0, [[BB2:%.*]] ]			; CHECK-NEXT: [[P1:%.*]] = phi i32 [ [[OP_RDX11]], [[BB1]] ], [ 0, [[BB2:%.*]] ]
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP2]], [[TMP8]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP2]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP9]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_RDX8:%.*]] = add i32 [[TMP10]], 0			; CHECK-NEXT: [[OP_RDX8:%.*]] = add i32 [[TMP9]], 0
	; CHECK-NEXT: [[OP_RDX9:%.*]] = add i32 [[OP_RDX8]], [[TMP3]]			; CHECK-NEXT: [[OP_RDX9:%.*]] = add i32 [[OP_RDX8]], [[TMP3]]
	; CHECK-NEXT: ret i32 [[OP_RDX9]]			; CHECK-NEXT: ret i32 [[OP_RDX9]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br label [[BB4:%.*]]			; CHECK-NEXT: br label [[BB4:%.*]]
	;			;
	entry:			entry:
	%0 = sub nsw i32 0, %a			%0 = sub nsw i32 0, %a
	%local = sub nsw i32 0, 0			%local = sub nsw i32 0, 0

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, ptr @b, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, ptr @b, align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (ptr @fn1 to i32), i32 poison>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> <i32 8, i32 poison, i32 ptrtoint (ptr @fn1 to i32), i32 ptrtoint (ptr @fn1 to i32)>, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[SHUFFLE]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr @a, align 4
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], ptr @a, align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, ptr @b, align 4			%0 = load i32, ptr @b, align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, ptr getelementptr inbounds ([4 x i32], ptr @a, i64 0, i32 3), align 4			store i32 %cond, ptr getelementptr inbounds ([4 x i32], ptr @a, i64 0, i32 3), align 4
	%1 = load i32, ptr getelementptr ([4 x i32], ptr @b, i64 0, i32 1), align 4			%1 = load i32, ptr getelementptr ([4 x i32], ptr @b, i64 0, i32 1), align 4

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

%add79.i181 = fadd float 2.0, %add78.i180		%add79.i181 = fadd float 2.0, %add78.i180
%mul123.i184 = fmul float %add36.i173, %add79.i181		%mul123.i184 = fmul float %add36.i173, %add79.i181
%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00		%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}


define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; SSE-LABEL: @foo(		; CHECK-LABEL: @foo(
; SSE-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; SSE-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; SSE-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[VEC]], <4 x float> poison, <2 x i32> <i32 undef, i32 1>
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[SUB14_I167]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; SSE-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; SSE-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: ret i1 [[CMP_I185]]
; SSE-NEXT: ret i1 [[CMP_I185]]
;
; AVX-LABEL: @foo(
; AVX-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; AVX-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; AVX-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]
; AVX-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]
; AVX-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1
; AVX-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01
; AVX-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]
; AVX-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01
; AVX-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]
; AVX-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]
; AVX-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; AVX-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1
%add36.i173 = fadd float %sub25.i168, 10.0		%add36.i173 = fadd float %sub25.i168, 10.0
%mul72.i179 = fmul float %c, %vecext.i276.i169		%mul72.i179 = fmul float %c, %vecext.i276.i169
; 2-wide splat loads in x86 use a single instruction so they are quite cheap.		; 2-wide splat loads in x86 use a single instruction so they are quite cheap.
define double @splat_loads(ptr %array1, ptr %array2, ptr %ptrA, ptr %ptrB) {		define double @splat_loads(ptr %array1, ptr %array2, ptr %ptrA, ptr %ptrB) {
; SSE-LABEL: @splat_loads(		; SSE-LABEL: @splat_loads(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8		; SSE-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[ARRAY2:%.*]], align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[ARRAY2:%.*]], align 8
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; SSE-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP0]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP0]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; SSE-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP0]], [[TMP4]]		; SSE-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP3]], [[TMP4]]
; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
; SSE-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 0		; SSE-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
; SSE-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 1		; SSE-NEXT: [[ADD3:%.*]] = fadd double [[TMP6]], [[TMP7]]
; SSE-NEXT: [[ADD3:%.*]] = fadd double [[TMP7]], [[TMP8]]
; SSE-NEXT: ret double [[ADD3]]		; SSE-NEXT: ret double [[ADD3]]
;		;
; AVX-LABEL: @splat_loads(		; AVX-LABEL: @splat_loads(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, ptr [[ARRAY2:%.*]], i64 1		; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, ptr [[ARRAY2:%.*]], i64 1
; AVX-NEXT: [[LD_2_0:%.*]] = load double, ptr [[ARRAY2]], align 8		; AVX-NEXT: [[LD_2_0:%.*]] = load double, ptr [[ARRAY2]], align 8
; AVX-NEXT: [[LD_2_1:%.*]] = load double, ptr [[GEP_2_1]], align 8		; AVX-NEXT: [[LD_2_1:%.*]] = load double, ptr [[GEP_2_1]], align 8
; AVX-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8		; AVX-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8
; Same as splat_loads() but the splat load has internal uses in the slp graph.		; Same as splat_loads() but the splat load has internal uses in the slp graph.
define double @splat_loads_with_internal_uses(ptr %array1, ptr %array2, ptr %ptrA, ptr %ptrB) {		define double @splat_loads_with_internal_uses(ptr %array1, ptr %array2, ptr %ptrA, ptr %ptrB) {
; SSE-LABEL: @splat_loads_with_internal_uses(		; SSE-LABEL: @splat_loads_with_internal_uses(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8		; SSE-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAY1:%.*]], align 8
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[ARRAY2:%.*]], align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[ARRAY2:%.*]], align 8
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; SSE-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP0]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP0]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; SSE-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP0]], [[TMP4]]		; SSE-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP3]], [[TMP4]]
; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> zeroinitializer
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 1>		; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP8:%.*]] = fsub <2 x double> [[TMP6]], [[TMP7]]		; SSE-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP8]], i32 0		; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 1		; SSE-NEXT: [[RES:%.*]] = fadd double [[TMP8]], [[TMP9]]
; SSE-NEXT: [[RES:%.*]] = fadd double [[TMP9]], [[TMP10]]
; SSE-NEXT: ret double [[RES]]		; SSE-NEXT: ret double [[RES]]
;		;
; AVX-LABEL: @splat_loads_with_internal_uses(		; AVX-LABEL: @splat_loads_with_internal_uses(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, ptr [[ARRAY2:%.*]], i64 1		; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, ptr [[ARRAY2:%.*]], i64 1
; AVX-NEXT: [[LD_2_0:%.*]] = load double, ptr [[ARRAY2]], align 8		; AVX-NEXT: [[LD_2_0:%.*]] = load double, ptr [[ARRAY2]], align 8
; AVX-NEXT: [[LD_2_1:%.*]] = load double, ptr [[GEP_2_1]], align 8		; AVX-NEXT: [[LD_2_1:%.*]] = load double, ptr [[GEP_2_1]], align 8
; AVX-NEXT: [[TMP0:%.]] = load <2 x double>, ptr [[ARRAY1:%.]], align 8		; AVX-NEXT: [[TMP0:%.]] = load <2 x double>, ptr [[ARRAY1:%.]], align 8
[39 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	[9 lines not shown]
	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[SUB102_1]], i32 4			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[SUB102_1]], i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 5			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 6			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 6
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 7			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 7
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 9			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 9
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 9, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 9, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 undef, i32 undef, i32 poison>, i32 [[SUB86_1]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 7, i32 6, i32 5, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[SUB102_3]], i32 12
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_3]], i32 15
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = freeze <16 x i32> [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 12			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 12>			; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>			; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>			; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]			; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]
	; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]			; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]
	; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP17]])			; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP17]])
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP18]], 16			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP18]], 16
	[127 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

	[181 lines not shown]
	}			}

	define void @gather_sequence_crash(<2 x float> %arg, ptr %arg1, float %arg2, ptr %arg3, ptr %arg4, ptr %arg5, i1 %c.1, i1 %c.2) {			define void @gather_sequence_crash(<2 x float> %arg, ptr %arg1, float %arg2, ptr %arg3, ptr %arg4, ptr %arg5, i1 %c.1, i1 %c.2) {
	; CHECK-LABEL: @gather_sequence_crash(			; CHECK-LABEL: @gather_sequence_crash(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]			; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[ARG1:%.*]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[ARG1:%.*]], i32 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, float [[ARG2:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, <4 x i32> <i32 undef, i32 1, i32 2, i32 7>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float [[ARG2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer
	; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP8]], align 4			; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP8]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb16:			; CHECK: bb16:
	; CHECK-NEXT: br label [[BB17:%.*]]			; CHECK-NEXT: br label [[BB17:%.*]]
	; CHECK: bb17:			; CHECK: bb17:
	; CHECK-NEXT: br label [[BB18:%.*]]			; CHECK-NEXT: br label [[BB18:%.*]]
	; CHECK: bb18:			; CHECK: bb18:
	[20 lines not shown]
	; CHECK: bb33:			; CHECK: bb33:
	; CHECK-NEXT: br label [[BB34:%.*]]			; CHECK-NEXT: br label [[BB34:%.*]]
	; CHECK: bb34:			; CHECK: bb34:
	; CHECK-NEXT: [[TMP35:%.*]] = getelementptr float, ptr [[ARG4]], i64 3			; CHECK-NEXT: [[TMP35:%.*]] = getelementptr float, ptr [[ARG4]], i64 3
	; CHECK-NEXT: [[TMP37:%.*]] = load float, ptr [[TMP35]], align 4			; CHECK-NEXT: [[TMP37:%.*]] = load float, ptr [[TMP35]], align 4
	; CHECK-NEXT: [[TMP38:%.*]] = fadd float 0.000000e+00, [[TMP37]]			; CHECK-NEXT: [[TMP38:%.*]] = fadd float 0.000000e+00, [[TMP37]]
	; CHECK-NEXT: store float [[TMP38]], ptr [[TMP35]], align 4			; CHECK-NEXT: store float [[TMP38]], ptr [[TMP35]], align 4
	; CHECK-NEXT: [[TMP39:%.*]] = getelementptr float, ptr [[ARG4]], i64 1			; CHECK-NEXT: [[TMP39:%.*]] = getelementptr float, ptr [[ARG4]], i64 1
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, ptr [[TMP39]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[TMP39]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> zeroinitializer, [[TMP7]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x float> zeroinitializer, [[TMP4]]
	; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[TMP39]], align 4			; CHECK-NEXT: store <2 x float> [[TMP5]], ptr [[TMP39]], align 4
	; CHECK-NEXT: [[TMP44:%.*]] = load float, ptr [[ARG3:%.*]], align 4			; CHECK-NEXT: [[TMP44:%.*]] = load float, ptr [[ARG3:%.*]], align 4
	; CHECK-NEXT: [[TMP45:%.*]] = load float, ptr [[ARG4]], align 4			; CHECK-NEXT: [[TMP45:%.*]] = load float, ptr [[ARG4]], align 4
	; CHECK-NEXT: [[TMP46:%.*]] = fadd float 0.000000e+00, [[TMP45]]			; CHECK-NEXT: [[TMP46:%.*]] = fadd float 0.000000e+00, [[TMP45]]
	; CHECK-NEXT: store float [[TMP46]], ptr [[ARG4]], align 4			; CHECK-NEXT: store float [[TMP46]], ptr [[ARG4]], align 4
	; CHECK-NEXT: call void @quux()			; CHECK-NEXT: call void @quux()
	; CHECK-NEXT: br label [[BB47:%.*]]			; CHECK-NEXT: br label [[BB47:%.*]]
	; CHECK: bb47:			; CHECK: bb47:
	; CHECK-NEXT: br label [[BB17]]			; CHECK-NEXT: br label [[BB17]]
	[85 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,instcombine,dce -slp-threshold=-100 -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer,instcombine,dce -slp-threshold=-100 -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s
	; RUN: opt < %s -passes=slp-vectorizer,instcombine,dce -slp-threshold=-100 -S -mtriple=i386-apple-macosx10.8.0 -mattr=+sse2 | FileCheck %s --check-prefix=SSE2			; RUN: opt < %s -passes=slp-vectorizer,instcombine,dce -slp-threshold=-100 -S -mtriple=i386-apple-macosx10.8.0 -mattr=+sse2 | FileCheck %s --check-prefix=SSE2

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"

	; Make sure we order the operands of commutative operations so that we get			; Make sure we order the operands of commutative operations so that we get
	; bigger vectorizable trees.			; bigger vectorizable trees.

	define void @shuffle_operands1(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {			define void @shuffle_operands1(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_operands1(			; CHECK-LABEL: @shuffle_operands1(
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2:%.*]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[TO:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @shuffle_operands1(			; SSE2-LABEL: @shuffle_operands1(
	; SSE2-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i64 0			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i64 1			; SSE2-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2:%.*]], i64 1
	; SSE2-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; SSE2-NEXT: store <2 x double> [[TMP5]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	%from_1 = getelementptr double, ptr %from, i64 1			%from_1 = getelementptr double, ptr %from, i64 1
	%v0_1 = load double , ptr %from			%v0_1 = load double , ptr %from
	%v0_2 = load double , ptr %from_1			%v0_2 = load double , ptr %from_1
	%v1_1 = fadd double %v0_1, %v1			%v1_1 = fadd double %v0_1, %v1
	%v1_2 = fadd double %v2, %v0_2			%v1_2 = fadd double %v2, %v0_2
	%to_2 = getelementptr double, ptr %to, i64 1			%to_2 = getelementptr double, ptr %to, i64 1
	store double %v1_1, ptr %to			store double %v1_1, ptr %to
	store double %v1_2, ptr %to_2			store double %v1_2, ptr %to_2
	ret void			ret void
	}			}

	define void @vecload_vs_broadcast(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast(			; CHECK-LABEL: @vecload_vs_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP0]], [[TMP2]]
	; CHECK-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast(			; SSE2-LABEL: @vecload_vs_broadcast(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP0]], [[TMP2]]
	; SSE2-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	[13 lines not shown]
	}			}

	define void @vecload_vs_broadcast2(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast2(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast2(			; CHECK-LABEL: @vecload_vs_broadcast2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP0]]
	; CHECK-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast2(			; SSE2-LABEL: @vecload_vs_broadcast2(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP0]]
	; SSE2-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	[13 lines not shown]
	}			}

	define void @vecload_vs_broadcast3(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast3(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast3(			; CHECK-LABEL: @vecload_vs_broadcast3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP0]]
	; CHECK-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast3(			; SSE2-LABEL: @vecload_vs_broadcast3(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP0]]
	; SSE2-NEXT: store <2 x double> [[TMP4]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	[13 lines not shown]
	}			}

	define void @shuffle_nodes_match1(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {			define void @shuffle_nodes_match1(ptr noalias %from, ptr noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_nodes_match1(			; CHECK-LABEL: @shuffle_nodes_match1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i64 1
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @shuffle_nodes_match1(			; SSE2-LABEL: @shuffle_nodes_match1(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i64 1
	; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP1]]
	; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	[32 lines not shown]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast4(			; SSE2-LABEL: @vecload_vs_broadcast4(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i64 1
	; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[TMP1]]
	; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	[33 lines not shown]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @shuffle_nodes_match2(			; SSE2-LABEL: @shuffle_nodes_match2(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4			; SSE2-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[FROM:%.*]], align 4
	; SSE2-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1			; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i64 1
	; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP2]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[TO:%.*]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	[21 lines not shown]
	define void @good_load_order() {			define void @good_load_order() {
	; CHECK-LABEL: @good_load_order(			; CHECK-LABEL: @good_load_order(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr @a, align 16			; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr @a, align 16
	; CHECK-NEXT: br label [[FOR_BODY3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3:%.*]]
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP14:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP12:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1			; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4			; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <4 x float>, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i64 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP1]], i64 0
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul <4 x float> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: store <4 x float> [[TMP11]], ptr [[ARRAYIDX5]], align 4			; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP13]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP11]]
	; CHECK-NEXT: [[TMP14]] = load float, ptr [[ARRAYIDX41]], align 4			; CHECK-NEXT: [[TMP12]] = load float, ptr [[ARRAYIDX41]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP7]], i64 3
	; CHECK-NEXT: [[MUL45:%.*]] = fmul float [[TMP14]], [[TMP15]]			; CHECK-NEXT: [[MUL45:%.*]] = fmul float [[TMP12]], [[TMP13]]
	; CHECK-NEXT: store float [[MUL45]], ptr [[ARRAYIDX31]], align 4			; CHECK-NEXT: store float [[MUL45]], ptr [[ARRAYIDX31]], align 4
	; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP16]], 31995			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP14]], 31995
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @good_load_order(			; SSE2-LABEL: @good_load_order(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; SSE2-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; SSE2: for.cond1.preheader:			; SSE2: for.cond1.preheader:
	; SSE2-NEXT: [[TMP0:%.*]] = load float, ptr @a, align 16			; SSE2-NEXT: [[TMP0:%.*]] = load float, ptr @a, align 16
	; SSE2-NEXT: br label [[FOR_BODY3:%.*]]			; SSE2-NEXT: br label [[FOR_BODY3:%.*]]
	; SSE2: for.body3:			; SSE2: for.body3:
	; SSE2-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP14:%.*]], [[FOR_BODY3]] ]			; SSE2-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP12:%.*]], [[FOR_BODY3]] ]
	; SSE2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]			; SSE2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]
	; SSE2-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; SSE2-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; SSE2-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1			; SSE2-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1
	; SSE2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP3]]			; SSE2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP3]]
	; SSE2-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; SSE2-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; SSE2-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP4]]			; SSE2-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP4]]
	; SSE2-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; SSE2-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; SSE2-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4			; SSE2-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; SSE2-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP6]]			; SSE2-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP6]]
	; SSE2-NEXT: [[TMP8:%.*]] = load <4 x float>, ptr [[ARRAYIDX]], align 4			; SSE2-NEXT: [[TMP7:%.*]] = load <4 x float>, ptr [[ARRAYIDX]], align 4
	; SSE2-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i64 0			; SSE2-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 2>
	; SSE2-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>			; SSE2-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP1]], i64 0
	; SSE2-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]			; SSE2-NEXT: [[TMP10:%.*]] = fmul <4 x float> [[TMP7]], [[TMP9]]
	; SSE2-NEXT: store <4 x float> [[TMP11]], ptr [[ARRAYIDX5]], align 4			; SSE2-NEXT: store <4 x float> [[TMP10]], ptr [[ARRAYIDX5]], align 4
	; SSE2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; SSE2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; SSE2-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; SSE2-NEXT: [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; SSE2-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP13]]			; SSE2-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], ptr @a, i32 0, i32 [[TMP11]]
	; SSE2-NEXT: [[TMP14]] = load float, ptr [[ARRAYIDX41]], align 4			; SSE2-NEXT: [[TMP12]] = load float, ptr [[ARRAYIDX41]], align 4
	; SSE2-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3			; SSE2-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP7]], i64 3
	; SSE2-NEXT: [[MUL45:%.*]] = fmul float [[TMP14]], [[TMP15]]			; SSE2-NEXT: [[MUL45:%.*]] = fmul float [[TMP12]], [[TMP13]]
	; SSE2-NEXT: store float [[MUL45]], ptr [[ARRAYIDX31]], align 4			; SSE2-NEXT: store float [[MUL45]], ptr [[ARRAYIDX31]], align 4
	; SSE2-NEXT: [[TMP16:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; SSE2-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; SSE2-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP16]], 31995			; SSE2-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP14]], 31995
	; SSE2-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]			; SSE2-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]
	; SSE2: for.end:			; SSE2: for.end:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader:			for.cond1.preheader:
	Show All 38 Lines
	}			}

	; Check vectorization of following code for double data type-			; Check vectorization of following code for double data type-
	; c[0] = a[0]+b[0];			; c[0] = a[0]+b[0];
	; c[1] = b[1]+a[1]; // swapped b[1] and a[1]			; c[1] = b[1]+a[1]; // swapped b[1] and a[1]

	define void @load_reorder_double(ptr nocapture %c, ptr noalias nocapture readonly %a, ptr noalias nocapture readonly %b){			define void @load_reorder_double(ptr nocapture %c, ptr noalias nocapture readonly %a, ptr noalias nocapture readonly %b){
	; CHECK-LABEL: @load_reorder_double(			; CHECK-LABEL: @load_reorder_double(
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[B:%.*]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[B:%.*]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[C:%.*]], align 4			; CHECK-NEXT: store <2 x double> [[TMP3]], ptr [[C:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @load_reorder_double(			; SSE2-LABEL: @load_reorder_double(
	; SSE2-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[B:%.*]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[B:%.*]], align 4
	; SSE2-NEXT: [[TMP4:%.*]] = load <2 x double>, ptr [[A:%.*]], align 4			; SSE2-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[A:%.*]], align 4
	; SSE2-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; SSE2-NEXT: store <2 x double> [[TMP5]], ptr [[C:%.*]], align 4			; SSE2-NEXT: store <2 x double> [[TMP3]], ptr [[C:%.*]], align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	%1 = load double, ptr %a			%1 = load double, ptr %a
	%2 = load double, ptr %b			%2 = load double, ptr %b
	%3 = fadd double %1, %2			%3 = fadd double %1, %2
	store double %3, ptr %c			store double %3, ptr %c
	%4 = getelementptr inbounds double, ptr %b, i64 1			%4 = getelementptr inbounds double, ptr %b, i64 1
	%5 = load double, ptr %4			%5 = load double, ptr %4
	%6 = getelementptr inbounds double, ptr %a, i64 1			%6 = getelementptr inbounds double, ptr %a, i64 1
	%7 = load double, ptr %6			%7 = load double, ptr %6
	%8 = fadd double %5, %7			%8 = fadd double %5, %7
	%9 = getelementptr inbounds double, ptr %c, i64 1			%9 = getelementptr inbounds double, ptr %c, i64 1
	store double %8, ptr %9			store double %8, ptr %9
	ret void			ret void
	}			}

	; Check vectorization of following code for float data type-			; Check vectorization of following code for float data type-
	; c[0] = a[0]+b[0];			; c[0] = a[0]+b[0];
	; c[1] = b[1]+a[1]; // swapped b[1] and a[1]			; c[1] = b[1]+a[1]; // swapped b[1] and a[1]
	; c[2] = a[2]+b[2];			; c[2] = a[2]+b[2];
	; c[3] = a[3]+b[3];			; c[3] = a[3]+b[3];

	define void @load_reorder_float(ptr nocapture %c, ptr noalias nocapture readonly %a, ptr noalias nocapture readonly %b){			define void @load_reorder_float(ptr nocapture %c, ptr noalias nocapture readonly %a, ptr noalias nocapture readonly %b){
	; CHECK-LABEL: @load_reorder_float(			; CHECK-LABEL: @load_reorder_float(
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[C:%.*]], align 4			; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[C:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @load_reorder_float(			; SSE2-LABEL: @load_reorder_float(
	; SSE2-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[A:%.*]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[A:%.*]], align 4
	; SSE2-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4			; SSE2-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4
	; SSE2-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; SSE2-NEXT: store <4 x float> [[TMP5]], ptr [[C:%.*]], align 4			; SSE2-NEXT: store <4 x float> [[TMP3]], ptr [[C:%.*]], align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	%1 = load float, ptr %a			%1 = load float, ptr %a
	%2 = load float, ptr %b			%2 = load float, ptr %b
	%3 = fadd float %1, %2			%3 = fadd float %1, %2
	store float %3, ptr %c			store float %3, ptr %c
	%4 = getelementptr inbounds float, ptr %b, i64 1			%4 = getelementptr inbounds float, ptr %b, i64 1
	%5 = load float, ptr %4			%5 = load float, ptr %4
	Show All 22 Lines
	; Check we properly reorder the below code so that it gets vectorized optimally-			; Check we properly reorder the below code so that it gets vectorized optimally-
	; a[0] = (b[0]+c[0])+d[0];			; a[0] = (b[0]+c[0])+d[0];
	; a[1] = d[1]+(b[1]+c[1]);			; a[1] = d[1]+(b[1]+c[1]);
	; a[2] = (b[2]+c[2])+d[2];			; a[2] = (b[2]+c[2])+d[2];
	; a[3] = (b[3]+c[3])+d[3];			; a[3] = (b[3]+c[3])+d[3];

	define void @opcode_reorder(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, ptr noalias nocapture readonly %c,ptr noalias nocapture readonly %d) {			define void @opcode_reorder(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, ptr noalias nocapture readonly %c,ptr noalias nocapture readonly %d) {
	; CHECK-LABEL: @opcode_reorder(			; CHECK-LABEL: @opcode_reorder(
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[C:%.*]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[C:%.*]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP7:%.*]] = load <4 x float>, ptr [[D:%.*]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[D:%.*]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], [[TMP5]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP4]], [[TMP3]]
	; CHECK-NEXT: store <4 x float> [[TMP8]], ptr [[A:%.*]], align 4			; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[A:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @opcode_reorder(			; SSE2-LABEL: @opcode_reorder(
	; SSE2-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[B:%.*]], align 4
	; SSE2-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[C:%.*]], align 4			; SSE2-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[C:%.*]], align 4
	; SSE2-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]			; SSE2-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; SSE2-NEXT: [[TMP7:%.*]] = load <4 x float>, ptr [[D:%.*]], align 4			; SSE2-NEXT: [[TMP4:%.*]] = load <4 x float>, ptr [[D:%.*]], align 4
	; SSE2-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], [[TMP5]]			; SSE2-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP4]], [[TMP3]]
	; SSE2-NEXT: store <4 x float> [[TMP8]], ptr [[A:%.*]], align 4			; SSE2-NEXT: store <4 x float> [[TMP5]], ptr [[A:%.*]], align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	%1 = load float, ptr %b			%1 = load float, ptr %b
	%2 = load float, ptr %c			%2 = load float, ptr %c
	%3 = fadd float %1, %2			%3 = fadd float %1, %2
	%4 = load float, ptr %d			%4 = load float, ptr %d
	%5 = fadd float %3, %4			%5 = fadd float %3, %4
	store float %5, ptr %a			store float %5, ptr %a
	Show All 32 Lines

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	Show All 20 Lines

	define i32 @foo(ptr nocapture %A, i32 %k) {			define i32 @foo(ptr nocapture %A, i32 %k) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[K:%.*]], 0			; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[K:%.*]], 0
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_ELSE:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_ELSE:%.*]], label [[IF_END:%.*]]
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[A:%.*]], i64 10			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[A:%.*]], i64 10
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[IF_ELSE]] ], [ <double 3.000000e+00, double 5.000000e+00>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x double> [ [[TMP0]], [[IF_ELSE]] ], [ <double 3.000000e+00, double 5.000000e+00>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[A]], align 8			; CHECK-NEXT: store <2 x double> [[TMP1]], ptr [[A]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%tobool = icmp eq i32 %k, 0			%tobool = icmp eq i32 %k, 0
	br i1 %tobool, label %if.else, label %if.end			br i1 %tobool, label %if.else, label %if.end

	if.else: ; preds = %entry			if.else: ; preds = %entry
	%arrayidx = getelementptr inbounds double, ptr %A, i64 10			%arrayidx = getelementptr inbounds double, ptr %A, i64 10
	Show All 26 Lines
	; B[0] = G;			; B[0] = G;
	; B[1] = R;			; B[1] = R;
	; return 0;			; return 0;
	;}			;}

	define i32 @foo2(ptr noalias nocapture %B, ptr noalias nocapture %A, i32 %n, i32 %m) #0 {			define i32 @foo2(ptr noalias nocapture %B, ptr noalias nocapture %A, i32 %n, i32 %m) #0 {
	; CHECK-LABEL: @foo2(			; CHECK-LABEL: @foo2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[A:%.*]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr [[A:%.*]], align 8
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_019:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_019:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY]] ], [ [[TMP5:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x double> [ [[TMP0]], [[ENTRY]] ], [ [[TMP4:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+01, double 1.000000e+01>			; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 1.000000e+01, double 1.000000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP5]] = fadd <2 x double> [[TMP4]], <double 4.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP4]] = fadd <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I_019]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I_019]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[B:%.*]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], ptr [[B:%.*]], align 8
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, ptr %A, i64 1			%arrayidx = getelementptr inbounds double, ptr %A, i64 1
	%0 = load double, ptr %arrayidx, align 8			%0 = load double, ptr %arrayidx, align 8
	%1 = load double, ptr %A, align 8			%1 = load double, ptr %A, align 8
	br label %for.body			br label %for.body

	Show All 36 Lines
	; return R+G+B+Y+P;			; return R+G+B+Y+P;
	; }			; }

	define float @foo3(ptr nocapture readonly %A) #0 {			define float @foo3(ptr nocapture readonly %A) #0 {
	; CHECK-LABEL: @foo3(			; CHECK-LABEL: @foo3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[ARRAYIDX1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr [[ARRAYIDX1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP16:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP13:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x float> [ [[TMP3]], [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP6]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]			; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP7:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, ptr [[ARRAYIDX14]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, ptr [[ARRAYIDX19]], align 4			; CHECK-NEXT: [[TMP9]] = load <2 x float>, ptr [[ARRAYIDX19]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP12]], <4 x i32> <i32 1, i32 undef, i32 2, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP9]], <4 x i32> <i32 1, i32 undef, i32 2, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x float> [[TMP10]], float [[TMP8]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = fmul <4 x float> [[TMP14]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[TMP12:%.*]] = fmul <4 x float> [[TMP11]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
	; CHECK-NEXT: [[TMP16]] = fadd <4 x float> [[TMP6]], [[TMP15]]			; CHECK-NEXT: [[TMP13]] = fadd <4 x float> [[TMP4]], [[TMP12]]
	; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP17]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP14]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP16]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP13]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP18]]			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP15]]
	; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x float> [[TMP16]], i32 1			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP13]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP19]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP16]]
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP16]], i32 2			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP13]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP20]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP17]]
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP16]], i32 3			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP13]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP21]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP18]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, ptr %A, align 4			%0 = load float, ptr %A, align 4
	%arrayidx1 = getelementptr inbounds float, ptr %A, i64 1			%arrayidx1 = getelementptr inbounds float, ptr %A, i64 1
	%1 = load float, ptr %arrayidx1, align 4			%1 = load float, ptr %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, ptr %A, i64 2			%arrayidx2 = getelementptr inbounds float, ptr %A, i64 2
	%2 = load float, ptr %arrayidx2, align 4			%2 = load float, ptr %arrayidx2, align 4
	▲ Show 20 Lines • Show All 139 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines
	; SSE-NEXT: store i64 [[ADD]], ptr undef, align 1			; SSE-NEXT: store i64 [[ADD]], ptr undef, align 1
	; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4			; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 poison, i64 undef>, i64 [[TMP0]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 poison, i64 undef>, i64 [[TMP0]], i32 0
	; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>			; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: [[TMP5:%.*]] = add nuw nsw <2 x i64> [[TMP4]], zeroinitializer			; SSE-NEXT: [[TMP5:%.*]] = add nuw nsw <2 x i64> [[TMP4]], zeroinitializer
	; SSE-NEXT: store <2 x i64> [[TMP5]], ptr undef, align 1			; SSE-NEXT: store <2 x i64> [[TMP5]], ptr undef, align 1
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[ADD]], i32 0			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[ADD]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>			; SSE-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>			; SSE-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i64> [[TMP9]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>			; SSE-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP5]], <i64 6, i64 6>
	; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP5]], <i64 6, i64 6>			; SSE-NEXT: [[TMP11:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP10]]
	; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]			; SSE-NEXT: store <2 x i64> [[TMP11]], ptr [[ARRAYIDX2_2]], align 1
	; SSE-NEXT: store <2 x i64> [[TMP12]], ptr [[ARRAYIDX2_2]], align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @pr35497(			; AVX-LABEL: @pr35497(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1			; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr undef, align 1
	; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef			; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; AVX-NEXT: store i64 [[ADD]], ptr undef, align 1			; AVX-NEXT: store i64 [[ADD]], ptr undef, align 1
	; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4			; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], ptr undef, i64 0, i64 4
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; AVX-NEXT: store <2 x i64> [[TMP4]], ptr undef, align 1			; AVX-NEXT: store <2 x i64> [[TMP4]], ptr undef, align 1
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> poison, i64 [[ADD]], i32 0			; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[ADD]], i32 0
	; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>			; AVX-NEXT: [[TMP6:%.*]] = shl <2 x i64> [[TMP5]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>			; AVX-NEXT: [[TMP7:%.*]] = and <2 x i64> [[TMP6]], <i64 20, i64 20>
	; AVX-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP7]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>			; AVX-NEXT: [[TMP9:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; AVX-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; AVX-NEXT: [[TMP10:%.*]] = add nuw nsw <2 x i64> [[TMP8]], [[TMP9]]
	; AVX-NEXT: [[TMP11:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP10]]			; AVX-NEXT: store <2 x i64> [[TMP10]], ptr [[ARRAYIDX2_2]], align 1
	; AVX-NEXT: store <2 x i64> [[TMP11]], ptr [[ARRAYIDX2_2]], align 1
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, ptr undef, align 1			%0 = load i64, ptr undef, align 1
	%and = shl i64 %0, 2			%and = shl i64 %0, 2
	%shl = and i64 %and, 20			%shl = and i64 %and, 20
	%add = add i64 undef, undef			%add = add i64 undef, undef
	store i64 %add, ptr undef, align 1			store i64 %add, ptr undef, align 1
	Show All 23 Lines

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -passes=slp-vectorizer,instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512


	@b = global [8 x i32] zeroinitializer, align 16			@b = global [8 x i32] zeroinitializer, align 16
	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @foo() {			define void @foo() {
	; SSE-LABEL: @foo(			; SSE-LABEL: @foo(
	; SSE-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; SSE-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
	; SSE-NEXT: store i32 [[TMP1]], ptr @a, align 16			; SSE-NEXT: store i32 [[TMP1]], ptr @a, align 16
	; SSE-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; SSE-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8			; SSE-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4			; SSE-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; AVX-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
				; AVX-NEXT: store i32 [[TMP1]], ptr @a, align 16
	; AVX-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; AVX-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], ptr @a, align 16			; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
				; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
				; AVX-NEXT: store i32 [[TMP1]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
				; AVX-NEXT: store i32 [[TMP2]], ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
				; AVX2-LABEL: @foo(
				; AVX2-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
				; AVX2-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
				; AVX2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0
				; AVX2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1
				; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
				; AVX2-NEXT: store <8 x i32> [[TMP5]], ptr @a, align 16
				; AVX2-NEXT: ret void
				;
	; AVX512-LABEL: @foo(			; AVX512-LABEL: @foo(
	; AVX512-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16			; AVX512-NEXT: [[TMP1:%.*]] = load i32, ptr @b, align 16
	; AVX512-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			; AVX512-NEXT: [[TMP2:%.*]] = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX512-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0
	; AVX512-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1
	; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX512-NEXT: store <8 x i32> [[SHUFFLE]], ptr @a, align 16			; AVX512-NEXT: store <8 x i32> [[TMP5]], ptr @a, align 16
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%1 = load i32, ptr @b, align 16			%1 = load i32, ptr @b, align 16
	store i32 %1, ptr @a, align 16			store i32 %1, ptr @a, align 16
	%2 = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8			%2 = load i32, ptr getelementptr inbounds ([8 x i32], ptr @b, i64 0, i64 2), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 1), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 2), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 3), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 4), align 16
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 5), align 4
	store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8			store i32 %1, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 6), align 8
	store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4			store i32 %2, ptr getelementptr inbounds ([8 x i32], ptr @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/reduced-gathered-vectorized.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

				; FIXME: fix the cost of the xor reduction ops

	define i16 @test() {			define i16 @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A:%.*]] = getelementptr [1000 x i64], ptr null, i64 0, i64 5			; CHECK-NEXT: [[A:%.*]] = getelementptr [1000 x i64], ptr null, i64 0, i64 5
	; CHECK-NEXT: [[A1:%.*]] = getelementptr [1000 x i64], ptr null, i64 0, i64 6			; CHECK-NEXT: [[A1:%.*]] = getelementptr [1000 x i64], ptr null, i64 0, i64 6
	; CHECK-NEXT: br label [[WHILE:%.*]]			; CHECK-NEXT: br label [[WHILE:%.*]]
	; CHECK: while:			; CHECK: while:
	; CHECK-NEXT: [[PH:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OP_RDX25:%.*]], [[WHILE]] ]			; CHECK-NEXT: [[PH:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OP_RDX27:%.*]], [[WHILE]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr null, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr null, align 8
	; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr null, align 8			; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[A1]], align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[A]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr null, align 8
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i64>, ptr [[A1]], align 16			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, ptr [[A]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[TMP3]], <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 4, i32 4>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vector.reduce.xor.v4i64(<4 x i64> [[TMP4]])
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[OP_RDX24:%.*]] = xor i64 0, [[TMP2]]
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i64> [[TMP6]], <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 4>			; CHECK-NEXT: [[OP_RDX25:%.*]] = xor i64 [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.xor.v4i64(<4 x i64> [[TMP7]])			; CHECK-NEXT: [[OP_RDX26:%.*]] = xor i64 [[OP_RDX24]], [[OP_RDX25]]
	; CHECK-NEXT: [[OP_RDX23:%.*]] = xor i64 0, [[TMP1]]			; CHECK-NEXT: [[OP_RDX27]] = xor i64 [[OP_RDX26]], [[TMP5]]
	; CHECK-NEXT: [[OP_RDX24:%.*]] = xor i64 [[TMP0]], [[TMP8]]
	; CHECK-NEXT: [[OP_RDX25]] = xor i64 [[OP_RDX23]], [[OP_RDX24]]
	; CHECK-NEXT: br label [[WHILE]]			; CHECK-NEXT: br label [[WHILE]]
	;			;
	entry:			entry:
	%a = getelementptr [1000 x i64], ptr null, i64 0, i64 5			%a = getelementptr [1000 x i64], ptr null, i64 0, i64 5
	%a1 = getelementptr [1000 x i64], ptr null, i64 0, i64 6			%a1 = getelementptr [1000 x i64], ptr null, i64 0, i64 6
	%a2 = getelementptr [1000 x i64], ptr null, i64 0, i64 7			%a2 = getelementptr [1000 x i64], ptr null, i64 0, i64 7
	%a3 = getelementptr [1000 x i64], ptr null, i64 0, i64 8			%a3 = getelementptr [1000 x i64], ptr null, i64 0, i64 8
	br label %while			br label %while
	[42 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

[323 lines not shown]
	;
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {		define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(		; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[X:%.*]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[Y:%.*]], <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <8 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = freeze <8 x i1> [[TMP3]]
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[X:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP4]])
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[X]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: ret i1 [[TMP5]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[Y1]], i32 5
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[Y2]], i32 6
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[Y3]], i32 7
; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP2]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = freeze <8 x i1> [[TMP7]]
; CHECK-NEXT: [[TMP9:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP8]])
; CHECK-NEXT: ret i1 [[TMP9]]
;		;
%x0 = extractelement <8 x i32> %x, i32 0		%x0 = extractelement <8 x i32> %x, i32 0
%x1 = extractelement <8 x i32> %x, i32 1		%x1 = extractelement <8 x i32> %x, i32 1
%x2 = extractelement <8 x i32> %x, i32 2		%x2 = extractelement <8 x i32> %x, i32 2
%x3 = extractelement <8 x i32> %x, i32 3		%x3 = extractelement <8 x i32> %x, i32 3
%y0 = extractelement <8 x i32> %y, i32 0		%y0 = extractelement <8 x i32> %y, i32 0
%y1 = extractelement <8 x i32> %y, i32 1		%y1 = extractelement <8 x i32> %y, i32 1
%y2 = extractelement <8 x i32> %y, i32 2		%y2 = extractelement <8 x i32> %y, i32 2
[196 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

	[83 lines not shown]

	define i1 @fcmp_lt_gt(double %a, double %b, double %c) {			define i1 @fcmp_lt_gt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt_gt(			; CHECK-LABEL: @fcmp_lt_gt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	[32 lines not shown]
	}			}

	define i1 @fcmp_lt(double %a, double %b, double %c) {			define i1 @fcmp_lt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt(			; CHECK-LABEL: @fcmp_lt(
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>			; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1
	[14 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/redux-feed-buildvector.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64 -passes=slp-vectorizer -S -mcpu=skylake-avx512 | FileCheck %s			; RUN: opt < %s -mtriple=x86_64 -passes=slp-vectorizer -S -mcpu=skylake-avx512 | FileCheck %s

	; The test represents the case with multiple vectorization possibilities			; The test represents the case with multiple vectorization possibilities
	; but the most effective way to vectorize it is to match both 8-way reductions			; but the most effective way to vectorize it is to match both 8-way reductions
	; feeding the insertelement vector build sequence.			; feeding the insertelement vector build sequence.

	declare void @llvm.masked.scatter.v2f64.v2p0(<2 x double>, <2 x ptr>, i32 immarg, <2 x i1>)			declare void @llvm.masked.scatter.v2f64.v2p0(<2 x double>, <2 x ptr>, i32 immarg, <2 x i1>)

	define void @test(ptr nocapture readonly %arg, ptr nocapture readonly %arg1, ptr nocapture %arg2) {			define void @test(ptr nocapture readonly %arg, ptr nocapture readonly %arg1, ptr nocapture %arg2) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ARG:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ARG:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x ptr> [[TMP0]], <8 x ptr> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x ptr> [[TMP0]], <8 x ptr> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr double, <8 x ptr> [[SHUFFLE]], <8 x i64> <i64 1, i64 3, i64 5, i64 7, i64 9, i64 11, i64 13, i64 15>			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, <8 x ptr> [[TMP1]], <8 x i64> <i64 1, i64 3, i64 5, i64 7, i64 9, i64 11, i64 13, i64 15>
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr inbounds double, ptr [[ARG1:%.*]], i64 16			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr inbounds double, ptr [[ARG1:%.*]], i64 16
	; CHECK-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0(<8 x ptr> [[TMP1]], i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x double> poison)			; CHECK-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0(<8 x ptr> [[TMP2]], i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x double> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x double>, ptr [[GEP2_0]], align 8			; CHECK-NEXT: [[TMP4:%.*]] = load <8 x double>, ptr [[GEP2_0]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <8 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <8 x double> [[TMP4]], [[TMP3]]
	; CHECK-NEXT: [[TMP7:%.*]] = load <8 x double>, ptr [[ARG1]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <8 x double>, ptr [[ARG1]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <8 x double> [[TMP7]], [[TMP2]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <8 x double> [[TMP6]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP8]])			; CHECK-NEXT: [[TMP8:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP7]])
	; CHECK-NEXT: [[TMP10:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP5]])			; CHECK-NEXT: [[TMP9:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP5]])
	; CHECK-NEXT: [[I142:%.*]] = insertelement <2 x double> poison, double [[TMP9]], i64 0			; CHECK-NEXT: [[I142:%.*]] = insertelement <2 x double> poison, double [[TMP8]], i64 0
	; CHECK-NEXT: [[I143:%.*]] = insertelement <2 x double> [[I142]], double [[TMP10]], i64 1			; CHECK-NEXT: [[I143:%.*]] = insertelement <2 x double> [[I142]], double [[TMP9]], i64 1
	; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds double, ptr [[ARG2:%.*]], <2 x i64> <i64 0, i64 16>			; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds double, ptr [[ARG2:%.*]], <2 x i64> <i64 0, i64 16>
	; CHECK-NEXT: call void @llvm.masked.scatter.v2f64.v2p0(<2 x double> [[I143]], <2 x ptr> [[P]], i32 8, <2 x i1> <i1 true, i1 true>)			; CHECK-NEXT: call void @llvm.masked.scatter.v2f64.v2p0(<2 x double> [[I143]], <2 x ptr> [[P]], i32 8, <2 x i1> <i1 true, i1 true>)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr inbounds double, ptr %arg, i64 1			%gep1.0 = getelementptr inbounds double, ptr %arg, i64 1
	%ld1.0 = load double, ptr %gep1.0, align 8			%ld1.0 = load double, ptr %gep1.0, align 8
	%ld0.0 = load double, ptr %arg1, align 8			%ld0.0 = load double, ptr %arg1, align 8
	[80 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s			; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	define void @fextr(ptr %ptr) {			define void @fextr(ptr %ptr) {
	; CHECK-LABEL: @fextr(			; CHECK-LABEL: @fextr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, ptr undef, align 16			; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, ptr undef, align 16
	; CHECK-NEXT: br label [[T:%.*]]			; CHECK-NEXT: br label [[T:%.*]]
	; CHECK: t:			; CHECK: t:
	; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = add <8 x i16> [[LD]], [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = add <8 x i16> [[LD]], [[TMP0]]
	; CHECK-NEXT: store <8 x i16> [[TMP1]], ptr [[PTR:%.*]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP1]], ptr [[PTR:%.*]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	[43 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reorder-clustered-node.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64 -slp-threshold=-150 | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64 -slp-threshold=-150 | FileCheck %s

	define i1 @test(ptr %arg, ptr %i233, i64 %i241, ptr %i235, ptr %i237, ptr %i227) {			define i1 @test(ptr %arg, ptr %i233, i64 %i241, ptr %i235, ptr %i237, ptr %i227) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[I226:%.*]] = getelementptr ptr, ptr [[ARG:%.*]], i32 7			; CHECK-NEXT: [[I226:%.*]] = getelementptr ptr, ptr [[ARG:%.*]], i32 7
	; CHECK-NEXT: [[I242:%.*]] = getelementptr double, ptr [[I233:%.*]], i64 [[I241:%.*]]			; CHECK-NEXT: [[I242:%.*]] = getelementptr double, ptr [[I233:%.*]], i64 [[I241:%.*]]
	; CHECK-NEXT: [[I245:%.*]] = getelementptr double, ptr [[I235:%.*]], i64 [[I241]]			; CHECK-NEXT: [[I245:%.*]] = getelementptr double, ptr [[I235:%.*]], i64 [[I241]]
	; CHECK-NEXT: [[I248:%.*]] = getelementptr double, ptr [[I237:%.*]], i64 [[I241]]			; CHECK-NEXT: [[I248:%.*]] = getelementptr double, ptr [[I237:%.*]], i64 [[I241]]
	; CHECK-NEXT: [[I250:%.*]] = getelementptr double, ptr [[I227:%.*]], i64 [[I241]]			; CHECK-NEXT: [[I250:%.*]] = getelementptr double, ptr [[I227:%.*]], i64 [[I241]]
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x ptr>, ptr [[I226]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x ptr>, ptr [[I226]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x ptr> [[TMP0]], <4 x ptr> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x ptr> [[TMP0]], <4 x ptr> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x ptr> <ptr poison, ptr null, ptr poison, ptr null, ptr null, ptr null, ptr null, ptr null>, ptr [[I242]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x ptr> <ptr poison, ptr null, ptr poison, ptr null, ptr null, ptr null, ptr null, ptr null>, ptr [[I242]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr [[I250]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr [[I250]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = icmp ult <8 x ptr> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp ult <8 x ptr> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x ptr> poison, ptr [[I250]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x ptr> [[TMP3]], <8 x ptr> poison, <4 x i32> <i32 2, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x ptr> [[TMP5]], ptr [[I242]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x ptr> [[TMP5]], ptr [[I245]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x ptr> [[TMP6]], ptr [[I245]], i32 2			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x ptr> [[TMP6]], ptr [[I248]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x ptr> [[TMP7]], ptr [[I248]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x ptr> [[TMP7]], <4 x ptr> poison, <8 x i32> <i32 2, i32 0, i32 1, i32 3, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x ptr> [[TMP8]], <8 x ptr> poison, <8 x i32> <i32 2, i32 0, i32 1, i32 3, i32 0, i32 1, i32 2, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x ptr> [[TMP1]], <8 x ptr> <ptr poison, ptr poison, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null>, <8 x i32> <i32 0, i32 1, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <8 x ptr> <ptr poison, ptr poison, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null>, <8 x ptr> [[TMP1]], <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP10:%.*]] = icmp ult <8 x ptr> [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = icmp ult <8 x ptr> [[TMP9]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <8 x i1> [[TMP10]], <8 x i1> poison, <8 x i32> <i32 1, i32 2, i32 0, i32 3, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i1> [[TMP11]], <8 x i1> poison, <8 x i32> <i32 1, i32 2, i32 0, i32 3, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP12:%.*]] = or <8 x i1> [[TMP4]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = or <8 x i1> [[TMP4]], [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP12]])
	; CHECK-NEXT: [[TMP14:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP13]])			; CHECK-NEXT: [[OP_RDX:%.*]] = and i1 [[TMP13]], false
	; CHECK-NEXT: [[OP_RDX:%.*]] = and i1 [[TMP14]], false
	; CHECK-NEXT: ret i1 [[OP_RDX]]			; CHECK-NEXT: ret i1 [[OP_RDX]]
	;			;
	bb:			bb:
	%i226 = getelementptr ptr, ptr %arg, i32 7			%i226 = getelementptr ptr, ptr %arg, i32 7
	%i2271 = load ptr, ptr %i226, align 8			%i2271 = load ptr, ptr %i226, align 8
	%i232 = getelementptr ptr, ptr %arg, i32 8			%i232 = getelementptr ptr, ptr %arg, i32 8
	%i2332 = load ptr, ptr %i232, align 8			%i2332 = load ptr, ptr %i232, align 8
	%i234 = getelementptr ptr, ptr %arg, i32 9			%i234 = getelementptr ptr, ptr %arg, i32 9
	[41 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -mattr=+avx512f -mtriple=x86_64 -S < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -mattr=+avx512f -mtriple=x86_64 -S < %s | FileCheck %s

	define void @test(ptr noalias %0, ptr %p) {			define void @test(ptr noalias %0, ptr %p) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x ptr> poison, ptr [[P:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x ptr> poison, ptr [[P:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x ptr> [[TMP2]], <8 x ptr> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x ptr> [[TMP2]], <8 x ptr> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x ptr> [[TMP3]], <8 x i64> <i64 15, i64 4, i64 5, i64 0, i64 2, i64 6, i64 7, i64 8>			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x ptr> [[TMP3]], <8 x i64> <i64 15, i64 4, i64 5, i64 0, i64 2, i64 6, i64 7, i64 8>
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[TMP0:%.*]], i64 2			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[TMP0:%.*]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)			; CHECK-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> poison, <16 x i32> <i32 4, i32 3, i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 2, i32 5, i32 6, i32 7, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> poison, <16 x i32> <i32 4, i32 3, i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 2, i32 5, i32 6, i32 7, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <16 x float> <float poison, float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <16 x float> [[TMP8]], <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <16 x float> [[TMP8]], <16 x float> <float poison, float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	; CHECK-NEXT: [[TMP10:%.*]] = fadd reassoc nsz arcp contract afn <16 x float> [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd reassoc nsz arcp contract afn <16 x float> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <16 x float> [[TMP10]], <16 x float> poison, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 1, i32 9, i32 0, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <16 x float> [[TMP10]], <16 x float> poison, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 1, i32 9, i32 0, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: store <16 x float> [[TMP11]], ptr [[TMP5]], align 4			; CHECK-NEXT: store <16 x float> [[TMP11]], ptr [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%2 = getelementptr inbounds float, ptr %p, i64 2			%2 = getelementptr inbounds float, ptr %p, i64 2
	%3 = getelementptr inbounds float, ptr %p, i64 4			%3 = getelementptr inbounds float, ptr %p, i64 4
	%4 = load float, ptr %3, align 4			%4 = load float, ptr %3, align 4

llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll

	; CHECK-NEXT: [[TMP1:%.*]] = trunc <2 x i32> [[TMP0]] to <2 x i8>			; CHECK-NEXT: [[TMP1:%.*]] = trunc <2 x i32> [[TMP0]] to <2 x i8>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = zext i8 [[TMP2]] to i32			; CHECK-NEXT: [[TMP3:%.*]] = zext i8 [[TMP2]] to i32
	; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP3]], 0			; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP3]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP0]], <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP0]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP4]], zeroinitializer
	; CHECK-NEXT: br label [[ELSE1:%.*]]			; CHECK-NEXT: br label [[ELSE1:%.*]]
	; CHECK: else1:			; CHECK: else1:
	; CHECK-NEXT: [[T20:%.*]] = extractelement <2 x i32> [[T13]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[T13]], <2 x i32> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[BF_CAST162]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[BF_CAST162]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[T20]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt <2 x i32> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt <2 x i32> [[TMP6]], zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
	; CHECK-NEXT: ret i1 [[TMP8]]			; CHECK-NEXT: ret i1 [[TMP8]]
	;			;
	entry:			entry:
	br i1 false, label %then, label %else			br i1 false, label %then, label %else

	then:			then:

llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -mcpu=cascadelake < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -mcpu=cascadelake < %s | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX10_I_I86:%.*]] = getelementptr inbounds float, ptr undef, i64 2			; CHECK-NEXT: [[ARRAYIDX10_I_I86:%.*]] = getelementptr inbounds float, ptr undef, i64 2
	; CHECK-NEXT: [[ARRAYIDX21_I:%.*]] = getelementptr inbounds [4 x float], ptr undef, i64 2			; CHECK-NEXT: [[ARRAYIDX21_I:%.*]] = getelementptr inbounds [4 x float], ptr undef, i64 2
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr undef, align 4
	; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x float> zeroinitializer, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x float> zeroinitializer, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = load float, ptr [[ARRAYIDX10_I_I86]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load float, ptr [[ARRAYIDX10_I_I86]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> <float 0.000000e+00, float poison>, <2 x float> [[TMP0]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP0]], float 0.000000e+00, i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[TMP3]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[TMP2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> <float poison, float 0.000000e+00>, float [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> <float poison, float 0.000000e+00>, <2 x i32> <i32 1, i32 3>
	; CHECK-NEXT: [[TMP8:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP4]], <2 x float> [[TMP6]], <2 x float> [[TMP7]])			; CHECK-NEXT: [[TMP8:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP4]], <2 x float> [[TMP6]], <2 x float> [[TMP7]])
	; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x float> [ [[TMP9]], [[BB2]] ], [ zeroinitializer, [[BB1]] ]			; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x float> [ [[TMP9]], [[BB2]] ], [ zeroinitializer, [[BB1]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x float> [[TMP1]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x float> [[TMP1]], [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x float> [[TMP11]], zeroinitializer			; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x float> [[TMP12]], zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = fsub <2 x float> [[TMP12]], zeroinitializer
	; CHECK-NEXT: [[TMP14:%.*]] = fsub <2 x float> [[TMP13]], zeroinitializer			; CHECK-NEXT: [[TMP14:%.*]] = fsub <2 x float> [[TMP13]], zeroinitializer
	; CHECK-NEXT: store <2 x float> [[TMP14]], ptr [[ARRAYIDX21_I]], align 16			; CHECK-NEXT: [[TMP15:%.*]] = fsub <2 x float> [[TMP14]], zeroinitializer
				; CHECK-NEXT: store <2 x float> [[TMP15]], ptr [[ARRAYIDX21_I]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx10.i.i86 = getelementptr inbounds float, ptr undef, i64 2			%arrayidx10.i.i86 = getelementptr inbounds float, ptr undef, i64 2
	%arrayidx6.i66.i = getelementptr inbounds float, ptr undef, i64 1			%arrayidx6.i66.i = getelementptr inbounds float, ptr undef, i64 1
	%arrayidx21.i = getelementptr inbounds [4 x float], ptr undef, i64 2			%arrayidx21.i = getelementptr inbounds [4 x float], ptr undef, i64 2
	%arrayidx6.i109.i = getelementptr inbounds [4 x float], ptr undef, i64 2, i64 1			%arrayidx6.i109.i = getelementptr inbounds [4 x float], ptr undef, i64 2, i64 1
	br label %bb1			br label %bb1

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s

	define void @wombat(ptr %ptr, ptr %ptr1) {			define void @wombat(ptr %ptr, ptr %ptr1) {
	; CHECK-LABEL: @wombat(			; CHECK-LABEL: @wombat(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[PTR1:%.*]], i32 3			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[PTR1:%.*]], i32 3
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr [[PTR:%.*]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr [[PTR:%.*]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP0]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], <i32 -1, i32 -1>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[TMP1]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i32> [[TMP1]], undef			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> undef, <4 x i32> [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> undef, <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> [[TMP6]]			; CHECK-NEXT: store <4 x i32> [[TMP6]], ptr [[TMP27]], align 8
	; CHECK-NEXT: store <4 x i32> [[TMP7]], ptr [[TMP27]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%tmp7 = getelementptr inbounds i32, ptr %ptr, i64 1			%tmp7 = getelementptr inbounds i32, ptr %ptr, i64 1
	%tmp12 = load i32, ptr %tmp7, align 4			%tmp12 = load i32, ptr %tmp7, align 4
	%tmp13 = load i32, ptr %ptr, align 8			%tmp13 = load i32, ptr %ptr, align 8
	%tmp21 = add nsw i32 %tmp12, -1			%tmp21 = add nsw i32 %tmp12, -1
	%tmp22 = fptosi float undef to i32			%tmp22 = fptosi float undef to i32

llvm/test/Transforms/SLPVectorizer/X86/vect-gather-same-nodes.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-pc-windows-msvc19.16.0 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-pc-windows-msvc19.16.0 < %s | FileCheck %s

	define void @test(ptr %a, ptr %b) {			define void @test(ptr %a, ptr %b) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[RESULT:%.*]] = alloca [4 x [4 x float]], i32 0, align 4			; CHECK-NEXT: [[RESULT:%.*]] = alloca [4 x [4 x float]], i32 0, align 4
	; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr null, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr null, align 4
	; CHECK-NEXT: [[ARRAYIDX120:%.*]] = getelementptr [4 x float], ptr [[B:%.*]], i64 0, i64 3			; CHECK-NEXT: [[ARRAYIDX120:%.*]] = getelementptr [4 x float], ptr [[B:%.*]], i64 0, i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[ARRAYIDX120]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[ARRAYIDX120]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr null, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr null, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP0]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x float> [[SHUFFLE]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[SHUFFLE1]], [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <4 x float> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd <4 x float> [[TMP9]], zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <4 x float> [[TMP12]], zeroinitializer			; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[RESULT]], align 4
	; CHECK-NEXT: store <4 x float> [[TMP13]], ptr [[RESULT]], align 4
	; CHECK-NEXT: br label [[FOR_BODY]]			; CHECK-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%result = alloca [4 x [4 x float]], i32 0, align 4			%result = alloca [4 x [4 x float]], i32 0, align 4
	%arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1			%arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1
	%0 = load float, ptr %arrayidx11, align 4			%0 = load float, ptr %arrayidx11, align 4
	%1 = load float, ptr null, align 4			%1 = load float, ptr null, align 4
	%arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3			%arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3

Revision Contents

Diff 504816

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/DebugInfo/Generic/assignment-tracking/slp-vectorizer/merge-scalars.ll

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/buildvector-nodes-dependency.ll

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_clear_undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_netbsd_decompress.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/gather-extractelements-different-bbs.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/reduced-gathered-vectorized.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/redux-feed-buildvector.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder-clustered-node.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll

llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll

llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

llvm/test/Transforms/SLPVectorizer/X86/vect-gather-same-nodes.ll
