This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of the scalars used in the graph.
Closed, Public

Authored by ABataev on Oct 1 2021, 4:10 PM.

Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev
vporpo

Commits

rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

Summary

Currently we emit gathers for scalars being vectorized in the tree as
a pair of extractelement/insertelement instructions. Instead, we can try
to find all required vectors and emit shufflevector instructions
directly, improving the code and reducing compile time.
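
To make the idea concrete, here is a minimal standalone sketch (not the patch's code) of how a list of scalars extracted from at most two source vectors maps onto a single shuffle mask. The names `ExtractedLane` and `buildShuffleMask` are hypothetical and exist only for this illustration; the sketch assumes both sources have the same width `VF` and follows the usual shufflevector convention that lanes of the second operand are numbered from `VF` upwards.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one gathered scalar: which source vector it is
// extracted from (0 or 1) and which lane of that vector it reads.
struct ExtractedLane {
  int Source;
  int Lane;
};

// Builds a two-source shuffle mask. Lanes of the first vector keep their
// index; lanes of the second vector are offset by the vector width VF.
std::vector<int> buildShuffleMask(const std::vector<ExtractedLane> &Scalars,
                                  int VF) {
  std::vector<int> Mask;
  Mask.reserve(Scalars.size());
  for (const ExtractedLane &S : Scalars) {
    assert(S.Source >= 0 && S.Source <= 1 && "at most two source vectors");
    assert(S.Lane >= 0 && S.Lane < VF && "lane must be inside the vector");
    Mask.push_back(S.Source * VF + S.Lane);
  }
  return Mask;
}
```

With `VF = 4`, gathering lanes 0 and 3 of a first vector and lanes 1 and 2 of a second one yields the mask `<0, 3, 5, 6>`, so one shuffle can stand in for four extractelement/insertelement pairs.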

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Oct 1 2021, 4:10 PM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Oct 1 2021, 4:10 PM

ABataev requested review of this revision.Oct 1 2021, 4:10 PM

Herald added a project: Restricted Project. Oct 1 2021, 4:10 PM

Harbormaster completed remote builds in B126755: Diff 376651.Oct 1 2021, 4:10 PM

Rebase

Harbormaster completed remote builds in B126915: Diff 377013.Oct 4 2021, 2:29 PM

RKSimon retitled this revision from [SLP]Improve gathering of the scals used in the graph. to [SLP]Improve gathering of the scalars used in the graph..Oct 5 2021, 6:35 AM

Rebase + bug fixes

Harbormaster completed remote builds in B133811: Diff 386648.Nov 11 2021, 2:47 PM

vporpo added a subscriber: vporpo.Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B135503: Diff 389033.Nov 22 2021, 7:36 PM

RKSimon added inline comments.Nov 29 2021, 9:13 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
297	Is it worth merging the isa<> and cast<> into a dyn_cast<>?
553	return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
6208	What targets are we still missing support for?

ABataev added inline comments.Nov 29 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6208	AArch64; in many cases it switches to the default cost: a bunch of extracts plus a bunch of inserts.

Rebase + address comments.

Harbormaster completed remote builds in B136480: Diff 390398.Nov 29 2021, 11:39 AM

Rebase

Harbormaster completed remote builds in B136694: Diff 390702.Nov 30 2021, 8:08 AM

Rebase

Harbormaster completed remote builds in B136747: Diff 390783.Nov 30 2021, 1:09 PM

Rebase

Harbormaster completed remote builds in B138215: Diff 392842.Dec 8 2021, 12:09 PM

Rebase

RKSimon added inline comments.Dec 14 2021, 8:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6171	Wshadow warning vs Idx @ Line 4688?
6205	Wshadow warning vs Idx @ Line 4688?

Address comments

Harbormaster completed remote builds in B139236: Diff 394269.Dec 14 2021, 9:48 AM

Rebase

Harbormaster completed remote builds in B141051: Diff 396715.Dec 30 2021, 2:15 PM

ABataev mentioned this in D123587: [SLP] Generate shuffles if we can reorder an existing node.Apr 12 2022, 12:05 PM

Rebase

Herald added a project: Restricted Project. Aug 26 2022, 7:51 AM

Herald added subscribers: pcwang-thead, nlopes, kosarev.

nlopes added inline comments.Aug 26 2022, 7:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8675	Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!

ABataev added inline comments.Aug 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8675	Sure, thanks!

Address comments

Harbormaster completed remote builds in B183623: Diff 455933.Aug 26 2022, 10:50 AM

Rebase

Harbormaster completed remote builds in B186399: Diff 459790.Sep 13 2022, 11:19 AM

ABataev mentioned this in rG796af0c02728: [SLP] Move getInsertIndex function, NFC..Sep 14 2022, 6:24 AM

ABataev mentioned this in rGd647312e3f57: [SLP][NFC]Extract getLastInstructionInBundle function for better.Sep 14 2022, 8:44 AM

Rebase

Harbormaster completed remote builds in B192832: Diff 468668.Oct 18 2022, 1:42 PM

nhaehnle removed a subscriber: nhaehnle.Oct 19 2022, 2:00 AM

Large update.
Includes:

Unifies all shuffle builders and shuffle emission operands.
Generalizes emission and cost model estimation of the buildvectors/gathers.

Will be split into several smaller patches eventually.

Harbormaster completed remote builds in B201460: Diff 480583.Dec 6 2022, 9:34 PM

ABataev mentioned this in D139718: [SLP][NFC]Inital redesign of ShuffleInstructionBuilder, NFC..Dec 9 2022, 7:50 AM

ABataev mentioned this in rGecac8192dbf6: [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC..Dec 13 2022, 9:54 AM

Rebase

Harbormaster completed remote builds in B202927: Diff 482594.Dec 13 2022, 1:17 PM

Restore accidentally removed code.

Harbormaster completed remote builds in B202945: Diff 482619.Dec 13 2022, 2:43 PM

Rebase

Harbormaster completed remote builds in B204383: Diff 484571.Dec 21 2022, 7:50 AM

ABataev mentioned this in D140499: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 21 2022, 1:54 PM

khchen added a subscriber: khchen.Dec 22 2022, 8:35 AM

ABataev mentioned this in rGac01ae71f0c4: [SLP]Use ShuffleInstructionBuilder for vector shrinking..Dec 28 2022, 6:11 AM

Rebase

Harbormaster completed remote builds in B206131: Diff 486895.Jan 6 2023, 10:07 AM

Rebase

Herald added a subscriber: StephenFan. Jan 9 2023, 9:43 AM

Harbormaster completed remote builds in B206577: Diff 487485.Jan 9 2023, 10:30 AM

ABataev mentioned this in D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 11 2023, 8:33 AM

ABataev mentioned this in D141940: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Jan 17 2023, 8:01 AM

ABataev mentioned this in rG9bdcf8778a5c: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars..Jan 19 2023, 1:50 PM

ABataev mentioned this in rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement..Feb 20 2023, 6:16 AM

ABataev mentioned this in D144958: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Feb 28 2023, 5:21 AM

ABataev mentioned this in rGa611b3f3059e: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 7 2023, 12:47 PM

Rebase

Restore deleted code/update test

Harbormaster completed remote builds in B218206: Diff 503510.Mar 8 2023, 2:48 PM

ABataev mentioned this in D145732: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function..Mar 9 2023, 2:20 PM

hans mentioned this in rG3b3a4c270bcb: Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather….Mar 10 2023, 5:40 AM

ABataev mentioned this in rG93a9be0cea0a: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes..Mar 10 2023, 1:22 PM

ABataev mentioned this in rGf3a68ac10c84: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector….Mar 13 2023, 6:27 AM

Rebase

RKSimon added inline comments.Mar 13 2023, 2:27 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6147	Any chance that we can use ShuffleVectorInst::isIdentityMask ?
6485	auto *
6487	auto *

ABataev added inline comments.Mar 13 2023, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6147	Sure, will do it later
6487	Both these cases are the existing code, just the diff is not quite correct because of the big differences.
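
The comment at line 6147 refers to `ShuffleVectorInst::isIdentityMask`. As a rough standalone illustration of what an identity-mask test checks (a sketch, not LLVM's implementation), the function below treats `-1` as an undefined lane and accepts it in any position. The name `isIdentityMaskSketch` and the constant `UndefLane` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumption for this sketch: -1 marks an undefined mask lane.
constexpr int UndefLane = -1;

// Returns true if every defined lane I of the mask selects element I of the
// first source, i.e. the shuffle does not move any element.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    assert(Mask[I] >= UndefLane && "lanes are either -1 or non-negative");
    if (Mask[I] != UndefLane && Mask[I] != I)
      return false;
  }
  return true;
}
```

Under these assumptions `{0, 1, 2, 3}` and `{0, -1, 2, 3}` are identity masks, while `{1, 0, 2, 3}` is not.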

Restore accidentally removed lines, address comments

Harbormaster completed remote builds in B219182: Diff 504861.Mar 13 2023, 5:18 PM

Rebase

Restore some deleted code

Harbormaster completed remote builds in B219617: Diff 505467.Mar 15 2023, 7:08 AM

ABataev mentioned this in D146167: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars..Mar 15 2023, 2:14 PM

ABataev mentioned this in rG0ad87ffdcc23: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining….Mar 17 2023, 11:21 AM

Rebase

Harbormaster completed remote builds in B220124: Diff 506162.Mar 17 2023, 12:55 PM

ABataev mentioned this in D146564: [SLP]Find reused scalars in buildvector sequences, if any..Mar 21 2023, 2:11 PM

ABataev mentioned this in rG40105a993399: [SLP]Find reused scalars in buildvector sequences, if any..Apr 5 2023, 9:39 AM

Rebase

Harbormaster completed remote builds in B224057: Diff 511474.Apr 6 2023, 11:37 AM

Rebase

Harbormaster completed remote builds in B224133: Diff 511560.Apr 6 2023, 5:26 PM

Rebase

Harbormaster completed remote builds in B224875: Diff 512589.Apr 11 2023, 3:26 PM

ABataev mentioned this in D148174: [SLP]Introduce gather cost estimation function..Apr 12 2023, 2:36 PM

ABataev mentioned this in rGf82eb7e066f3: [SLP]Introduce gather cost estimation function..Apr 13 2023, 10:19 AM

Rebase

Harbormaster completed remote builds in B225410: Diff 513316.Apr 13 2023, 12:33 PM

ABataev mentioned this in D148279: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions..Apr 13 2023, 4:42 PM

ABataev mentioned this in rGcd341f3f4878: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 5:55 AM

ABataev mentioned this in rG1ce4b26a21a0: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and….Apr 18 2023, 11:54 AM

Rebase

Harbormaster completed remote builds in B227770: Diff 516462.Apr 24 2023, 11:19 AM

dtemirbulatov added a reviewer: vporpo.Apr 27 2023, 5:39 PM

Temp rebase, requires some extra work.

Harbormaster completed remote builds in B230224: Diff 519833.May 5 2023, 7:04 AM

Rebase

Herald added a subscriber: wangpc. Nov 9 2023, 2:20 PM

Harbormaster completed remote builds in B258052: Diff 558067.Nov 9 2023, 6:17 PM

Rebase

Harbormaster completed remote builds in B258083: Diff 558113.Nov 16 2023, 10:49 AM

LGTM.

This revision is now accepted and ready to land.Thu, Nov 30, 7:34 AM

LGTM.

Rebase

Harbormaster completed remote builds in B258147: Diff 558197.Thu, Nov 30, 11:35 AM

Closed by commit rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph. (authored by ABataev). Fri, Dec 1, 11:26 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph..

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"classA" = type { %"vector", %"vector", %"complex" }
%"vector" = type { ptr, ptr, %"pair" }
%"pair" = type { %"pair_elem" }
%"pair_elem" = type { ptr }
%"complex" = type { double, double }

define void @foo() #0 {
  %1 = getelementptr %"classA", ptr null, i64 0, i32 2
  %2 = getelementptr %"classA", ptr null, i64 0, i32 2, i32 1
  br i1 false, label %10, label %3

3:                                                ; preds = %10, %0                                                                                                                                                
  %4 = phi double [ 0.000000e+00, %0 ], [ %25, %10 ]
  %5 = phi double [ 0.000000e+00, %0 ], [ %24, %10 ]
  %6 = fmul double %5, %5
  %7 = fmul double %4, %4
  %8 = fadd double %7, %6
  %9 = fcmp ult double %8, 0.000000e+00
  ret void

10:                                               ; preds = %10, %0                                                                                                                                                
  %11 = phi double [ %24, %10 ], [ 0.000000e+00, %0 ]
  %12 = phi double [ %25, %10 ], [ 0.000000e+00, %0 ]
  %13 = load double, ptr null, align 8
  %14 = load double, ptr null, align 8
  %15 = load double, ptr null, align 8
  %16 = getelementptr %"complex", ptr null, i64 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = fmul double %13, %15
  %19 = fmul double %14, %17
  %20 = fadd double %18, %19
  %21 = fmul double %14, %15
  %22 = fmul double %13, %17
  %23 = fsub double %21, %22
  %24 = fadd double %11, %20
  store double %11, ptr %1, align 8
  %25 = fadd double %12, %23
  store double %12, ptr %2, align 8
  br i1 false, label %3, label %10

; uselistorder directives                                                                                                                                                                                          
  uselistorder double %24, { 1, 0 }
  uselistorder double %25, { 1, 0 }
}

attributes #0 = { "target-features"="+aes,+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }

Thanks!

Ping @ABataev ! This is blocking our internal release at Google!

dtemirbulatov added a subscriber: dtemirbulatov.Tue, Dec 12, 1:54 AM

Revision Contents

Path (size)

llvm/lib/Transforms/Vectorize/
  SLPVectorizer.cpp (1294 lines)

llvm/test/Transforms/SLPVectorizer/AArch64/
  accelerate-vector-functions-inseltpoison.ll (66 lines)
  accelerate-vector-functions.ll (66 lines)
  horizontal.ll (2 lines)
  loadorder.ll (593 lines)
  slp-fma-loss.ll (135 lines)
  vectorize-free-extracts-inserts.ll (244 lines)

llvm/test/Transforms/SLPVectorizer/AMDGPU/
  add_sub_sat-inseltpoison.ll (12 lines)
  add_sub_sat.ll (12 lines)
  crash_extract_subvector_cost.ll (14 lines)

llvm/test/Transforms/SLPVectorizer/X86/
  PR35865-inseltpoison.ll (8 lines)
  PR35865.ll (8 lines)
  PR39774.ll (81 lines)
  alternate-calls-inseltpoison.ll (18 lines)
  alternate-calls.ll (18 lines)
  alternate-cast-inseltpoison.ll (6 lines)
  alternate-cast.ll (6 lines)
  alternate-cmp-swapped-pred.ll (14 lines)
  alternate-fp-inseltpoison.ll (2 lines)
  alternate-fp.ll (2 lines)
  alternate-int-inseltpoison.ll (62 lines)
  alternate-int.ll (62 lines)
  arith-fp-inseltpoison.ll (62 lines)
  arith-fp.ll (62 lines)
  blending-shuffle-inseltpoison.ll (42 lines)
  [file name not captured] (42 lines)
  [file name not captured] (8 lines)
  [file name not captured] (2 lines)
  [file name not captured] (66 lines)
  crash_exceed_scheduling.ll (9 lines)
  crash_smallpt.ll (26 lines)
  cse.ll (34 lines)
  diamond_broadcast_extra_shuffle.ll (28 lines)
  extract-scalar-from-undef.ll (22 lines)
  extract-shuffle-inseltpoison.ll (9 lines)
  extract-shuffle.ll (9 lines)
  extract.ll (11 lines)
  extractelement-multiple-uses.ll (12 lines)
  extractelement.ll (24 lines)
  gather-extractelements-different-bbs.ll (37 lines)
  [file name not captured] (8 lines)
  [file name not captured] (8 lines)
  [file name not captured] (6 lines)
  [file name not captured] (46 lines)
  insert-element-build-vector-const-undef.ll (23 lines)
  insert-element-build-vector-inseltpoison.ll (54 lines)
  insert-element-build-vector.ll (54 lines)
  insert-shuffle.ll (7 lines)
  insertelement-postpone.ll (10 lines)
  jumbled-load-multiuse.ll (2 lines)
  jumbled_store_crash.ll (4 lines)
  landing_pad.ll (24 lines)
  lookahead.ll (97 lines)
  malformed_phis.ll (97 lines)
  matched-shuffled-entries.ll (41 lines)
  memory-runtime-checks.ll (6 lines)
  [file name not captured] (26 lines)
  [file name not captured] (32 lines)
  [file name not captured] (8 lines)
  [file name not captured] (12 lines)
  [file name not captured] (51 lines)
  [file name not captured] (42 lines)
  [file name not captured] (16 lines)
  [file name not captured] (2 lines)
  [file name not captured] (82 lines)
  reduction-same-vals.ll (24 lines)
  reduction-transpose.ll (160 lines)
  reduction2.ll (4 lines)
  redux-feed-buildvector.ll (26 lines)
  remark_extract_broadcast.ll (8 lines)
  reorder-reused-masked-gather.ll (20 lines)
  reused-undefs.ll (8 lines)
  root-trunc-extract-reuse.ll (5 lines)
  scatter-vectorize-reorder.ll (4 lines)
  stacksave-dependence.ll (23 lines)
  tiny-tree.ll (16 lines)
  vec_list_bias-inseltpoison.ll (6 lines)
  vec_list_bias.ll (6 lines)
  vectorize-widest-phis.ll (22 lines)

Diff 468668

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[288 lines not shown]
static Optional<unsigned> getInsertIndex(const Value *InsertInst,		static Optional<unsigned> getInsertIndex(const Value *InsertInst,
unsigned Offset = 0) {		unsigned Offset = 0) {
int Index = Offset;		int Index = Offset;
if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {		if (const auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
if (const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2))) {		if (const auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2))) {
auto *VT = cast<FixedVectorType>(IE->getType());		auto *VT = cast<FixedVectorType>(IE->getType());
if (CI->getValue().uge(VT->getNumElements()))		if (CI->getValue().uge(VT->getNumElements()))
return None;		return None;
Index *= VT->getNumElements();		Index *= VT->getNumElements();
		RKSimon: Is it worth merging the isa<> and cast<> into a dyn_cast<>?
Index += CI->getZExtValue();		Index += CI->getZExtValue();
return Index;		return Index;
}		}
return None;		return None;
}		}

const auto *IV = cast<InsertValueInst>(InsertInst);		const auto *IV = cast<InsertValueInst>(InsertInst);
Type *CurrentType = IV->getType();		Type *CurrentType = IV->getType();
[75 lines not shown]
/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
/// ret <4 x i8> %ins4		/// ret <4 x i8> %ins4
/// can be transformed into:		/// can be transformed into:
/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,		/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,
/// i32 6>		/// i32 6>
/// %2 = mul <4 x i8> %1, %1		/// %2 = mul <4 x i8> %1, %1
/// ret <4 x i8> %2		/// ret <4 x i8> %2
/// We convert this initially to something like:
/// %x0 = extractelement <4 x i8> %x, i32 0
/// %x3 = extractelement <4 x i8> %x, i32 3
/// %y1 = extractelement <4 x i8> %y, i32 1
/// %y2 = extractelement <4 x i8> %y, i32 2
/// %1 = insertelement <4 x i8> poison, i8 %x0, i32 0
/// %2 = insertelement <4 x i8> %1, i8 %x3, i32 1
/// %3 = insertelement <4 x i8> %2, i8 %y1, i32 2
/// %4 = insertelement <4 x i8> %3, i8 %y2, i32 3
/// %5 = mul <4 x i8> %4, %4
/// %6 = extractelement <4 x i8> %5, i32 0
/// %ins1 = insertelement <4 x i8> poison, i8 %6, i32 0
/// %7 = extractelement <4 x i8> %5, i32 1
/// %ins2 = insertelement <4 x i8> %ins1, i8 %7, i32 1
/// %8 = extractelement <4 x i8> %5, i32 2
/// %ins3 = insertelement <4 x i8> %ins2, i8 %8, i32 2
/// %9 = extractelement <4 x i8> %5, i32 3
/// %ins4 = insertelement <4 x i8> %ins3, i8 %9, i32 3
/// ret <4 x i8> %ins4
/// InstCombiner transforms this into a shuffle and vector mul
/// Mask will return the Shuffle Mask equivalent to the extracted elements.		/// Mask will return the Shuffle Mask equivalent to the extracted elements.
/// TODO: Can we split off and reuse the shuffle mask detection from		/// TODO: Can we split off and reuse the shuffle mask detection from
/// ShuffleVectorInst/getShuffleCost?		/// ShuffleVectorInst/getShuffleCost?
static Optional<TargetTransformInfo::ShuffleKind>		static Optional<TargetTransformInfo::ShuffleKind>
isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {		isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });		find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });
if (It == VL.end())		if (It == VL.end())
[56 lines not shown]	isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
if (CommonShuffleMode == Select && Vec2)		if (CommonShuffleMode == Select && Vec2)
return TargetTransformInfo::SK_Select;		return TargetTransformInfo::SK_Select;
// If Vec2 was never used, we have a permutation of a single vector, otherwise		// If Vec2 was never used, we have a permutation of a single vector, otherwise
// we have permutation of 2 vectors.		// we have permutation of 2 vectors.
return Vec2 ? TargetTransformInfo::SK_PermuteTwoSrc		return Vec2 ? TargetTransformInfo::SK_PermuteTwoSrc
: TargetTransformInfo::SK_PermuteSingleSrc;		: TargetTransformInfo::SK_PermuteSingleSrc;
}		}
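
The tail of isFixedVectorShuffle above picks one of three shuffle kinds. The standalone sketch below shows one plausible way to derive the same three-way classification from a finished mask; it is an assumption-laden illustration, not the hidden body of the function. `ShuffleKindSketch`, `classifyMask` and `UndefLane` are hypothetical names, the mask length is assumed to equal the source vector width, and `-1` is assumed to mark an undefined lane.

```cpp
#include <cassert>
#include <vector>

// Assumption for this sketch: -1 marks an undefined mask lane.
constexpr int UndefLane = -1;

// Hypothetical mirror of the three shuffle kinds returned above.
enum class ShuffleKindSketch { Select, PermuteSingleSrc, PermuteTwoSrc };

// Classifies a mask over two sources of width VF == Mask.size().
// Lanes 0..VF-1 read the first source, lanes VF..2*VF-1 the second.
ShuffleKindSketch classifyMask(const std::vector<int> &Mask) {
  const int VF = static_cast<int>(Mask.size());
  bool UsesSecond = false; // Does any lane read the second source?
  bool IsSelect = true;    // Does every lane stay in its own position?
  for (int I = 0; I < VF; ++I) {
    if (Mask[I] == UndefLane)
      continue;
    assert(Mask[I] >= 0 && Mask[I] < 2 * VF && "lane out of range");
    if (Mask[I] >= VF)
      UsesSecond = true;
    if (Mask[I] % VF != I)
      IsSelect = false;
  }
  if (IsSelect && UsesSecond)
    return ShuffleKindSketch::Select;
  // If the second source was never used, this is a permutation of a single
  // vector; otherwise it is a permutation of two vectors.
  return UsesSecond ? ShuffleKindSketch::PermuteTwoSrc
                    : ShuffleKindSketch::PermuteSingleSrc;
}
```

Under these assumptions the mask `<0, 3, 5, 6>` from the comment above is a two-source permutation, `<0, 5, 2, 7>` is a select, and `<3, 2, 1, 0>` is a single-source permutation.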

		/// \returns True if Extract{Value,Element} instruction extracts element Idx.
		static Optional<unsigned> getExtractIndex(Instruction *E) {
		unsigned Opcode = E->getOpcode();
		assert((Opcode == Instruction::ExtractElement ||
		Opcode == Instruction::ExtractValue) &&
		"Expected extractelement or extractvalue instruction.");
		if (Opcode == Instruction::ExtractElement) {
		auto *CI = dyn_cast<ConstantInt>(E->getOperand(1));
		if (!CI)
		return None;
		return CI->getZExtValue();
		}
		ExtractValueInst *EI = cast<ExtractValueInst>(E);
		if (EI->getNumIndices() != 1)
		return None;
		return *EI->idx_begin();
		}

		/// Tries to find extractelement instructions with constant indices from fixed
		/// vector type and gather such instructions into a bunch, which highly likely
		/// might be detected as a shuffle of 1 or 2 input vectors. If this attempt was
		/// successful, the matched scalars are replaced by poison values in \p VL for
		/// future analysis.
		static Optional<TTI::ShuffleKind>
		tryToGatherExtractElements(SmallVectorImpl<Value *> &VL,
		SmallVectorImpl<int> &Mask) {
		// Scan list of gathered scalars for extractelements that can be represented
		// as shuffles.
		MapVector<Value *, SmallVector<int>> VectorOpToIdx;
		SmallVector<int> UndefVectorExtracts;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
		if (!EI)
		continue;
		auto *VecTy = dyn_cast<FixedVectorType>(EI->getVectorOperandType());
		if (!VecTy || !isa<ConstantInt, UndefValue>(EI->getIndexOperand()))
		continue;
		Optional<unsigned> Idx = getExtractIndex(EI);
		// Undefined index.
		if (!Idx) {
		UndefVectorExtracts.push_back(I);
		continue;
		}
		SmallVector<int> ExtractMask(VecTy->getNumElements(), 0);
		std::iota(ExtractMask.begin(), ExtractMask.end(), 0);
		ExtractMask[*Idx] = UndefMaskElem;
		if (isUndefVector(EI->getVectorOperand(), ExtractMask).all()) {
		UndefVectorExtracts.push_back(I);
		continue;
		}
		VectorOpToIdx[EI->getVectorOperand()].push_back(I);
		}
		// Sort the vector operands by the maximum number of uses in extractelements.
		MapVector<unsigned, SmallVector<Value *>> VFToVector;
		for (const auto &Data : VectorOpToIdx)
		VFToVector[cast<FixedVectorType>(Data.first->getType())->getNumElements()]
		.push_back(Data.first);
		for (auto &Data : VFToVector) {
		stable_sort(Data.second, [&VectorOpToIdx](Value *V1, Value *V2) {
		return VectorOpToIdx.find(V1)->second.size() >
		VectorOpToIdx.find(V2)->second.size();
		});
		}
		// Find the best pair of the vectors with the same number of elements or a
		// single vector.
		const int UndefSz = UndefVectorExtracts.size();
		unsigned SingleMax = 0;
		Value *SingleVec = nullptr;
		unsigned PairMax = 0;
		std::pair<Value *, Value *> PairVec(nullptr, nullptr);
		for (auto &Data : VFToVector) {
		Value *V1 = Data.second.front();
		if (SingleMax < VectorOpToIdx[V1].size() + UndefSz) {
		SingleMax = VectorOpToIdx[V1].size() + UndefSz;
		SingleVec = V1;
		}
		Value *V2 = nullptr;
		if (Data.second.size() > 1)
		V2 = *std::next(Data.second.begin());
		if (V2 && PairMax < VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() +
		UndefSz) {
		PairMax = VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() + UndefSz;
		PairVec = std::make_pair(V1, V2);
		}
		}
		if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)
		return None;
		// Check if better to perform a shuffle of 2 vectors or just of a single
		// vector.
		SmallVector<Value *> SavedVL(VL.begin(), VL.end());
		SmallVector<Value *> GatheredExtracts(
		VL.size(), PoisonValue::get(VL.front()->getType()));
		if (SingleMax >= PairMax && SingleMax) {
		RKSimon: return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
		for (int Idx : VectorOpToIdx[SingleVec])
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		} else {
		for (Value *V : {PairVec.first, PairVec.second})
		for (int Idx : VectorOpToIdx[V])
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		}
		// Add extracts from undefs too.
		for (int Idx : UndefVectorExtracts)
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		// Check that gather of extractelements can be represented as just a
		// shuffle of a single/two vectors the scalars are extracted from.
		Optional<TTI::ShuffleKind> Res = isFixedVectorShuffle(GatheredExtracts, Mask);
		if (!Res) {
		// Restore the original VL if attempt was not successful.
		VL.swap(SavedVL);
		return None;
		}
		// Restore unused scalars from mask.
		for (int I = 0, E = GatheredExtracts.size(); I < E; ++I) {
		auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
		if (!EI || !isa<FixedVectorType>(EI->getVectorOperandType()) ||
		!isa<ConstantInt, UndefValue>(EI->getIndexOperand()) ||
		is_contained(UndefVectorExtracts, I))
		continue;
		if (Mask[I] == UndefMaskElem)
		std::swap(VL[I], GatheredExtracts[I]);
		}
		return Res;
		}
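
The selection step in tryToGatherExtractElements (choose either the single most-used source vector or the best pair of same-width vectors) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the patch's code: `SourceUses` and `pickShuffleSources` are hypothetical names, extracts from undef vectors are ignored, and a `std::map` keyed by width replaces LLVM's `MapVector`, so tie-breaking between width groups may differ from the original.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical summary of one source vector: an id, its element count, and
// how many gathered scalars are extracted from it.
struct SourceUses {
  int Id;
  int Width;
  int NumUses;
};

// Returns the ids of the sources to shuffle from: one id if a single vector
// covers at least as many scalars as the best same-width pair, two ids
// otherwise, and an empty list if no source is used at all.
std::vector<int> pickShuffleSources(const std::vector<SourceUses> &Sources) {
  // Only vectors of the same width can be paired, so group by width first.
  std::map<int, std::vector<SourceUses>> ByWidth;
  for (const SourceUses &S : Sources) {
    assert(S.NumUses >= 0 && "use counts cannot be negative");
    ByWidth[S.Width].push_back(S);
  }
  int SingleMax = 0, PairMax = 0;
  int Single = -1;
  std::pair<int, int> Pair(-1, -1);
  for (auto &Entry : ByWidth) {
    std::vector<SourceUses> &Group = Entry.second;
    // Most-used vectors first; stable to keep the input order on ties.
    std::stable_sort(Group.begin(), Group.end(),
                     [](const SourceUses &A, const SourceUses &B) {
                       return A.NumUses > B.NumUses;
                     });
    if (Group[0].NumUses > SingleMax) {
      SingleMax = Group[0].NumUses;
      Single = Group[0].Id;
    }
    if (Group.size() > 1 && Group[0].NumUses + Group[1].NumUses > PairMax) {
      PairMax = Group[0].NumUses + Group[1].NumUses;
      Pair = std::make_pair(Group[0].Id, Group[1].Id);
    }
  }
  if (SingleMax == 0 && PairMax == 0)
    return {};
  if (SingleMax >= PairMax)
    return {Single};
  return {Pair.first, Pair.second};
}
```

In this simplified model a pair wins whenever two used vectors share a width; a single vector wins only when it covers at least as many scalars as the best pair found in any width group.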

namespace {		namespace {

/// Main data required for vectorization of instructions.		/// Main data required for vectorization of instructions.
struct InstructionsState {		struct InstructionsState {
/// The very first instruction in the list with the main opcode.		/// The very first instruction in the list with the main opcode.
Value *OpValue = nullptr;		Value *OpValue = nullptr;

/// The main/alternate instruction.		/// The main/alternate instruction.
[178 lines not shown]	static bool allSameType(ArrayRef<Value *> VL) {
Type *Ty = VL[0]->getType();		Type *Ty = VL[0]->getType();
for (int i = 1, e = VL.size(); i < e; i++)		for (int i = 1, e = VL.size(); i < e; i++)
if (VL[i]->getType() != Ty)		if (VL[i]->getType() != Ty)
return false;		return false;

return true;		return true;
}		}

/// \returns True if Extract{Value,Element} instruction extracts element Idx.
static Optional<unsigned> getExtractIndex(Instruction *E) {
unsigned Opcode = E->getOpcode();
assert((Opcode == Instruction::ExtractElement ||
Opcode == Instruction::ExtractValue) &&
"Expected extractelement or extractvalue instruction.");
if (Opcode == Instruction::ExtractElement) {
auto *CI = dyn_cast<ConstantInt>(E->getOperand(1));
if (!CI)
return None;
return CI->getZExtValue();
}
ExtractValueInst *EI = cast<ExtractValueInst>(E);
if (EI->getNumIndices() != 1)
return None;
return *EI->idx_begin();
}

/// \returns True if in-tree use also needs extract. This refers to		/// \returns True if in-tree use also needs extract. This refers to
/// possible scalar operand in vectorized instruction.		/// possible scalar operand in vectorized instruction.
static bool InTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,		static bool InTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,
TargetLibraryInfo *TLI) {		TargetLibraryInfo *TLI) {
unsigned Opcode = UserInst->getOpcode();		unsigned Opcode = UserInst->getOpcode();
switch (Opcode) {		switch (Opcode) {
case Instruction::Load: {		case Instruction::Load: {
LoadInst *LI = cast<LoadInst>(UserInst);		LoadInst *LI = cast<LoadInst>(UserInst);
[259 lines not shown]	void deleteTree() {
ExternalUses.clear();		ExternalUses.clear();
for (auto &Iter : BlocksSchedules) {		for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();		BlockScheduling *BS = Iter.second.get();
BS->clear();		BS->clear();
}		}
MinBWs.clear();		MinBWs.clear();
InstrElementSize.clear();		InstrElementSize.clear();
UserIgnoreList = nullptr;		UserIgnoreList = nullptr;
		PostponedGathers.clear();
}		}

unsigned getTreeSize() const { return VectorizableTree.size(); }		unsigned getTreeSize() const { return VectorizableTree.size(); }

/// Perform LICM and CSE on the newly generated gather sequences.		/// Perform LICM and CSE on the newly generated gather sequences.
void optimizeGatherSequence();		void optimizeGatherSequence();

/// Checks if the specified gather tree entry \p TE can be represented as a		/// Checks if the specified gather tree entry \p TE can be represented as a
[1,216 lines not shown]	private:
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const;		SmallVectorImpl<unsigned> &CurrentOrder) const;

/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, starting in \p VL.
Value *vectorizeTree(ArrayRef<Value *> VL);		Value *vectorizeTree(ArrayRef<Value *> VL, const EdgeInfo &EI);

/// Create a new vector from a list of scalar values. Produces a sequence
/// which exploits values reused across lanes, and arranges the inserts
/// for ease of later optimization.
Value *createBuildVector(ArrayRef<Value *> VL);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars. If \p		/// context means the creation of vectors from a group of scalars. If \p
/// NeedToShuffle is true, need to add a cost of reshuffling some of the		/// NeedToShuffle is true, need to add a cost of reshuffling some of the
/// vector elements.		/// vector elements.
InstructionCost getGatherCost(FixedVectorType *Ty,		InstructionCost getGatherCost(FixedVectorType *Ty,
const APInt &ShuffledIndices,		const APInt &ShuffledIndices,
bool NeedToShuffle) const;		bool NeedToShuffle) const;

/// Returns the instruction in the bundle, which can be used as a base point		/// Returns the instruction in the bundle, which can be used as a base point
/// for scheduling. Usually it is the last instruction in the bundle, except		/// for scheduling. Usually it is the last instruction in the bundle, except
/// for the case when all operands are external (in this case, it is the first		/// for the case when all operands are external (in this case, it is the first
/// instruction in the list).		/// instruction in the list).
Instruction &getLastInstructionInBundle(const TreeEntry *E);		Instruction &getLastInstructionInBundle(const TreeEntry *E);

/// Checks if the gathered \p VL can be represented as shuffle(s) of previous		/// Checks if the gathered \p VL can be represented as shuffle(s) of previous
/// tree entries.		/// tree entries.
/// \returns ShuffleKind, if gathered values can be represented as shuffles of		/// \returns ShuffleKind, if gathered values can be represented as shuffles of
/// previous tree entries. \p Mask is filled with the shuffle mask.		/// previous tree entries. \p Mask is filled with the shuffle mask.
Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		isGatherShuffledEntry(const TreeEntry *TE, ArrayRef<Value *> VL,
		SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries);		SmallVectorImpl<const TreeEntry *> &Entries);

/// \returns the scalarization cost for this list of values. Assuming that		/// \returns the scalarization cost for this list of values. Assuming that
/// this subtree gets vectorized, we may need to extract the values from the		/// this subtree gets vectorized, we may need to extract the values from the
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
InstructionCost getGatherCost(ArrayRef<Value *> VL) const;		InstructionCost getGatherCost(ArrayRef<Value *> VL) const;

/// Set the Builder insert point to one after the last instruction in		/// Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(const TreeEntry *E);		void setInsertPointAfterBundle(const TreeEntry *E);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *gather(ArrayRef<Value *> VL);		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even the tree height is tiny.		/// be beneficial even the tree height is tiny.
bool isFullyVectorizableTinyTree(bool ForReduction) const;		bool isFullyVectorizableTinyTree(bool ForReduction) const;

/// Reorder commutative or alt operands to get better probability of		/// Reorder commutative or alt operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
(418 lines hidden)	#endif
SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;		SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;

/// Maps a value to the proposed vectorizable size.		/// Maps a value to the proposed vectorizable size.
SmallDenseMap<Value *, unsigned> InstrElementSize;		SmallDenseMap<Value *, unsigned> InstrElementSize;

/// A list of scalars that we found that we need to keep as scalars.		/// A list of scalars that we found that we need to keep as scalars.
ValueSet MustGather;		ValueSet MustGather;

		/// List of gather nodes, depending on other gather/vector nodes, which should
		/// be emitted after the vector instruction emission process to correctly
		/// handle order of the vector instructions and shuffles.
		SetVector<const TreeEntry *> PostponedGathers;

/// This POD struct describes one external user in the vectorized tree.		/// This POD struct describes one external user in the vectorized tree.
struct ExternalUser {		struct ExternalUser {
ExternalUser(Value *S, llvm::User *U, int L)		ExternalUser(Value *S, llvm::User *U, int L)
: Scalar(S), User(U), Lane(L) {}		: Scalar(S), User(U), Lane(L) {}

// Which scalar in our function.		// Which scalar in our function.
Value *Scalar;		Value *Scalar;

(3,369 lines hidden)	InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
if (MinBWs.count(VL[0]))		if (MinBWs.count(VL[0]))
VecTy = FixedVectorType::get(		VecTy = FixedVectorType::get(
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());
unsigned EntryVF = E->getVectorFactor();		unsigned EntryVF = E->getVectorFactor();
auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);		auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
// FIXME: it tries to fix a problem with MSVC buildbots.		// FIXME: it tries to fix a problem with MSVC buildbots.
TargetTransformInfo *TTI = this->TTI;		TargetTransformInfo *TTI = this->TTI;
		RKSimon: Any chance that we can use ShuffleVectorInst::isIdentityMask?
		ABataev: Sure, will do it later.
auto AdjustExtractsCost = [=](InstructionCost &Cost) {		auto AdjustExtractsCost = [=](InstructionCost &Cost,
		ArrayRef<int> Mask = None) {
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
SmallPtrSet<Value *, 4> CheckedExtracts;		SmallPtrSet<Value *, 4> CheckedExtracts;
for (auto *V : VL) {		for (auto [I, V] : enumerate(VL)) {
if (isa<UndefValue>(V))		// Ignore non-extractelement scalars.
		if (isa<UndefValue>(V) || (!Mask.empty() && Mask[I] == UndefMaskElem))
continue;		continue;
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
// Also, avoid adjusting the cost for extractelements with multiple uses		// Also, avoid adjusting the cost for extractelements with multiple uses
// in different graph entries.		// in different graph entries.
const TreeEntry *VE = getTreeEntry(V);		const TreeEntry *VE = getTreeEntry(V);
if (!CheckedExtracts.insert(V).second ||		if (!CheckedExtracts.insert(V).second ||
!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||		!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
(VE && VE != E))		(VE && VE != E))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
Optional<unsigned> EEIdx = getExtractIndex(EE);		Optional<unsigned> EEIdx = getExtractIndex(EE);
if (!EEIdx)		if (!EEIdx)
continue;		continue;
unsigned Idx = *EEIdx;		unsigned Idx = *EEIdx;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
if (TTI->getNumberOfParts(VecTy) !=		if (TTI->getNumberOfParts(VecTy) !=
TTI->getNumberOfParts(EE->getVectorOperandType())) {		TTI->getNumberOfParts(EE->getVectorOperandType())) {
auto It =		auto It =
ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;		ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;
It->getSecond() = std::min<int>(It->second, Idx);		It->getSecond() = std::min<int>(It->second, Idx);
}		}
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EE->hasOneUse()) {		if (EE->hasOneUse()) {
(17 lines hidden)	auto AdjustExtractsCost = [=](InstructionCost &Cost,
}		}
// Add a cost for subvector extracts/inserts if required.		// Add a cost for subvector extracts/inserts if required.
for (const auto &Data : ExtractVectorsTys) {		for (const auto &Data : ExtractVectorsTys) {
auto *EEVTy = cast<FixedVectorType>(Data.first->getType());		auto *EEVTy = cast<FixedVectorType>(Data.first->getType());
unsigned NumElts = VecTy->getNumElements();		unsigned NumElts = VecTy->getNumElements();
if (Data.second % NumElts == 0)		if (Data.second % NumElts == 0)
continue;		continue;
if (TTI->getNumberOfParts(EEVTy) > TTI->getNumberOfParts(VecTy)) {		if (TTI->getNumberOfParts(EEVTy) > TTI->getNumberOfParts(VecTy)) {
unsigned Idx = (Data.second / NumElts) * NumElts;		unsigned Idx = (Data.second / NumElts) * NumElts;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
unsigned EENumElts = EEVTy->getNumElements();		unsigned EENumElts = EEVTy->getNumElements();
		if (Idx % NumElts == 0)
		continue;
		RKSimon: What targets are we still missing support for?
		ABataev: AArch64, in many cases switches to the default cost bunch of extracts + bunch of inserts.
if (Idx + NumElts <= EENumElts) {		if (Idx + NumElts <= EENumElts) {
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, None, CostKind, Idx, VecTy);		EEVTy, None, CostKind, Idx, VecTy);
} else {		} else {
// Need to round up the subvector type vectorization factor to avoid a		// Need to round up the subvector type vectorization factor to avoid a
// crash in cost model functions. Make SubVT so that Idx + VF of SubVT		// crash in cost model functions. Make SubVT so that Idx + VF of SubVT
// <= EENumElts.		// <= EENumElts.
auto *SubVT =		auto *SubVT =
FixedVectorType::get(VecTy->getElementType(), EENumElts - Idx);		FixedVectorType::get(VecTy->getElementType(), EENumElts - Idx);
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, None, CostKind, Idx, SubVT);		EEVTy, None, CostKind, Idx, SubVT);
}		}
} else {		} else {
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_InsertSubvector,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
VecTy, None, CostKind, 0, EEVTy);		VecTy, None, CostKind, 0, EEVTy);
}		}
}		}
};		};
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isa<InsertElementInst>(VL[0]))		if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		SmallVector<Value *> Gathers(VL.begin(), VL.end());
// Perfect match in the graph, will reuse the previously vectorized		BoUpSLP::ValueSet VectorizedLoads;
// node. Cost is 0.
LLVM_DEBUG(
dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");
if (NeedToShuffleReuses)
GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);
} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "
<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather
// than gather.
::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);
}
return GatherCost;
}
if ((E->getOpcode() == Instruction::ExtractElement ||
all_of(E->Scalars,
[](Value *V) {
return isa<ExtractElementInst, UndefValue>(V);
})) &&
allSameType(VL)) {
// Check that gather of extractelements can be represented as just a
// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isFixedVectorShuffle(VL, Mask);
if (ShuffleKind) {
// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.
InstructionCost Cost =
computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);
AdjustExtractsCost(Cost);
if (NeedToShuffleReuses)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);
return Cost;
}
}
if (isSplat(VL)) {
// Found the broadcasting of the single scalar, calculate the cost as the
// broadcast.
assert(VecTy == FinalVecTy &&
"No reused scalars expected for broadcast.");
return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy,
/*Mask=*/None, CostKind, /*Index=*/0,
/*SubTp=*/nullptr, /*Args=*/VL[0]);
}
InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses)
ReuseShuffleCost = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
// Improve gather cost for gather of loads, if we can group some of the		// Improve gather cost for gather of loads, if we can group some of the
// loads into vector loads.		// loads into vector loads.
if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&		if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&
!E->isAltShuffle()) {		!E->isAltShuffle() &&
BoUpSLP::ValueSet VectorizedLoads;		!all_of(Gathers, [this](Value *V) { return getTreeEntry(V); }) &&
		!isSplat(Gathers)) {
unsigned StartIdx = 0;		unsigned StartIdx = 0;
unsigned VF = VL.size() / 2;		unsigned VF = VL.size() / 2;
unsigned VectorizedCnt = 0;		unsigned VectorizedCnt = 0;
unsigned ScatterVectorizeCnt = 0;		unsigned ScatterVectorizeCnt = 0;
const unsigned Sz = DL->getTypeSizeInBits(E->getMainOp()->getType());		const unsigned Sz = DL->getTypeSizeInBits(E->getMainOp()->getType());
for (unsigned MinVF = getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {		for (unsigned MinVF = getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {
for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;		for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;
Cnt += VF) {		Cnt += VF) {
(28 lines hidden)	if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&
// Check if the whole array was vectorized already - exit.		// Check if the whole array was vectorized already - exit.
if (StartIdx >= VL.size())		if (StartIdx >= VL.size())
break;		break;
// Found vectorizable parts - exit.		// Found vectorizable parts - exit.
if (!VectorizedLoads.empty())		if (!VectorizedLoads.empty())
break;		break;
}		}
if (!VectorizedLoads.empty()) {		if (!VectorizedLoads.empty()) {
InstructionCost GatherCost = 0;
unsigned NumParts = TTI->getNumberOfParts(VecTy);
bool NeedInsertSubvectorAnalysis =
!NumParts || (VL.size() / VF) > NumParts;
// Get the cost for gathered loads.		// Get the cost for gathered loads.
for (unsigned I = 0, End = VL.size(); I < End; I += VF) {		for (unsigned I = 0, End = VL.size(); I < End; I += VF) {
if (VectorizedLoads.contains(VL[I]))		if (!VectorizedLoads.contains(VL[I]))
continue;		continue;
GatherCost += getGatherCost(VL.slice(I, VF));		// Exclude potentially vectorized loads from list of gathered scalars.
		for (unsigned K = I, End = I + VF; K < End; ++K)
		Gathers[K] = PoisonValue::get(Gathers[K]->getType());
}		}
// The cost for vectorized loads.		// The cost for vectorized loads.
InstructionCost ScalarsCost = 0;		InstructionCost ScalarsCost = 0;
for (Value *V : VectorizedLoads) {		for (Value *V : VectorizedLoads) {
auto *LI = cast<LoadInst>(V);		auto *LI = cast<LoadInst>(V);
ScalarsCost +=		ScalarsCost +=
TTI->getMemoryOpCost(Instruction::Load, LI->getType(),		TTI->getMemoryOpCost(Instruction::Load, LI->getType(),
LI->getAlign(), LI->getPointerAddressSpace(),		LI->getAlign(), LI->getPointerAddressSpace(),
CostKind, TTI::OperandValueInfo(), LI);		CostKind, TTI::OperandValueInfo(), LI);
}		}
auto *LI = cast<LoadInst>(E->getMainOp());		auto *LI = cast<LoadInst>(E->getMainOp());
auto *LoadTy = FixedVectorType::get(LI->getType(), VF);		auto *LoadTy = FixedVectorType::get(LI->getType(), VF);
Align Alignment = LI->getAlign();		Align Alignment = LI->getAlign();
GatherCost +=		GatherCost +=
VectorizedCnt *		VectorizedCnt *
TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,		TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,
LI->getPointerAddressSpace(), CostKind,		LI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo(), LI);		TTI::OperandValueInfo(), LI);
GatherCost += ScatterVectorizeCnt *		GatherCost += ScatterVectorizeCnt *
TTI->getGatherScatterOpCost(		TTI->getGatherScatterOpCost(
Instruction::Load, LoadTy, LI->getPointerOperand(),		Instruction::Load, LoadTy, LI->getPointerOperand(),
/*VariableMask=*/false, Alignment, CostKind, LI);		/*VariableMask=*/false, Alignment, CostKind, LI);
if (NeedInsertSubvectorAnalysis) {		// Add the cost for the subvectors shuffling.
// Add the cost for the subvectors insert.		GatherCost += (VectorizedCnt + ScatterVectorizeCnt - 1) *
for (int I = VF, E = VL.size(); I < E; I += VF)		TTI->getShuffleCost(TTI::SK_Select, VecTy);
GatherCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VecTy,		GatherCost -= ScalarsCost;
None, CostKind, I, LoadTy);		}
		}
		int VF = VL.size();
		SmallVector<int> ExtractMask;
		// Try to gather extractelements, which can be represented as shuffles.
		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
		tryToGatherExtractElements(Gathers, ExtractMask);
		if (Shuffle) {
		// Found the bunch of extractelement instructions that must be gathered
		// into a vector and can be represented as a permutation elements in a
		// single input vector or of 2 input vectors.
		GatherCost = computeExtractCost(VL, VecTy, Shuffle, ExtractMask, TTI);
		AdjustExtractsCost(GatherCost, ExtractMask);
		}
		SmallVector<int> Mask;
		// Adds extract mask to the gather mask and checks if need to use extract
		// mask at all. Maybe, extracts create a perfect diamond match with other
		// vector/gather nodes.
		auto AddExtractMask = [&]() {
		if (ExtractMask.empty())
		return;
		bool NoNeedToGatherExtracts = true;
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem) {
		Mask[I] = I;
		} else if (ExtractMask[I] != UndefMaskElem) {
		Mask[I] = I + VF;
		NoNeedToGatherExtracts = false;
		}
		}
		// The extract gathers are not used - no need to count them.
		if (NoNeedToGatherExtracts) {
		ExtractMask.clear();
		GatherCost = 0;
		}
		};
		SmallVector<const TreeEntry *> Entries;
		// Check for reused gathered scalars.
		Shuffle = isGatherShuffledEntry(E, Gathers, Mask, Entries);
		if (Shuffle) {
		// Adjust remaining gathered scalars.
		for (int I = 0; I < VF; ++I)
		if (Mask[I] != UndefMaskElem)
		Gathers[I] = PoisonValue::get(Gathers[I]->getType());
		if (any_of(Gathers, [](Value *V) {
		return isConstant(V) && !isa<UndefValue>(V);
		})) {
		if (*Shuffle == TargetTransformInfo::SK_PermuteSingleSrc) {
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem)
		Mask[I] += VF;
		else if (!isa<UndefValue>(Gathers[I]) && isConstant(Gathers[I]))
		Mask[I] = I;
		}
		} else {
		GatherCost += TTI->getShuffleCost(*Shuffle, VecTy, Mask);
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem)
		Mask[I] = I;
		else if (!isa<UndefValue>(Gathers[I]) && isConstant(Gathers[I]))
		Mask[I] = I + VF;
		}
		}
		// Add a mask for shuffle of extractelement instruction shuffling.
		AddExtractMask();
		if (ExtractMask.empty()) {
		::addMask(Mask, E->ReuseShuffleIndices);
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
		} else {
		// Cost of the first shuffle with input constant vector.
		GatherCost += TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, Mask);
		if (NeedToShuffleReuses)
		GatherCost += TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
		}
		} else {
		AddExtractMask();
		if (Entries.size() == 1 && ShuffleVectorInst::isIdentityMask(Mask)) {
		// Perfect match in the graph, will reuse the previously vectorized
		// node. Cost is 0.
		LLVM_DEBUG(dbgs() << "SLP: perfect diamond match for gather bundle "
		"that starts with "
		<< *VL.front() << ".\n");
		if (NeedToShuffleReuses)
		GatherCost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
		FinalVecTy, E->ReuseShuffleIndices);
		} else {
		LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
		<< " entries for bundle that starts with "
		<< *VL.front() << ".\n");
		// Detected that instead of gather we can emit a shuffle of single/two
		// previously vectorized nodes. Add the cost of the permutation rather
		// than gather.
		if (ExtractMask.empty() && *Shuffle == TTI::SK_PermuteSingleSrc) {
		::addMask(Mask, E->ReuseShuffleIndices);
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
		} else {
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, Mask);
		if (NeedToShuffleReuses)
		GatherCost += TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
		}
}		}
return ReuseShuffleCost + GatherCost - ScalarsCost;
}		}
		// Add the cost for final shuffle with vectorized loads.
		if (!VectorizedLoads.empty())
		GatherCost += TTI->getShuffleCost(TTI::SK_Select, VecTy);
}		}
		InstructionCost ReuseShuffleCost = 0;
		if (!Shuffle && NeedToShuffleReuses) {
		ReuseShuffleCost = TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
		if (!VectorizedLoads.empty())
		GatherCost += ReuseShuffleCost;
		}
		if (Gathers != VL) {
		// Final permute with the vector of scalars.
		if (any_of(Gathers,
		[](Value *V) { return !isa<UndefValue>(V) && isConstant(V); }))
		GatherCost += TTI->getShuffleCost(TTI::SK_Select, VecTy);
		if (all_of(Gathers, isConstant))
		return GatherCost;
		}
		if (isSplat(Gathers) && (Gathers == VL || VL.size() > 2)) {
		// Found the broadcasting of the single scalar, calculate the cost as the
		// broadcast.
		return GatherCost +
		(Gathers == VL ? 0
		: TTI->getShuffleCost(
		TargetTransformInfo::SK_Select, VecTy)) +
		TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
		}
		if (Gathers != VL)
		return GatherCost + getGatherCost(Gathers);
return ReuseShuffleCost + getGatherCost(VL);		return ReuseShuffleCost + getGatherCost(VL);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
(10 lines hidden)	if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))
CommonCost =		CommonCost =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() &&		assert(E->getOpcode() &&
((allSameType(VL) && allSameBlock(VL)) ||		((allSameType(VL) && allSameBlock(VL)) ||
(E->getOpcode() == Instruction::GetElementPtr &&		(E->getOpcode() == Instruction::GetElementPtr &&
E->getMainOp()->getType()->isPointerTy())) &&		E->getMainOp()->getType()->isPointerTy())) &&
		RKSimon: auto *
"Invalid VL");		"Invalid VL");
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
		RKSimon: auto *
		ABataev: Both these cases are the existing code, just the diff is not quite correct because of the big differences.
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
const unsigned Sz = VL.size();		const unsigned Sz = VL.size();
auto GetCostDiff =		auto GetCostDiff =
[=](function_ref<InstructionCost(unsigned)> ScalarEltCost,		[=](function_ref<InstructionCost(unsigned)> ScalarEltCost,
function_ref<InstructionCost(InstructionCost)> VectorCost) {		function_ref<InstructionCost(InstructionCost)> VectorCost) {
// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
InstructionCost ScalarCost = 0;		InstructionCost ScalarCost = 0;
(1,143 lines hidden)	#ifndef NDEBUG
if (ViewSLPTree)		if (ViewSLPTree)
ViewGraph(this, "SLP" + F->getName(), false, Str);		ViewGraph(this, "SLP" + F->getName(), false, Str);
#endif		#endif

return Cost;		return Cost;
}		}

Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, ArrayRef<Value *> VL,
		SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries) {		SmallVectorImpl<const TreeEntry *> &Entries) {
		Entries.clear();
		// No need to check for the topmost gather node.
		if (TE == VectorizableTree.front().get())
		return None;
		Mask.assign(VL.size(), UndefMaskElem);
		assert(TE->UserTreeIndices.size() == 1 &&
		"Expected only single user of the gather node.");
// TODO: currently checking only for Scalars in the tree entry, need to count		// TODO: currently checking only for Scalars in the tree entry, need to count
// reused elements too for better cost estimation.		// reused elements too for better cost estimation.
Mask.assign(TE->Scalars.size(), UndefMaskElem);		Instruction &UserInst =
Entries.clear();		getLastInstructionInBundle(TE->UserTreeIndices.front().UserTE);
		auto *PHI = dyn_cast<PHINode>(&UserInst);
		auto *NodeUI = DT->getNode(
		PHI ? PHI->getIncomingBlock(TE->UserTreeIndices.front().EdgeIdx)
		: UserInst.getParent());
		assert(NodeUI && "Should only process reachable instructions");
		SmallPtrSet<Value *, 4> GatheredScalars(VL.begin(), VL.end());
		auto CheckOrdering = [&](Instruction *LastEI) {
		// Check if the user node of the TE comes after user node of EntryPtr,
		// otherwise EntryPtr depends on TE.
		auto *EntryParent = LastEI->getParent();
		auto *NodeEUI = DT->getNode(EntryParent);
		if (!NodeEUI)
		return false;
		assert((NodeUI == NodeEUI) ==
		(NodeUI->getDFSNumIn() == NodeEUI->getDFSNumIn()) &&
		"Different nodes should have different DFS numbers");
		// Check the order of the gather nodes users.
		if (UserInst.getParent() != EntryParent &&
		(DT->dominates(NodeUI, NodeEUI) || !DT->dominates(NodeEUI, NodeUI)))
		return false;
		if (UserInst.getParent() == EntryParent && UserInst.comesBefore(LastEI))
		return false;
		return true;
		};
// Build a lists of values to tree entries.		// Build a lists of values to tree entries.
DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;		DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;
for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {		for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {
if (EntryPtr.get() == TE)		if (EntryPtr.get() == TE)
break;		continue;
if (EntryPtr->State != TreeEntry::NeedToGather)		if (EntryPtr->State != TreeEntry::NeedToGather)
continue;		continue;
		if (!any_of(EntryPtr->Scalars, [&GatheredScalars](Value *V) {
		return GatheredScalars.contains(V);
		}))
		continue;
		assert(EntryPtr->UserTreeIndices.size() == 1 &&
		"Expected only single user of the gather node.");
		Instruction &EntryUserInst =
		getLastInstructionInBundle(EntryPtr->UserTreeIndices.front().UserTE);
		if (&UserInst == &EntryUserInst) {
		// If 2 gathers are operands of the same entry, compare operands indices,
		// use the earlier one as the base.
		if (TE->UserTreeIndices.front().UserTE ==
		EntryPtr->UserTreeIndices.front().UserTE &&
		TE->UserTreeIndices.front().EdgeIdx <
		EntryPtr->UserTreeIndices.front().EdgeIdx)
		continue;
		}
		// Check if the user node of the TE comes after user node of EntryPtr,
		// otherwise EntryPtr depends on TE.
		auto *EntryPHI = dyn_cast<PHINode>(&EntryUserInst);
		auto *EntryI =
		EntryPHI
		? EntryPHI
		->getIncomingBlock(EntryPtr->UserTreeIndices.front().EdgeIdx)
		->getTerminator()
		: &EntryUserInst;
		if (!CheckOrdering(EntryI))
		continue;
for (Value *V : EntryPtr->Scalars)		for (Value *V : EntryPtr->Scalars)
		if (!isConstant(V))
ValueToTEs.try_emplace(V).first->getSecond().insert(EntryPtr.get());		ValueToTEs.try_emplace(V).first->getSecond().insert(EntryPtr.get());
}		}
// Find all tree entries used by the gathered values. If no common entries		// Find all tree entries used by the gathered values. If no common entries
// found - not a shuffle.		// found - not a shuffle.
// Here we build a set of tree nodes for each gathered value and trying to		// Here we build a set of tree nodes for each gathered value and trying to
// find the intersection between these sets. If we have at least one common		// find the intersection between these sets. If we have at least one common
// tree node for each gathered value - we have just a permutation of the		// tree node for each gathered value - we have just a permutation of the
// single vector. If we have 2 different sets, we're in situation where we		// single vector. If we have 2 different sets, we're in situation where we
// have a permutation of 2 input vectors.		// have a permutation of 2 input vectors.
SmallVector<SmallPtrSet<const TreeEntry *, 4>> UsedTEs;		SmallVector<SmallPtrSet<const TreeEntry *, 4>> UsedTEs;
DenseMap<Value *, int> UsedValuesEntry;		DenseMap<Value *, int> UsedValuesEntry;
for (Value *V : TE->Scalars) {		for (Value *V : VL) {
if (isa<UndefValue>(V))		if (isConstant(V))
continue;		continue;
// Build a list of tree entries where V is used.		// Build a list of tree entries where V is used.
SmallPtrSet<const TreeEntry *, 4> VToTEs;		SmallPtrSet<const TreeEntry *, 4> VToTEs;
auto It = ValueToTEs.find(V);		auto It = ValueToTEs.find(V);
if (It != ValueToTEs.end())		if (It != ValueToTEs.end())
VToTEs = It->second;		VToTEs = It->second;
if (const TreeEntry *VTE = getTreeEntry(V))		if (const TreeEntry *VTE = getTreeEntry(V)) {
		Instruction &EntryUserInst = getLastInstructionInBundle(VTE);
		if (&EntryUserInst == &UserInst || !CheckOrdering(&EntryUserInst))
		continue;
VToTEs.insert(VTE);		VToTEs.insert(VTE);
		}
if (VToTEs.empty())		if (VToTEs.empty())
return None;		continue;
if (UsedTEs.empty()) {		if (UsedTEs.empty()) {
// The first iteration, just insert the list of nodes to vector.		// The first iteration, just insert the list of nodes to vector.
UsedTEs.push_back(VToTEs);		UsedTEs.push_back(VToTEs);
		UsedValuesEntry.try_emplace(V, 0);
} else {		} else {
// Need to check if there are any previously used tree nodes which use V.		// Need to check if there are any previously used tree nodes which use V.
// If there are no such nodes, consider that we have another one input		// If there are no such nodes, consider that we have another one input
// vector.		// vector.
SmallPtrSet<const TreeEntry *, 4> SavedVToTEs(VToTEs);		SmallPtrSet<const TreeEntry *, 4> SavedVToTEs(VToTEs);
unsigned Idx = 0;		unsigned Idx = 0;
for (SmallPtrSet<const TreeEntry *, 4> &Set : UsedTEs) {		for (SmallPtrSet<const TreeEntry *, 4> &Set : UsedTEs) {
// Do we have a non-empty intersection of previously listed tree entries		// Do we have a non-empty intersection of previously listed tree entries
// and tree entries using current V?		// and tree entries using current V?
set_intersect(VToTEs, Set);		set_intersect(VToTEs, Set);
if (!VToTEs.empty()) {		if (!VToTEs.empty()) {
// Yes, write the new subset and continue analysis for the next		// Yes, write the new subset and continue analysis for the next
// scalar.		// scalar.
Set.swap(VToTEs);		Set.swap(VToTEs);
break;		break;
}		}
VToTEs = SavedVToTEs;		VToTEs = SavedVToTEs;
++Idx;		++Idx;
}		}
// No non-empty intersection found - need to add a second set of possible		// No non-empty intersection found - need to add a second set of possible
// source vectors.		// source vectors.
if (Idx == UsedTEs.size()) {		if (Idx == UsedTEs.size()) {
// If the number of input vectors is greater than 2 - not a permutation,		// If the number of input vectors is greater than 2 - not a permutation,
// fallback to the regular gather.		// fallback to the regular gather.
		// TODO: support multiple reshuffled nodes.
if (UsedTEs.size() == 2)		if (UsedTEs.size() == 2)
return None;		continue;
UsedTEs.push_back(SavedVToTEs);		UsedTEs.push_back(SavedVToTEs);
Idx = UsedTEs.size() - 1;		Idx = UsedTEs.size() - 1;
}		}
UsedValuesEntry.try_emplace(V, Idx);		UsedValuesEntry.try_emplace(V, Idx);
}		}
}		}

if (UsedTEs.empty()) {		if (UsedTEs.empty())
assert(all_of(TE->Scalars, UndefValue::classof) &&
"Expected vector of undefs only.");
return None;		return None;
}

unsigned VF = 0;		unsigned VF = 0;
if (UsedTEs.size() == 1) {		if (UsedTEs.size() == 1) {
		// Keep the order to avoid non-determinism.
		SmallVector<const TreeEntry *> FirstEntries(UsedTEs.front().begin(),
		UsedTEs.front().end());
		sort(FirstEntries, [](const TreeEntry *TE1, const TreeEntry *TE2) {
		return TE1->Idx < TE2->Idx;
		});
// Try to find the perfect match in another gather node at first.		// Try to find the perfect match in another gather node at first.
auto It = find_if(UsedTEs.front(), [TE](const TreeEntry *EntryPtr) {		auto It = find_if(FirstEntries, [VL, TE](const TreeEntry *EntryPtr) {
return EntryPtr->isSame(TE->Scalars);		return EntryPtr->isSame(VL) || EntryPtr->isSame(TE->Scalars);
});		});
if (It != UsedTEs.front().end()) {		if (It != FirstEntries.end()) {
Entries.push_back(*It);		Entries.push_back(*It);
std::iota(Mask.begin(), Mask.end(), 0);		std::iota(Mask.begin(), Mask.end(), 0);
		// Clear undef scalars.
		for (int I = 0, Sz = VL.size(); I < Sz; ++I)
		if (isa<UndefValue>(TE->Scalars[I]))
		Mask[I] = UndefMaskElem;
return TargetTransformInfo::SK_PermuteSingleSrc;		return TargetTransformInfo::SK_PermuteSingleSrc;
}		}
// No perfect match, just shuffle, so choose the first tree node.		// No perfect match, just shuffle, so choose the first tree node from the
Entries.push_back(*UsedTEs.front().begin());		// tree.
		Entries.push_back(FirstEntries.front());
} else {		} else {
// Try to find nodes with the same vector factor.		// Try to find nodes with the same vector factor.
assert(UsedTEs.size() == 2 && "Expected at max 2 permuted entries.");		assert(UsedTEs.size() == 2 && "Expected at max 2 permuted entries.");
		// Keep the order of tree nodes to avoid non-determinism.
DenseMap<int, const TreeEntry *> VFToTE;		DenseMap<int, const TreeEntry *> VFToTE;
for (const TreeEntry *TE : UsedTEs.front())		for (const TreeEntry *TE : UsedTEs.front()) {
VFToTE.try_emplace(TE->getVectorFactor(), TE);		unsigned VF = TE->getVectorFactor();
for (const TreeEntry *TE : UsedTEs.back()) {		auto It = VFToTE.find(VF);
		if (It != VFToTE.end()) {
		if (It->second->Idx > TE->Idx)
		It->getSecond() = TE;
		continue;
		}
		VFToTE.try_emplace(VF, TE);
		}
		// Same, keep the order to avoid non-determinism.
		SmallVector<const TreeEntry *> SecondEntries(UsedTEs.back().begin(),
		UsedTEs.back().end());
		sort(SecondEntries, [](const TreeEntry *TE1, const TreeEntry *TE2) {
		return TE1->Idx < TE2->Idx;
		});
		for (const TreeEntry *TE : SecondEntries) {
auto It = VFToTE.find(TE->getVectorFactor());		auto It = VFToTE.find(TE->getVectorFactor());
if (It != VFToTE.end()) {		if (It != VFToTE.end()) {
VF = It->first;		VF = It->first;
Entries.push_back(It->second);		Entries.push_back(It->second);
Entries.push_back(TE);		Entries.push_back(TE);
break;		break;
}		}
}		}
// No 2 source vectors with the same vector factor - give up and do regular		// No 2 source vectors with the same vector factor - give up and do regular
// gather.		// gather.
if (Entries.empty())		if (Entries.empty())
return None;		return None;
}		}

// Build a shuffle mask for better cost estimation and vector emission.		Value *SingleV = nullptr;
for (int I = 0, E = TE->Scalars.size(); I < E; ++I) {		bool IsSplat = all_of(VL, [&SingleV](Value *V) {
Value *V = TE->Scalars[I];		if (!isa<UndefValue>(V)) {
if (isa<UndefValue>(V))		if (!SingleV)
		SingleV = V;
		return SingleV == V;
		};
		return true;
		});
		// Checks if the 2 PHIs are compatible in terms of high possibility to be
		// vectorized.
		auto AreCompatiblePHIs = [](Value *V, Value *V1) {
		auto *PHI = cast<PHINode>(V);
		auto *PHI1 = cast<PHINode>(V1);
		// Check that all incoming values are compatible/from same parent (if they
		// are instructions).
		for (int I = 0, E = PHI->getNumIncomingValues(); I < E; ++I) {
		Value *In = PHI->getIncomingValue(I);
		Value *In1 = PHI1->getIncomingValue(I);
		if (isConstant(In) && isConstant(In1))
continue;		continue;
unsigned Idx = UsedValuesEntry.lookup(V);		if (!getSameOpcode({In, In1}).getOpcode())
const TreeEntry *VTE = Entries[Idx];		return false;
int FoundLane = VTE->findLaneForValue(V);		if (cast<Instruction>(In)->getParent() !=
Mask[I] = Idx * VF + FoundLane;		cast<Instruction>(In1)->getParent())
// Extra check required by isSingleSourceMaskImpl function (called by		return false;
// ShuffleVectorInst::isSingleSourceMask).
if (Mask[I] >= 2 * E)
return None;
}		}
		return true;
		};
		auto MightBeIgnored = [=](Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		SmallVector<Value *> IgnoredVals;
		if (UserIgnoreList)
		IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());
		return I && !IsSplat && !ScalarToTreeEntry.count(I) &&
		!isVectorLikeInstWithConstOps(I) &&
		!areAllUsersVectorized(I, IgnoredVals) && isSimple(I);
		};
		auto NeighborMightBeIgnored = [&](Value *V, int Idx) {
		Value *V1 = VL[Idx];
		bool UsedInSameVTE = false;
		auto It = UsedValuesEntry.find(V1);
		if (It != UsedValuesEntry.end())
		UsedInSameVTE = It->second == UsedValuesEntry.find(V)->second;
		return V != V1 && MightBeIgnored(V1) && !UsedInSameVTE &&
		getSameOpcode({V, V1}).getOpcode() &&
		cast<Instruction>(V)->getParent() ==
		cast<Instruction>(V1)->getParent() &&
		(!isa<PHINode>(V1) || AreCompatiblePHIs(V, V1));
		};
		// Build a shuffle mask for better cost estimation and vector emission.
		SmallBitVector UsedIdxs(Entries.size());
		SmallVector<std::pair<unsigned, int>> EntryLanes;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		Value *V = VL[I];
		auto It = UsedValuesEntry.find(V);
		if (It == UsedValuesEntry.end())
		continue;
		// Do not try to shuffle scalars, if they are constants, or instructions
		// that can be vectorized as a result of the following vector build
		// vectorization.
		if (isConstant(V) || (MightBeIgnored(V) &&
		((I > 0 && NeighborMightBeIgnored(V, I - 1)) ||
		(I != E - 1 && NeighborMightBeIgnored(V, I + 1)))))
		continue;
		unsigned Idx = It->second;
		EntryLanes.emplace_back(Idx, I);
		UsedIdxs.set(Idx);
		}
		SmallVector<const TreeEntry *> TempEntries;
		for (unsigned I = 0, Sz = Entries.size(); I < Sz; ++I) {
		if (!UsedIdxs.test(I))
		continue;
		for (std::pair<unsigned, int> &Pair : EntryLanes)
		if (Pair.first == I)
		Pair.first = TempEntries.size();
		TempEntries.push_back(Entries[I]);
		}
		Entries.swap(TempEntries);
		for (const std::pair<unsigned, int> &Pair : EntryLanes)
		Mask[Pair.second] = Pair.first * VF +
		Entries[Pair.first]->findLaneForValue(VL[Pair.second]);
switch (Entries.size()) {		switch (Entries.size()) {
case 1:		case 1:
return TargetTransformInfo::SK_PermuteSingleSrc;		return TargetTransformInfo::SK_PermuteSingleSrc;
case 2:		case 2:
return TargetTransformInfo::SK_PermuteTwoSrc;		return TargetTransformInfo::SK_PermuteTwoSrc;
default:		default:
		Entries.clear();
break;		break;
}		}
return None;		return None;
}		}
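
Note: the grouping loop above (the comment starting "Find all tree entries used by the gathered values") assigns each gathered scalar to one of at most two candidate source sets by repeated set intersection. A minimal standalone sketch of that idea follows, assuming tree entries are represented by plain integer ids; `EntrySet` and `groupBySource` are hypothetical names used only for illustration and are not part of SLPVectorizer.cpp. It follows the right-hand (new) column in skipping a value that would need a third source rather than giving up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in: every tree entry is identified by an int id.
using EntrySet = std::set<int>;

// ValueEntries[i] lists the entries that contain the i-th gathered value.
// Groups receives the candidate source sets (at most two).
// Returns, per value, the index of the group it was assigned to, or -1 when
// the value has no entries or would require a third source.
std::vector<int> groupBySource(const std::vector<EntrySet> &ValueEntries,
                               std::vector<EntrySet> &Groups) {
  std::vector<int> Assigned(ValueEntries.size(), -1);
  for (std::size_t I = 0; I < ValueEntries.size(); ++I) {
    const EntrySet &Cur = ValueEntries[I];
    if (Cur.empty())
      continue;
    std::size_t Idx = 0;
    for (; Idx < Groups.size(); ++Idx) {
      EntrySet Common;
      std::set_intersection(Cur.begin(), Cur.end(), Groups[Idx].begin(),
                            Groups[Idx].end(),
                            std::inserter(Common, Common.begin()));
      if (!Common.empty()) {
        // Non-empty intersection: narrow the group to the common entries.
        Groups[Idx] = Common;
        break;
      }
    }
    if (Idx == Groups.size()) {
      // No existing group shares an entry with this value.
      if (Groups.size() == 2)
        continue; // A third source is not supported in this sketch.
      Groups.push_back(Cur);
    }
    Assigned[I] = static_cast<int>(Idx);
  }
  assert(Groups.size() <= 2 && "At most two source groups are kept.");
  return Assigned;
}
```

One group left at the end corresponds to a single-source permutation, two groups to a two-source permutation.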

InstructionCost BoUpSLP::getGatherCost(FixedVectorType *Ty,		InstructionCost BoUpSLP::getGatherCost(FixedVectorType *Ty,
const APInt &ShuffledIndices,		const APInt &ShuffledIndices,
bool NeedToShuffle) const {		bool NeedToShuffle) const {
... 204 unchanged lines not shown ...	if (IsPHI || (E->State != TreeEntry::NeedToGather &&
// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
Builder.SetInsertPoint(LastInst->getParent(),		Builder.SetInsertPoint(LastInst->getParent(),
std::next(LastInst->getIterator()));		std::next(LastInst->getIterator()));
}		}
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}

Value *BoUpSLP::gather(ArrayRef<Value *> VL) {		Value *BoUpSLP::gather(ArrayRef<Value *> VL, Value *Root) {
// List of instructions/lanes from current block and/or the blocks which are		// List of instructions/lanes from current block and/or the blocks which are
// part of the current loop. These instructions will be inserted at the end to		// part of the current loop. These instructions will be inserted at the end to
// make it possible to optimize loops and hoist invariant instructions out of		// make it possible to optimize loops and hoist invariant instructions out of
// the loops body with better chances for success.		// the loops body with better chances for success.
SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;		SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;
SmallSet<int, 4> PostponedIndices;		SmallSet<int, 4> PostponedIndices;
Loop *L = LI->getLoopFor(Builder.GetInsertBlock());		Loop *L = LI->getLoopFor(Builder.GetInsertBlock());
auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {		auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {
... 23 unchanged lines not shown ...	if (TreeEntry *Entry = getTreeEntry(V)) {
unsigned FoundLane = Entry->findLaneForValue(V);		unsigned FoundLane = Entry->findLaneForValue(V);
ExternalUses.emplace_back(V, InsElt, FoundLane);		ExternalUses.emplace_back(V, InsElt, FoundLane);
}		}
return Vec;		return Vec;
};		};
Value *Val0 =		Value *Val0 =
isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];		isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];
FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());		FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());
Value *Vec = PoisonValue::get(VecTy);		Value *Vec = Root ? Root : PoisonValue::get(VecTy);
SmallVector<int> NonConsts;		SmallVector<int> NonConsts;
// Insert constant values at first.		// Insert constant values at first.
for (int I = 0, E = VL.size(); I < E; ++I) {		for (int I = 0, E = VL.size(); I < E; ++I) {
if (PostponedIndices.contains(I))		if (PostponedIndices.contains(I))
continue;		continue;
if (!isConstant(VL[I])) {		if (!isConstant(VL[I])) {
NonConsts.push_back(I);		NonConsts.push_back(I);
continue;		continue;
}		}
		if (Root && isa<UndefValue>(VL[I])) {
		if (isa<PoisonValue>(VL[I]))
		continue;
		if (auto *SV = dyn_cast<ShuffleVectorInst>(Root)) {
		if (SV->getMaskValue(I) == UndefMaskElem)
		continue;
		}
		}
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
}		}
// Insert non-constant values.		// Insert non-constant values.
for (int I : NonConsts)		for (int I : NonConsts)
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
// Append instructions, which are/may be part of the loop, in the end to make		// Append instructions, which are/may be part of the loop, in the end to make
// it possible to hoist non-loop-based instructions.		// it possible to hoist non-loop-based instructions.
for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)		for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)
... 59 unchanged lines not shown ...	public:

~ShuffleInstructionBuilder() {		~ShuffleInstructionBuilder() {
assert((IsFinalized || Mask.empty()) &&		assert((IsFinalized || Mask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};
} // namespace		} // namespace

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {		Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, const EdgeInfo &EI) {
const unsigned VF = VL.size();		const unsigned VF = VL.size();
InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
// Special processing for GEPs bundle, which may include non-gep values.		// Special processing for GEPs bundle, which may include non-gep values.
if (!S.getOpcode() && VL.front()->getType()->isPointerTy()) {		if (!S.getOpcode() && VL.front()->getType()->isPointerTy()) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<GetElementPtrInst>(V); });		find_if(VL, [](Value *V) { return isa<GetElementPtrInst>(V); });
if (It != VL.end())		if (It != VL.end())
S = getSameOpcode(*It);		S = getSameOpcode(*It);
... 47 unchanged lines not shown ...	if (TreeEntry *E = getTreeEntry(S.OpValue))
GatherShuffleExtractSeq.insert(I);		GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
}		}
return V;		return V;
}		}
}		}

// Can't vectorize this, so simply build a new vector with each lane		// Find the corresponding gather entry and vectorize it.
// corresponding to the requested value.		// Allows to be more accurate with tree/graph transformations, checks for the
return createBuildVector(VL);		// correctness of the transformations in many cases.
}		auto *I =
Value *BoUpSLP::createBuildVector(ArrayRef<Value *> VL) {		find_if(VectorizableTree, [EI](const std::unique_ptr<TreeEntry> &TE) {
assert(any_of(VectorizableTree,		return TE->State == TreeEntry::NeedToGather &&
[VL](const std::unique_ptr<TreeEntry> &TE) {		TE->UserTreeIndices.front().EdgeIdx == EI.EdgeIdx &&
return TE->State == TreeEntry::NeedToGather && TE->isSame(VL);		TE->UserTreeIndices.front().UserTE == EI.UserTE;
}) &&		});
"Non-matching gather node.");		assert(I != VectorizableTree.end() && "Gather node is not in the graph.");
unsigned VF = VL.size();		assert(I->get()->UserTreeIndices.size() == 1 &&
// Exploit possible reuse of values across lanes.		"Expected only single user for the gather node.");
SmallVector<int> ReuseShuffleIndicies;		assert(I->get()->isSame(VL) && "Expected same list of scalars.");
SmallVector<Value *> UniqueValues;		return vectorizeTree(I->get());
if (VL.size() > 2) {
DenseMap<Value *, unsigned> UniquePositions;
unsigned NumValues =
std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {
return !isa<UndefValue>(V);
}).base());
VF = std::max<unsigned>(VF, PowerOf2Ceil(NumValues));
int UniqueVals = 0;
for (Value *V : VL.drop_back(VL.size() - VF)) {
if (isa<UndefValue>(V)) {
ReuseShuffleIndicies.emplace_back(UndefMaskElem);
continue;
}		}
if (isConstant(V)) {
ReuseShuffleIndicies.emplace_back(UniqueValues.size());		namespace {
UniqueValues.emplace_back(V);		/// Merges shuffle masks and emits final shuffle instruction, if required, for
continue;		/// gathered nodes. This is similar to ShuffleInstructionBuilder but supports
		/// shuffling of 2 input vectors. It implements lazy shuffles emission, when the
		/// actual shuffle instruction is generated only if it is actually required.
		/// Otherwise, the shuffle instruction emission is delayed till the end of the
		/// process, to reduce the number of emitted instructions and further
		/// analysis/transformations.
		/// TODO: Investigate if these 2 classes might be merged.
		class ShuffleGatherBuilder {
		bool IsFinalized = false;
		SmallVector<int> CommonMask;
		SmallVector<Value *, 2> InVectors;
		function_ref<Value *(Value *, Value *, ArrayRef<int>)> CreateShuffle;

		public:
		ShuffleGatherBuilder(
		function_ref<Value *(Value *, Value *, ArrayRef<int>)> CreateShuffle)
		: CreateShuffle(CreateShuffle) {}
		/// Adds 2 input vectors and the mask for their shuffling.
		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
		if (InVectors.empty()) {
		InVectors.push_back(V1);
		InVectors.push_back(V2);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Vec = CreateShuffle(Vec, InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		} else if (cast<FixedVectorType>(Vec->getType())->getNumElements() !=
		Mask.size()) {
		Vec = CreateShuffle(Vec, nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		V1 = CreateShuffle(V1, V2, Mask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx + Sz;
		InVectors.front() = Vec;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
}		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		/// Adds another one input vector and the mask for the shuffling.
ReuseShuffleIndicies.emplace_back(Res.first->second);		void add(Value *V1, ArrayRef<int> Mask) {
if (Res.second) {		if (InVectors.empty()) {
UniqueValues.emplace_back(V);		if (!isa<FixedVectorType>(V1->getType())) {
++UniqueVals;		V1 = CreateShuffle(V1, nullptr, CommonMask);
		CommonMask.assign(Mask.size(), UndefMaskElem);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		return;
		}
		InVectors.push_back(V1);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		const auto *It = find(InVectors, V1);
		if (It == InVectors.end()) {
		if (InVectors.size() == 2 ||
		InVectors.front()->getType() != V1->getType() ||
		!isa<FixedVectorType>(V1->getType())) {
		Value *V = InVectors.front();
		if (InVectors.size() == 2) {
		V = CreateShuffle(InVectors.front(), InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		} else if (cast<FixedVectorType>(V->getType())->getNumElements() !=
		CommonMask.size()) {
		V = CreateShuffle(InVectors.front(), nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] == UndefMaskElem && Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] =
		V->getType() != V1->getType()
		? Idx + Sz
		: Mask[Idx] + cast<FixedVectorType>(V1->getType())
		->getNumElements();
		if (V->getType() != V1->getType())
		V1 = CreateShuffle(V1, nullptr, Mask);
		InVectors.front() = V;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		return;
}		}
		// Check if second vector is required if the used elements are already
		// used from the first one.
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem) {
		InVectors.push_back(V1);
		break;
}		}
if (UniqueVals == 1 && UniqueValues.size() == 1) {
// Emit pure splat vector.
ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),
UndefMaskElem);
} else if (UniqueValues.size() >= VF - 1 \|\| UniqueValues.size() <= 1) {
if (UniqueValues.empty()) {
assert(all_of(VL, UndefValue::classof) && "Expected list of undefs.");
NumValues = VF;
}		}
ReuseShuffleIndicies.clear();		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
UniqueValues.clear();		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem)
UniqueValues.append(VL.begin(), std::next(VL.begin(), NumValues));		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : Sz);
}		}
UniqueValues.append(VF - UniqueValues.size(),		/// Finalize emission of the shuffles.
PoisonValue::get(VL[0]->getType()));		Value *
VL = UniqueValues;		finalize(ArrayRef<int> ExtMask,
		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
		IsFinalized = true;
		if (Action) {
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Vec = CreateShuffle(Vec, InVectors.back(), CommonMask);
		InVectors.pop_back();
		} else {
		Vec = CreateShuffle(Vec, nullptr, CommonMask);
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		Action(Vec, CommonMask);
		InVectors.front() = Vec;
}		}
		if (!ExtMask.empty()) {
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
CSEBlocks);		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
Value *Vec = gather(VL);		if (ExtMask[I] == UndefMaskElem)
if (!ReuseShuffleIndicies.empty()) {		continue;
ShuffleBuilder.addMask(ReuseShuffleIndicies);		NewMask[I] = CommonMask[ExtMask[I]];
Vec = ShuffleBuilder.finalize(Vec);
}		}
return Vec;		CommonMask.swap(NewMask);
		}
		if (InVectors.size() == 2)
		return CreateShuffle(InVectors.front(), InVectors.back(), CommonMask);
		return CreateShuffle(InVectors.front(), nullptr, CommonMask);
}		}

		~ShuffleGatherBuilder() {
		assert((IsFinalized || CommonMask.empty()) &&
		"Shuffle construction must be finalized.");
		}
		};
		} // namespace
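
Note: `finalize` above (and the `CombineMasks` lambda later in this diff) applies an external mask on top of an already accumulated mask, so that lane I of the result reads `Mask[ExtMask[I]]`. A minimal standalone sketch of that composition follows, assuming masks are plain `std::vector<int>` and that `-1` plays the role of `UndefMaskElem`; `composeMasks` and `UndefElem` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for UndefMaskElem; an assumption made for this sketch.
constexpr int UndefElem = -1;

// Composes two shuffle masks: Inner is applied first, Outer second.
// Lane I of the result selects Inner[Outer[I]]; undefined lanes of Outer
// stay undefined in the result.
std::vector<int> composeMasks(const std::vector<int> &Inner,
                              const std::vector<int> &Outer) {
  std::vector<int> Result(Outer.size(), UndefElem);
  for (std::size_t I = 0; I < Outer.size(); ++I) {
    if (Outer[I] == UndefElem)
      continue;
    assert(Outer[I] >= 0 &&
           static_cast<std::size_t>(Outer[I]) < Inner.size() &&
           "Outer mask must index into the inner mask.");
    Result[I] = Inner[Outer[I]];
  }
  return Result;
}
```

Composing the masks first lets a single shuffle be emitted instead of two chained ones.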

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
unsigned VF = E->getVectorFactor();		unsigned VF = E->getVectorFactor();
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,
CSEBlocks);		CSEBlocks);
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (E->getMainOp())		// Can set insert point safely only for the initial gather node.
		if (E == VectorizableTree.front().get() && E->getMainOp())
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
Value *Vec;		SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
		E->ReuseShuffleIndices.end());
		SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
		// Checks if the mask is an identity mask.
		auto IsIdentityMask = [](ArrayRef<int> Mask, FixedVectorType *VecTy) {
		int Limit = Mask.size();
		return VecTy->getNumElements() == Mask.size() &&
		all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) &&
		ShuffleVectorInst::isIdentityMask(Mask);
		};
		// Tries to combine 2 different masks into single one.
		auto CombineMasks = [](SmallVectorImpl<int> &Mask, ArrayRef<int> ExtMask) {
		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
		if (ExtMask[I] == UndefMaskElem)
		continue;
		NewMask[I] = Mask[ExtMask[I]];
		}
		Mask.swap(NewMask);
		};
		// Smart shuffle instruction emission, walks through shuffles trees and
		// tries to find the best matching vector for the actual shuffle
		// instruction.
		auto CreateShuffle = [&](Value *V1, Value *V2,
		ArrayRef<int> Mask) -> Value * {
		assert(V1 && "Expected at least one vector value.");
		SmallVector<int> V2Mask(Mask.size(), UndefMaskElem);
		// Mask elements from V2 vector and unmask all others.
		for (int I = 0, VF = Mask.size(); I < VF; ++I)
		if (Mask[I] >= VF)
		V2Mask[I] = UndefMaskElem;
		else if (Mask[I] == UndefMaskElem)
		V2Mask[I] = I;
		else
		V2Mask[I] = Mask[I];
		if (V2 && !isUndefVector(V2, V2Mask).all()) {
		Value *Vec = Builder.CreateShuffleVector(V1, V2, Mask);
		if (auto *I = dyn_cast<Instruction>(Vec)) {
		GatherShuffleExtractSeq.insert(I);
		CSEBlocks.insert(I->getParent());
		}
		return Vec;
		}
		if (isa<PoisonValue>(V1))
		return PoisonValue::get(FixedVectorType::get(
		cast<VectorType>(V1->getType())->getElementType(), Mask.size()));
		Value *Op = V1;
		SmallVector<int> CombinedMask(Mask.begin(), Mask.end());
		while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
		// Exit if not a fixed vector type or changing size shuffle.
		if (!isa<FixedVectorType>(SV->getType()))
		break;
		// Exit if the identity or broadcast mask is found.
		if (IsIdentityMask(CombinedMask, cast<FixedVectorType>(SV->getType())))
		break;
		SmallVector<int> OpMask(CombinedMask.size(), UndefMaskElem);
		// Mask elements from the required vector and unmask all others.
		for (int I = 0, VF = CombinedMask.size(); I < VF; ++I)
		if (CombinedMask[I] == UndefMaskElem)
		OpMask[I] = I;
		else if (CombinedMask[I] < VF)
		OpMask[I] = UndefMaskElem;
		else
		OpMask[I] = CombinedMask[I] - VF;
		bool IsOp1Undef = isUndefVector(SV->getOperand(0), OpMask).all();
		for (int I = 0, VF = CombinedMask.size(); I < VF; ++I)
		if (CombinedMask[I] == UndefMaskElem)
		OpMask[I] = I;
		else if (CombinedMask[I] >= VF)
		OpMask[I] = UndefMaskElem;
		else
		OpMask[I] = CombinedMask[I];
		bool IsOp2Undef = isUndefVector(SV->getOperand(1), OpMask).all();
		if (!IsOp1Undef && !IsOp2Undef)
		break;
		SmallVector<int> ShuffleMask(SV->getShuffleMask().begin(),
		SV->getShuffleMask().end());
		CombineMasks(ShuffleMask, CombinedMask);
		CombinedMask.swap(ShuffleMask);
		if (IsOp2Undef)
		Op = SV->getOperand(0);
		else
		Op = SV->getOperand(1);
		}
		if (!isa<FixedVectorType>(Op->getType()) ||
		!IsIdentityMask(CombinedMask, cast<FixedVectorType>(Op->getType()))) {
		Value *Vec = Builder.CreateShuffleVector(Op, CombinedMask);
		if (auto *I = dyn_cast<Instruction>(Vec)) {
		GatherShuffleExtractSeq.insert(I);
		CSEBlocks.insert(I->getParent());
		}
		return Vec;
		}
		return Op;
		};
		ShuffleGatherBuilder GatherBuilder(CreateShuffle);
		Value *Vec = nullptr;
SmallVector<int> Mask;		SmallVector<int> Mask;
		SmallVector<int> ExtractMask;
		Optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
		Optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		Type *ScalarTy = GatheredScalars.front()->getType();
isGatherShuffledEntry(E, Mask, Entries);		if (!all_of(GatheredScalars, UndefValue::classof)) {
if (Shuffle) {		// Check for gathered extracts.
		ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
		SmallVector<Value *> IgnoredVals;
		if (UserIgnoreList)
		IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());
		// Need to remove vectorized extractelement instructions.
		for (int I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		int Idx = ExtractMask[I];
		if (Idx == UndefMaskElem)
		continue;
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		// If all users are vectorized - can delete the extractelement itself.
		if (!areAllUsersVectorized(EI, IgnoredVals))
		continue;
		eraseInstruction(EI);
		}
		// Gather extracts after we check for full matched gathers only.
		if (!(E->getOpcode() == Instruction::Load && !E->isAltShuffle() &&
		!all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) &&
		!isSplat(E->Scalars) &&
		(E->Scalars == GatheredScalars || GatheredScalars.size() > 2)))
		GatherShuffle =
		isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
		if (GatherShuffle) {
		if (any_of(Entries,
		[](const TreeEntry *TE) { return !TE->VectorizedValue; })) {
		PostponedGathers.insert(E);
		// Postpone gather emission, will be emitted after the end of the
		// process to keep correct order.
		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
		Value *Vec = Builder.CreateAlignedLoad(
		VecTy, PoisonValue::get(VecTy->getPointerTo()), MaybeAlign());
		nlopes (inline comment): Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!
		ABataev (inline reply): Sure, thanks!
		E->VectorizedValue = Vec;
		return Vec;
		}
assert((Entries.size() == 1 || Entries.size() == 2) &&		assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");		"Expected shuffle of 1 or 2 entries.");
Vec = Builder.CreateShuffleVector(Entries.front()->VectorizedValue,		// Remove shuffled elements from list of gathers.
		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
		if (Mask[I] != UndefMaskElem)
		GatheredScalars[I] = PoisonValue::get(ScalarTy);
		}
		if (Entries.size() == 1)
		GatherBuilder.add(Entries.front()->VectorizedValue, Mask);
		else
		GatherBuilder.add(Entries.front()->VectorizedValue,
Entries.back()->VectorizedValue, Mask);		Entries.back()->VectorizedValue, Mask);
if (auto *I = dyn_cast<Instruction>(Vec)) {		} else {
GatherShuffleExtractSeq.insert(I);		// Check that every instruction appears once in this bundle.
CSEBlocks.insert(I->getParent());		SmallVector<Value *> UniqueValues;
		if (GatheredScalars.size() > 2) {
		DenseMap<Value *, unsigned> UniquePositions;
		int UniqueVals = 0;
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		Value *V = GatheredScalars[I];
		if (isa<UndefValue>(V)) {
		if (!NeedToShuffleReuses)
		ReuseShuffleIndicies.emplace_back(UndefMaskElem);
		UniqueValues.emplace_back(V);
		continue;
}		}
		if (isConstant(V)) {
		if (!NeedToShuffleReuses)
		ReuseShuffleIndicies.emplace_back(UniqueValues.size());
		UniqueValues.emplace_back(V);
		continue;
		}
		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
		if (!NeedToShuffleReuses) {
		ReuseShuffleIndicies.emplace_back(Res.first->second);
} else {		} else {
Vec = gather(E->Scalars);		for (unsigned Idx = 0; Idx < VF; ++Idx)
		if (ReuseShuffleIndicies[Idx] == I)
		ReuseShuffleIndicies[Idx] = Res.first->second;
}		}
if (NeedToShuffleReuses) {		if (Res.second) {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		UniqueValues.emplace_back(V);
Vec = ShuffleBuilder.finalize(Vec);		++UniqueVals;
		}
		}
		if (!NeedToShuffleReuses) {
		if (UniqueVals == 1 && UniqueValues.size() == 1) {
		// Emit pure splat vector.
		ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),
		UndefMaskElem);
		} else if (UniqueValues.size() >= VF - 1 ||
		UniqueValues.size() <= 1) {
		ReuseShuffleIndicies.clear();
		UniqueValues.swap(GatheredScalars);
		}
		}
		UniqueValues.append(VF - UniqueValues.size(),
		PoisonValue::get(ScalarTy));
		GatheredScalars.swap(UniqueValues);
		}
		}
		}
		// Combine generated extracts mask and reused scalars masks and
		// corresponding input vectors.
		if (ExtractShuffle) {
		// Gather of extractelements can be represented as just a shuffle of
		// a single/two vectors the scalars are extracted from.
		// Find input vectors.
		Value *Vec1 = nullptr;
		Value *Vec2 = nullptr;
		for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		if (ExtractMask[I] == UndefMaskElem ||
		(!Mask.empty() && Mask[I] != UndefMaskElem)) {
		ExtractMask[I] = UndefMaskElem;
		continue;
		}
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		if (!Vec1) {
		Vec1 = EI->getVectorOperand();
		} else if (Vec1 != EI->getVectorOperand()) {
		assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&
		"Expected only 1 or 2 vectors shuffle.");
		Vec2 = EI->getVectorOperand();
		}
		}
		if (Vec2) {
		GatherBuilder.add(Vec1, Vec2, ExtractMask);
		} else if (Vec1) {
		GatherBuilder.add(Vec1, ExtractMask);
		} else {
		GatherBuilder.add(PoisonValue::get(FixedVectorType::get(
		ScalarTy, GatheredScalars.size())),
		ExtractMask);
		}
		}
		if (ExtractShuffle || GatherShuffle) {
		// Insert non-constant scalars.
		SmallVector<Value *> NonConstants(GatheredScalars);
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		if (!isa<Constant>(GatheredScalars[I]))
		GatheredScalars[I] = PoisonValue::get(ScalarTy);
		else
		NonConstants[I] = PoisonValue::get(ScalarTy);
		}
		// Generate constants for final shuffle.
		if (!all_of(GatheredScalars, UndefValue::classof)) {
		Mask.assign(GatheredScalars.size(), UndefMaskElem);
		Value *VecVal = gather(GatheredScalars);
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		if (!isa<UndefValue>(GatheredScalars[I]))
		Mask[I] = I;
		}
		GatherBuilder.add(VecVal, Mask);
		}
		// Emit final insertelement instructions for defined values.
		if (!all_of(NonConstants, Constant::classof))
		Vec = GatherBuilder.finalize(
		ReuseShuffleIndicies, [&](Value *&Vec, SmallVectorImpl<int> &Mask) {
		Vec = gather(NonConstants, Vec);
		for (unsigned I = 0, Sz = Mask.size(); I < Sz; ++I)
		if (!isa<Constant>(NonConstants[I]))
		Mask[I] = I;
		});
		else
		Vec = GatherBuilder.finalize(ReuseShuffleIndicies);
		} else {
		// Just generate simple gather, no reused scalars/extracts.
		Vec = gather(GatheredScalars);
		Mask.assign(GatheredScalars.size(), UndefMaskElem);
		for (unsigned Idx = 0, Sz = GatheredScalars.size(); Idx < Sz; ++Idx)
		if (!isa<UndefValue>(GatheredScalars[Idx]))
		Mask[Idx] = Idx;
		GatherBuilder.add(Vec, Mask);
		Vec = GatherBuilder.finalize(ReuseShuffleIndicies);
}		}
E->VectorizedValue = Vec;		E->VectorizedValue = Vec;
return Vec;		return Vec;
}		}

assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
Show All 31 Lines	case Instruction::PHI: {
// PHINodes may have multiple entries from the same block. We want to		// PHINodes may have multiple entries from the same block. We want to
// visit every block once.		// visit every block once.
SmallPtrSet<BasicBlock*, 4> VisitedBBs;		SmallPtrSet<BasicBlock*, 4> VisitedBBs;

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
BasicBlock *IBB = PH->getIncomingBlock(i);		BasicBlock *IBB = PH->getIncomingBlock(i);

		// Stop emission if all incoming values are generated.
		if (NewPhi->getNumIncomingValues() == PH->getNumIncomingValues()) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return V;
		}

if (!VisitedBBs.insert(IBB).second) {		if (!VisitedBBs.insert(IBB).second) {
NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);		NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);
continue;		continue;
}		}

Builder.SetInsertPoint(IBB->getTerminator());		Builder.SetInsertPoint(IBB->getTerminator());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
Value *Vec = vectorizeTree(E->getOperand(i));		Value *Vec = vectorizeTree(E->getOperand(i), EdgeInfo(E, i));
NewPhi->addIncoming(Vec, IBB);		NewPhi->addIncoming(Vec, IBB);
}		}

assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&		assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&
"Invalid number of incoming values");		"Invalid number of incoming values");
return V;		return V;
}		}

Show All 17 Lines	case Instruction::ExtractValue: {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
NewV = ShuffleBuilder.finalize(NewV);		NewV = ShuffleBuilder.finalize(NewV);
E->VectorizedValue = NewV;		E->VectorizedValue = NewV;
return NewV;		return NewV;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");		assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");
Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));		Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));
Value *V = vectorizeTree(E->getOperand(1));		Value *V = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

// Create InsertVector shuffle if necessary		// Create InsertVector shuffle if necessary
auto FirstInsert = cast<Instruction>(find_if(E->Scalars, [E](Value *V) {		auto FirstInsert = cast<Instruction>(find_if(E->Scalars, [E](Value *V) {
return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));		return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));
}));		}));
const unsigned NumElts =		const unsigned NumElts =
cast<FixedVectorType>(FirstInsert->getType())->getNumElements();		cast<FixedVectorType>(FirstInsert->getType())->getNumElements();
const unsigned NumScalars = E->Scalars.size();		const unsigned NumScalars = E->Scalars.size();
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	switch (ShuffleOrOp) {
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *InVec = vectorizeTree(E->getOperand(0));		Value *InVec = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

auto *CI = cast<CastInst>(VL0);		auto *CI = cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *L = vectorizeTree(E->getOperand(0));		Value *L = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *R = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *R = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V = Builder.CreateCmp(P0, L, R);		Value *V = Builder.CreateCmp(P0, L, R);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Cond = vectorizeTree(E->getOperand(0));		Value *Cond = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *True = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
Value *False = vectorizeTree(E->getOperand(2));		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *True = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));
		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *False = vectorizeTree(E->getOperand(2), EdgeInfo(E, 2));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateSelect(Cond, True, False);		Value *V = Builder.CreateSelect(Cond, True, False);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op = vectorizeTree(E->getOperand(0));		Value *Op = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateUnOp(		Value *V = Builder.CreateUnOp(
static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);		static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);
Show All 25 Lines	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *LHS = vectorizeTree(E->getOperand(0));		Value *LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *RHS = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *RHS = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateBinOp(		Value *V = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,
Show All 30 Lines	case Instruction::Load: {
if (TreeEntry *Entry = getTreeEntry(PO)) {		if (TreeEntry *Entry = getTreeEntry(PO)) {
// Find which lane we need to extract.		// Find which lane we need to extract.
unsigned FoundLane = Entry->findLaneForValue(PO);		unsigned FoundLane = Entry->findLaneForValue(PO);
ExternalUses.emplace_back(		ExternalUses.emplace_back(
PO, PO != VecPtr ? cast<User>(VecPtr) : NewLI, FoundLane);		PO, PO != VecPtr ? cast<User>(VecPtr) : NewLI, FoundLane);
}		}
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");		assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");
Value *VecPtr = vectorizeTree(E->getOperand(0));		Value *VecPtr = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
// Use the minimum alignment of the gathered loads.		// Use the minimum alignment of the gathered loads.
Align CommonAlignment = LI->getAlign();		Align CommonAlignment = LI->getAlign();
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
CommonAlignment =		CommonAlignment =
std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());		std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);		NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);
}		}
Value *V = propagateMetadata(NewLI, E->Scalars);		Value *V = propagateMetadata(NewLI, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Store: {		case Instruction::Store: {
auto *SI = cast<StoreInst>(VL0);		auto *SI = cast<StoreInst>(VL0);
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *VecValue = vectorizeTree(E->getOperand(0));		Value *VecValue = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
ShuffleBuilder.addMask(E->ReorderIndices);		ShuffleBuilder.addMask(E->ReorderIndices);
VecValue = ShuffleBuilder.finalize(VecValue);		VecValue = ShuffleBuilder.finalize(VecValue);

Value *ScalarPtr = SI->getPointerOperand();		Value *ScalarPtr = SI->getPointerOperand();
Value *VecPtr = Builder.CreateBitCast(		Value *VecPtr = Builder.CreateBitCast(
ScalarPtr, VecValue->getType()->getPointerTo(AS));		ScalarPtr, VecValue->getType()->getPointerTo(AS));
StoreInst *ST =		StoreInst *ST =
Builder.CreateAlignedStore(VecValue, VecPtr, SI->getAlign());		Builder.CreateAlignedStore(VecValue, VecPtr, SI->getAlign());
Show All 14 Lines	case Instruction::Store: {
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
auto *GEP0 = cast<GetElementPtrInst>(VL0);		auto *GEP0 = cast<GetElementPtrInst>(VL0);
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op0 = vectorizeTree(E->getOperand(0));		Value *Op0 = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

SmallVector<Value *> OpVecs;		SmallVector<Value *> OpVecs;
for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {		for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {
Value *OpVec = vectorizeTree(E->getOperand(J));		Value *OpVec = vectorizeTree(E->getOperand(J), EdgeInfo(E, J));
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);		Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);
if (Instruction *I = dyn_cast<GetElementPtrInst>(V)) {		if (Instruction *I = dyn_cast<GetElementPtrInst>(V)) {
SmallVector<Value *> GEPs;		SmallVector<Value *> GEPs;
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
if (isa<GetElementPtrInst>(V))		if (isa<GetElementPtrInst>(V))
Show All 37 Lines	case Instruction::Call: {
CallInst *CEI = cast<CallInst>(VL0);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))		if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))
TysForDecl.push_back(ScalarArg->getType());		TysForDecl.push_back(ScalarArg->getType());
continue;		continue;
}		}

Value *OpVec = vectorizeTree(E->getOperand(j));		Value *OpVec = vectorizeTree(E->getOperand(j), EdgeInfo(E, j));
LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");		LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))		if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))
TysForDecl.push_back(OpVec->getType());		TysForDecl.push_back(OpVec->getType());
}		}

Function *CF;		Function *CF;
if (!UseIntrinsic) {		if (!UseIntrinsic) {
Show All 38 Lines	case Instruction::ShuffleVector: {
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode())) ||		Instruction::isCast(E->getAltOpcode())) ||
(isa<CmpInst>(VL0) && isa<CmpInst>(E->getAltOp()))) &&		(isa<CmpInst>(VL0) && isa<CmpInst>(E->getAltOp()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");

Value *LHS = nullptr, *RHS = nullptr;		Value *LHS = nullptr, *RHS = nullptr;
if (Instruction::isBinaryOp(E->getOpcode()) || isa<CmpInst>(VL0)) {		if (Instruction::isBinaryOp(E->getOpcode()) || isa<CmpInst>(VL0)) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
RHS = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		RHS = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));
} else {		} else {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
}		}

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V0, *V1;		Value *V0, *V1;
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines
BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {		BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
// All blocks must be scheduled before any instructions are inserted.		// All blocks must be scheduled before any instructions are inserted.
for (auto &BSIter : BlocksSchedules) {		for (auto &BSIter : BlocksSchedules) {
scheduleBlock(BSIter.second.get());		scheduleBlock(BSIter.second.get());
}		}

Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
auto *VectorRoot = vectorizeTree(VectorizableTree[0].get());		auto *VectorRoot = vectorizeTree(VectorizableTree[0].get());
		// Run through the list of postponed gathers and emit them, replacing the temp
		// emitted allocas with actual vector instructions.
		ArrayRef<const TreeEntry *> PostponedNodes = PostponedGathers.getArrayRef();
		for (const TreeEntry *E : PostponedNodes) {
		auto *TE = const_cast<TreeEntry *>(E);
		if (auto *VecTE = getTreeEntry(TE->Scalars.front()))
		if (VecTE->isSame(TE->UserTreeIndices.front().UserTE->getOperand(
		TE->UserTreeIndices.front().EdgeIdx)))
		// Found gather node which is absolutely the same as one of the
		// vectorized nodes. It may happen after reordering.
		continue;
		auto *PrevVec = cast<Instruction>(TE->VectorizedValue);
		TE->VectorizedValue = nullptr;
		auto *UserI =
		cast<Instruction>(TE->UserTreeIndices.front().UserTE->VectorizedValue);
		Builder.SetInsertPoint(PrevVec);
		Builder.SetCurrentDebugLocation(UserI->getDebugLoc());
		Value *Vec = vectorizeTree(TE);
		PrevVec->replaceAllUsesWith(Vec);
		eraseInstruction(PrevVec);
		}

// If the vectorized tree can be rewritten in a smaller type, we truncate the		// If the vectorized tree can be rewritten in a smaller type, we truncate the
// vectorized root. InstCombine will then rewrite the entire expression. We		// vectorized root. InstCombine will then rewrite the entire expression. We
// sign extend the extracted values below.		// sign extend the extracted values below.
auto *ScalarRoot = VectorizableTree[0]->Scalars[0];		auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
if (auto *I = dyn_cast<Instruction>(VectorRoot)) {		if (auto *I = dyn_cast<Instruction>(VectorRoot)) {
// If current instr is a phi and not the last phi, insert it after the		// If current instr is a phi and not the last phi, insert it after the
Show All 11 Lines	BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
}		}

LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()		LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()
<< " values .\n");		<< " values .\n");

SmallVector<ShuffledInsertData> ShuffledInserts;		SmallVector<ShuffledInsertData> ShuffledInserts;
// Maps vector instruction to original insertelement instruction		// Maps vector instruction to original insertelement instruction
DenseMap<Value *, InsertElementInst *> VectorToInsertElement;		DenseMap<Value *, InsertElementInst *> VectorToInsertElement;
		// Maps extract Scalar to the corresponding extractelement instruction in the
		// basic block. Only one extractelement per block should be emitted.
		DenseMap<Value *, DenseMap<BasicBlock *, Value *>> ScalarToEEs;
// Extract all of the elements with the external uses.		// Extract all of the elements with the external uses.
for (const auto &ExternalUse : ExternalUses) {		for (const auto &ExternalUse : ExternalUses) {
Value *Scalar = ExternalUse.Scalar;		Value *Scalar = ExternalUse.Scalar;
llvm::User *User = ExternalUse.User;		llvm::User *User = ExternalUse.User;

// Skip users that we already RAUW. This happens when one instruction		// Skip users that we already RAUW. This happens when one instruction
// has multiple uses of the same value.		// has multiple uses of the same value.
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
continue;		continue;
TreeEntry *E = getTreeEntry(Scalar);		TreeEntry *E = getTreeEntry(Scalar);
assert(E && "Invalid scalar");		assert(E && "Invalid scalar");
assert(E->State != TreeEntry::NeedToGather &&		assert(E->State != TreeEntry::NeedToGather &&
"Extracting from a gather list");		"Extracting from a gather list");
// Non-instruction pointers are not deleted, just skip them.		// Non-instruction pointers are not deleted, just skip them.
if (E->getOpcode() == Instruction::GetElementPtr &&		if (E->getOpcode() == Instruction::GetElementPtr &&
!isa<GetElementPtrInst>(Scalar))		!isa<GetElementPtrInst>(Scalar))
continue;		continue;

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
auto ExtractAndExtendIfNeeded = [&](Value *Vec) {		auto ExtractAndExtendIfNeeded = [&](Value *Vec) {
if (Scalar->getType() != Vec->getType()) {		if (Scalar->getType() != Vec->getType()) {
Value *Ex;		Value *Ex = nullptr;
		auto It = ScalarToEEs.find(Scalar);
		if (It != ScalarToEEs.end()) {
		// No need to emit many extracts, just move the only one in the
		// current block.
		auto EEIt = It->second.find(Builder.GetInsertBlock());
		if (EEIt != It->second.end()) {
		auto *I = cast<Instruction>(EEIt->second);
		if (Builder.GetInsertPoint() != Builder.GetInsertBlock()->end() &&
		Builder.GetInsertPoint()->comesBefore(I))
		I->moveBefore(&*Builder.GetInsertPoint());
		Ex = I;
		}
		}
		if (!Ex) {
// "Reuse" the existing extract to improve final codegen.		// "Reuse" the existing extract to improve final codegen.
if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {		if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {
Ex = Builder.CreateExtractElement(ES->getOperand(0),		Ex = Builder.CreateExtractElement(ES->getOperand(0),
ES->getOperand(1));		ES->getOperand(1));
} else {		} else {
Ex = Builder.CreateExtractElement(Vec, Lane);		Ex = Builder.CreateExtractElement(Vec, Lane);
}		}
		ScalarToEEs[Scalar].try_emplace(Builder.GetInsertBlock(), Ex);
		}
// The then branch of the previous if may produce constants, since 0		// The then branch of the previous if may produce constants, since 0
// operand might be a constant.		// operand might be a constant.
if (auto *ExI = dyn_cast<Instruction>(Ex)) {		if (auto *ExI = dyn_cast<Instruction>(Ex)) {
GatherShuffleExtractSeq.insert(ExI);		GatherShuffleExtractSeq.insert(ExI);
CSEBlocks.insert(ExI->getParent());		CSEBlocks.insert(ExI->getParent());
}		}
// If necessary, sign-extend or zero-extend ScalarRoot		// If necessary, sign-extend or zero-extend ScalarRoot
// to the larger type.		// to the larger type.
Show All 13 Lines	for (const auto &ExternalUse : ExternalUses) {
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
// ExternallyUsedValues.		// ExternallyUsedValues.
if (!User) {		if (!User) {
assert(ExternallyUsedValues.count(Scalar) &&		assert(ExternallyUsedValues.count(Scalar) &&
"Scalar with nullptr as an external user must be registered in "		"Scalar with nullptr as an external user must be registered in "
"ExternallyUsedValues map");		"ExternallyUsedValues map");
if (auto *VecI = dyn_cast<Instruction>(Vec)) {		if (auto *VecI = dyn_cast<Instruction>(Vec)) {
		if (auto *PHI = dyn_cast<PHINode>(VecI))
		Builder.SetInsertPoint(PHI->getParent()->getFirstNonPHI());
		else
Builder.SetInsertPoint(VecI->getParent(),		Builder.SetInsertPoint(VecI->getParent(),
std::next(VecI->getIterator()));		std::next(VecI->getIterator()));
} else {		} else {
Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
}		}
Value *NewInst = ExtractAndExtendIfNeeded(Vec);		Value *NewInst = ExtractAndExtendIfNeeded(Vec);
auto &NewInstLocs = ExternallyUsedValues[NewInst];		auto &NewInstLocs = ExternallyUsedValues[NewInst];
auto It = ExternallyUsedValues.find(Scalar);		auto It = ExternallyUsedValues.find(Scalar);
assert(It != ExternallyUsedValues.end() &&		assert(It != ExternallyUsedValues.end() &&
"Externally used scalar is not found in ExternallyUsedValues");		"Externally used scalar is not found in ExternallyUsedValues");
▲ Show 20 Lines • Show All 3,998 Lines • Show Last 20 Lines
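
Note on the `ExtractShuffle` path in the diff above: the patch comments state that a gather of extractelements can be represented as a shuffle of the one or two vectors the scalars are extracted from. The following is a minimal, LLVM-free sketch of that idea only, not the patch's implementation. `Extract`, `ShufflePlan`, `planExtractShuffle`, and `UndefLane` are hypothetical names introduced for illustration; source vectors are identified by plain integer ids, and the two-input mask numbering (lanes of the second input offset by the vector width) follows the usual `shufflevector` convention.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an extractelement: the id of the source vector it
// reads from and the lane it extracts. This is not an LLVM type.
struct Extract {
  int SourceId;
  int Lane;
};

// Plays the role of UndefMaskElem in the patch (assumption: -1 marks "undef").
constexpr int UndefLane = -1;

struct ShufflePlan {
  int Vec1 = -1;         // id of the first source vector, -1 if unused
  int Vec2 = -1;         // id of the second source vector, -1 if unused
  std::vector<int> Mask; // shufflevector-style mask over Vec1/Vec2
};

// Builds a single shuffle mask for a list of gathered scalars.
// Slots holding std::nullopt are not extracts and stay undefined in the mask.
// Width is the element count of each source vector.
// Returns std::nullopt if the extracts read from more than two vectors.
std::optional<ShufflePlan>
planExtractShuffle(const std::vector<std::optional<Extract>> &Scalars,
                   int Width) {
  ShufflePlan Plan;
  Plan.Mask.assign(Scalars.size(), UndefLane);
  for (std::size_t I = 0; I < Scalars.size(); ++I) {
    if (!Scalars[I])
      continue;
    const Extract &E = *Scalars[I];
    assert(E.SourceId >= 0 && "source ids must be non-negative");
    assert(E.Lane >= 0 && E.Lane < Width && "lane out of range");
    if (Plan.Vec1 == -1 || Plan.Vec1 == E.SourceId) {
      Plan.Vec1 = E.SourceId;
      Plan.Mask[I] = E.Lane; // lanes of the first input are used as-is
    } else if (Plan.Vec2 == -1 || Plan.Vec2 == E.SourceId) {
      Plan.Vec2 = E.SourceId;
      Plan.Mask[I] = Width + E.Lane; // second input is offset by Width
    } else {
      return std::nullopt; // three or more sources: not a single shuffle
    }
  }
  return Plan;
}
```

With one 4-wide source and extracts of lanes 2 and 3, this sketch yields the mask `{2, 3}`, which has the same shape as the `shufflevector ... <i32 2, i32 3>` that replaces the extractelement/insertelement pairs in the updated test expectations below.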

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

	Show All 18 Lines
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0			; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1			; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -slp-threshold=-3 -S -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -slp-threshold=-2 -S -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s


	; FIXME: The threshold is changed to keep this test case a bit smaller.			; FIXME: The threshold is changed to keep this test case a bit smaller.
	; The AArch64 cost model should not give such high costs to select statements.			; The AArch64 cost model should not give such high costs to select statements.

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux"			target triple = "aarch64--linux"

llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll

define i16 @reduce_blockstrided4(i16* nocapture noundef readonly %x, i16* nocapture noundef readonly %y, i32 noundef %stride) {		define i16 @reduce_blockstrided4(i16* nocapture noundef readonly %x, i16* nocapture noundef readonly %y, i32 noundef %stride) {
; CHECK-LABEL: @reduce_blockstrided4(		; CHECK-LABEL: @reduce_blockstrided4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64		; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[X]] to <4 x i16>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[X]] to <4 x i16>*
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[ARRAYIDX4]] to <4 x i16>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[Y]] to <4 x i16>*
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2		; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16* [[Y]] to <4 x i16>*		; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i16> [[TMP3]], [[TMP1]]
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[TMP4]], align 2		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[ARRAYIDX4]] to <4 x i16>*
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[ARRAYIDX20]] to <4 x i16>*		; CHECK-NEXT: [[TMP6:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* [[TMP6]], align 2		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i16* [[ARRAYIDX20]] to <4 x i16>*
; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i16> [[TMP5]], [[TMP1]]		; CHECK-NEXT: [[TMP8:%.*]] = load <4 x i16>, <4 x i16>* [[TMP7]], align 2
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i16> [[TMP7]], [[TMP3]]		; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i16> [[TMP8]], [[TMP6]]
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[TMP10:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[TMP4]])
; CHECK-NEXT: [[TMP11:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP10]])		; CHECK-NEXT: [[TMP11:%.*]] = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> [[TMP9]])
; CHECK-NEXT: ret i16 [[TMP11]]		; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP10]], [[TMP11]]
		; CHECK-NEXT: ret i16 [[OP_RDX]]
;		;
entry:		entry:
%0 = load i16, i16* %x, align 2		%0 = load i16, i16* %x, align 2
%arrayidx1 = getelementptr inbounds i16, i16* %x, i64 1		%arrayidx1 = getelementptr inbounds i16, i16* %x, i64 1
%1 = load i16, i16* %arrayidx1, align 2		%1 = load i16, i16* %arrayidx1, align 2
%arrayidx2 = getelementptr inbounds i16, i16* %x, i64 2		%arrayidx2 = getelementptr inbounds i16, i16* %x, i64 2
%2 = load i16, i16* %arrayidx2, align 2		%2 = load i16, i16* %arrayidx2, align 2
%arrayidx3 = getelementptr inbounds i16, i16* %x, i64 3		%arrayidx3 = getelementptr inbounds i16, i16* %x, i64 3
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, i8* [[P1:%.*]], i64 4		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, i8* [[P1:%.*]], i64 4
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, i8* [[P2:%.*]], i64 4		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, i8* [[P2:%.*]], i64 4
; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 [[IDX_EXT63]]
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 4
; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 4
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1]] to <4 x i8>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1]] to <4 x i8>*
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[P2]] to <4 x i8>*		; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* [[TMP2]], align 1		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[ARRAYIDX3]] to <4 x i8>*
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[ARRAYIDX3]] to <4 x i8>*		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1		; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[ARRAYIDX5]] to <4 x i8>*		; CHECK-NEXT: [[TMP6:%.*]] = mul nuw nsw <4 x i32> [[TMP2]], [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* [[TMP6]], align 1		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[P2]] to <4 x i8>*
; CHECK-NEXT: [[TMP8:%.*]] = bitcast i8* [[ADD_PTR]] to <4 x i8>*		; CHECK-NEXT: [[TMP8:%.*]] = load <4 x i8>, <4 x i8>* [[TMP7]], align 1
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, <4 x i8>* [[TMP8]], align 1		; CHECK-NEXT: [[TMP9:%.*]] = zext <4 x i8> [[TMP8]] to <4 x i32>
; CHECK-NEXT: [[TMP10:%.*]] = bitcast i8* [[ADD_PTR64]] to <4 x i8>*		; CHECK-NEXT: [[TMP10:%.*]] = bitcast i8* [[ARRAYIDX5]] to <4 x i8>*
; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1		; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> [[TMP3]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = zext <4 x i8> [[TMP11]] to <4 x i32>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = mul nuw nsw <4 x i32> [[TMP9]], [[TMP12]]
; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i8> [[TMP12]], <16 x i8> [[TMP13]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[ADD_PTR]] to <4 x i8>*
; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <4 x i8> [[TMP11]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP15:%.*]] = load <4 x i8>, <4 x i8>* [[TMP14]], align 1
; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <16 x i8> [[TMP14]], <16 x i8> [[TMP15]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP16:%.*]] = zext <4 x i8> [[TMP15]] to <4 x i32>
; CHECK-NEXT: [[TMP17:%.*]] = zext <16 x i8> [[TMP16]] to <16 x i32>		; CHECK-NEXT: [[TMP17:%.*]] = bitcast i8* [[ARRAYIDX3_1]] to <4 x i8>*
; CHECK-NEXT: [[TMP18:%.*]] = bitcast i8* [[ARRAYIDX3_1]] to <4 x i8>*		; CHECK-NEXT: [[TMP18:%.*]] = load <4 x i8>, <4 x i8>* [[TMP17]], align 1
; CHECK-NEXT: [[TMP19:%.*]] = load <4 x i8>, <4 x i8>* [[TMP18]], align 1		; CHECK-NEXT: [[TMP19:%.*]] = zext <4 x i8> [[TMP18]] to <4 x i32>
; CHECK-NEXT: [[TMP20:%.*]] = bitcast i8* [[ARRAYIDX5_1]] to <4 x i8>*		; CHECK-NEXT: [[TMP20:%.*]] = mul nuw nsw <4 x i32> [[TMP16]], [[TMP19]]
; CHECK-NEXT: [[TMP21:%.*]] = load <4 x i8>, <4 x i8>* [[TMP20]], align 1		; CHECK-NEXT: [[TMP21:%.*]] = bitcast i8* [[ADD_PTR64]] to <4 x i8>*
; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <4 x i8> [[TMP5]], <4 x i8> [[TMP7]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP22:%.*]] = load <4 x i8>, <4 x i8>* [[TMP21]], align 1
; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <4 x i8> [[TMP19]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP23:%.*]] = zext <4 x i8> [[TMP22]] to <4 x i32>
; CHECK-NEXT: [[TMP24:%.*]] = shufflevector <16 x i8> [[TMP22]], <16 x i8> [[TMP23]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP24:%.*]] = bitcast i8* [[ARRAYIDX5_1]] to <4 x i8>*
; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <4 x i8> [[TMP21]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP25:%.*]] = load <4 x i8>, <4 x i8>* [[TMP24]], align 1
; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <16 x i8> [[TMP24]], <16 x i8> [[TMP25]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP26:%.*]] = zext <4 x i8> [[TMP25]] to <4 x i32>
; CHECK-NEXT: [[TMP27:%.*]] = zext <16 x i8> [[TMP26]] to <16 x i32>		; CHECK-NEXT: [[TMP27:%.*]] = mul nuw nsw <4 x i32> [[TMP23]], [[TMP26]]
; CHECK-NEXT: [[TMP28:%.*]] = mul nuw nsw <16 x i32> [[TMP17]], [[TMP27]]		; CHECK-NEXT: [[TMP28:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP6]])
; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP28]])		; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])
; CHECK-NEXT: ret i32 [[TMP29]]		; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP28]], [[TMP29]]
		; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP20]])
		; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], [[TMP30]]
		; CHECK-NEXT: [[TMP31:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP27]])
		; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX1]], [[TMP31]]
		; CHECK-NEXT: ret i32 [[OP_RDX2]]
;		;
entry:		entry:
%idx.ext = sext i32 %off1 to i64		%idx.ext = sext i32 %off1 to i64
%idx.ext63 = sext i32 %off2 to i64		%idx.ext63 = sext i32 %off2 to i64

%0 = load i8, i8* %p1, align 1		%0 = load i8, i8* %p1, align 1
%conv = zext i8 %0 to i32		%conv = zext i8 %0 to i32
%1 = load i8, i8* %p2, align 1		%1 = load i8, i8* %p2, align 1
}		}

define void @store_blockstrided4(i16* nocapture noundef readonly %x, i16* nocapture noundef readonly %y, i32 noundef %stride, i16 *%dst0) {		define void @store_blockstrided4(i16* nocapture noundef readonly %x, i16* nocapture noundef readonly %y, i32 noundef %stride, i16 *%dst0) {
; CHECK-LABEL: @store_blockstrided4(		; CHECK-LABEL: @store_blockstrided4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64		; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[STRIDE:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[IDXPROM]]
; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[IDXPROM]]		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[IDXPROM]]
		; CHECK-NEXT: [[DST4:%.*]] = getelementptr inbounds i16, i16* [[DST0:%.*]], i64 4
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[X]] to <4 x i16>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[X]] to <4 x i16>*
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 2
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[ARRAYIDX4]] to <4 x i16>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[Y]] to <4 x i16>*
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2		; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16* [[Y]] to <4 x i16>*		; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i16> [[TMP3]], [[TMP1]]
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* [[TMP4]], align 2		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[TMP4]], <4 x i16> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[ARRAYIDX20]] to <4 x i16>*		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[DST0]] to <4 x i16>*
		; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[ARRAYIDX4]] to <4 x i16>*
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* [[TMP6]], align 2		; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* [[TMP6]], align 2
; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i16> [[TMP5]], [[TMP1]]		; CHECK-NEXT: [[TMP8:%.*]] = bitcast i16* [[ARRAYIDX20]] to <4 x i16>*
; CHECK-NEXT: [[TMP9:%.*]] = mul <4 x i16> [[TMP7]], [[TMP3]]		; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i16>, <4 x i16>* [[TMP8]], align 2
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>		; CHECK-NEXT: [[TMP10:%.*]] = mul <4 x i16> [[TMP9]], [[TMP7]]
; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[DST0:%.*]] to <8 x i16>*		; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP5]], align 2
; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* [[TMP10]], align 2		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i16> [[TMP10]], <4 x i16> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
		; CHECK-NEXT: [[TMP11:%.*]] = bitcast i16* [[DST4]] to <4 x i16>*
		; CHECK-NEXT: store <4 x i16> [[SHUFFLE1]], <4 x i16>* [[TMP11]], align 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%0 = load i16, i16* %x, align 2		%0 = load i16, i16* %x, align 2
%arrayidx1 = getelementptr inbounds i16, i16* %x, i64 1		%arrayidx1 = getelementptr inbounds i16, i16* %x, i64 1
%1 = load i16, i16* %arrayidx1, align 2		%1 = load i16, i16* %arrayidx1, align 2
%arrayidx2 = getelementptr inbounds i16, i16* %x, i64 2		%arrayidx2 = getelementptr inbounds i16, i16* %x, i64 2
%2 = load i16, i16* %arrayidx2, align 2		%2 = load i16, i16* %arrayidx2, align 2
ret void		ret void
}		}

define dso_local i32 @full(i8* nocapture noundef readonly %p1, i32 noundef %st1, i8* nocapture noundef readonly %p2, i32 noundef %st2) {		define dso_local i32 @full(i8* nocapture noundef readonly %p1, i32 noundef %st1, i8* nocapture noundef readonly %p2, i32 noundef %st2) {
; CHECK-LABEL: @full(		; CHECK-LABEL: @full(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST1:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST1:%.*]] to i64
; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[ST2:%.*]] to i64		; CHECK-NEXT: [[IDX_EXT63:%.*]] = sext i32 [[ST2:%.*]] to i64
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, i8* [[P1:%.*]], i64 4		; CHECK-NEXT: [[TMP0:%.*]] = load i8, i8* [[P1:%.*]], align 1
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, i8* [[P2:%.*]], i64 4		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP0]] to i32
		; CHECK-NEXT: [[TMP1:%.*]] = load i8, i8* [[P2:%.*]], align 1
		; CHECK-NEXT: [[CONV2:%.*]] = zext i8 [[TMP1]] to i32
		; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 4
		; CHECK-NEXT: [[TMP2:%.*]] = load i8, i8* [[ARRAYIDX3]], align 1
		; CHECK-NEXT: [[CONV4:%.*]] = zext i8 [[TMP2]] to i32
		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 4
		; CHECK-NEXT: [[TMP3:%.*]] = load i8, i8* [[ARRAYIDX5]], align 1
		; CHECK-NEXT: [[CONV6:%.*]] = zext i8 [[TMP3]] to i32
		; CHECK-NEXT: [[SUB7:%.*]] = sub nsw i32 [[CONV4]], [[CONV6]]
		; CHECK-NEXT: [[SHL:%.*]] = shl nsw i32 [[SUB7]], 16
		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[SHL]], [[SUB]]
		; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 1
		; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[ARRAYIDX8]], align 1
		; CHECK-NEXT: [[CONV9:%.*]] = zext i8 [[TMP4]] to i32
		; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 1
		; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[ARRAYIDX10]], align 1
		; CHECK-NEXT: [[CONV11:%.*]] = zext i8 [[TMP5]] to i32
		; CHECK-NEXT: [[SUB12:%.*]] = sub nsw i32 [[CONV9]], [[CONV11]]
		; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 5
		; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[ARRAYIDX13]], align 1
		; CHECK-NEXT: [[CONV14:%.*]] = zext i8 [[TMP6]] to i32
		; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 5
		; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[ARRAYIDX15]], align 1
		; CHECK-NEXT: [[CONV16:%.*]] = zext i8 [[TMP7]] to i32
		; CHECK-NEXT: [[SUB17:%.*]] = sub nsw i32 [[CONV14]], [[CONV16]]
		; CHECK-NEXT: [[SHL18:%.*]] = shl nsw i32 [[SUB17]], 16
		; CHECK-NEXT: [[ADD19:%.*]] = add nsw i32 [[SHL18]], [[SUB12]]
		; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 2
		; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX20]], align 1
		; CHECK-NEXT: [[CONV21:%.*]] = zext i8 [[TMP8]] to i32
		; CHECK-NEXT: [[ARRAYIDX22:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 2
		; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[ARRAYIDX22]], align 1
		; CHECK-NEXT: [[CONV23:%.*]] = zext i8 [[TMP9]] to i32
		; CHECK-NEXT: [[SUB24:%.*]] = sub nsw i32 [[CONV21]], [[CONV23]]
		; CHECK-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 6
		; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[ARRAYIDX25]], align 1
		; CHECK-NEXT: [[CONV26:%.*]] = zext i8 [[TMP10]] to i32
		; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 6
		; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[ARRAYIDX27]], align 1
		; CHECK-NEXT: [[CONV28:%.*]] = zext i8 [[TMP11]] to i32
		; CHECK-NEXT: [[SUB29:%.*]] = sub nsw i32 [[CONV26]], [[CONV28]]
		; CHECK-NEXT: [[SHL30:%.*]] = shl nsw i32 [[SUB29]], 16
		; CHECK-NEXT: [[ADD31:%.*]] = add nsw i32 [[SHL30]], [[SUB24]]
		; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 3
		; CHECK-NEXT: [[TMP12:%.*]] = load i8, i8* [[ARRAYIDX32]], align 1
		; CHECK-NEXT: [[CONV33:%.*]] = zext i8 [[TMP12]] to i32
		; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 3
		; CHECK-NEXT: [[TMP13:%.*]] = load i8, i8* [[ARRAYIDX34]], align 1
		; CHECK-NEXT: [[CONV35:%.*]] = zext i8 [[TMP13]] to i32
		; CHECK-NEXT: [[SUB36:%.*]] = sub nsw i32 [[CONV33]], [[CONV35]]
		; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 7
		; CHECK-NEXT: [[TMP14:%.*]] = load i8, i8* [[ARRAYIDX37]], align 1
		; CHECK-NEXT: [[CONV38:%.*]] = zext i8 [[TMP14]] to i32
		; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 7
		; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[ARRAYIDX39]], align 1
		; CHECK-NEXT: [[CONV40:%.*]] = zext i8 [[TMP15]] to i32
		; CHECK-NEXT: [[SUB41:%.*]] = sub nsw i32 [[CONV38]], [[CONV40]]
		; CHECK-NEXT: [[SHL42:%.*]] = shl nsw i32 [[SUB41]], 16
		; CHECK-NEXT: [[ADD43:%.*]] = add nsw i32 [[SHL42]], [[SUB36]]
		; CHECK-NEXT: [[ADD44:%.*]] = add nsw i32 [[ADD19]], [[ADD]]
		; CHECK-NEXT: [[SUB45:%.*]] = sub nsw i32 [[ADD]], [[ADD19]]
		; CHECK-NEXT: [[ADD46:%.*]] = add nsw i32 [[ADD43]], [[ADD31]]
		; CHECK-NEXT: [[SUB47:%.*]] = sub nsw i32 [[ADD31]], [[ADD43]]
		; CHECK-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD46]], [[ADD44]]
		; CHECK-NEXT: [[SUB51:%.*]] = sub nsw i32 [[ADD44]], [[ADD46]]
		; CHECK-NEXT: [[ADD55:%.*]] = add nsw i32 [[SUB47]], [[SUB45]]
		; CHECK-NEXT: [[SUB59:%.*]] = sub nsw i32 [[SUB45]], [[SUB47]]
; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[P1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64:%.*]] = getelementptr inbounds i8, i8* [[P2]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* [[ADD_PTR]], align 1
		; CHECK-NEXT: [[CONV_1:%.*]] = zext i8 [[TMP16]] to i32
		; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* [[ADD_PTR64]], align 1
		; CHECK-NEXT: [[CONV2_1:%.*]] = zext i8 [[TMP17]] to i32
		; CHECK-NEXT: [[SUB_1:%.*]] = sub nsw i32 [[CONV_1]], [[CONV2_1]]
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 4
		; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* [[ARRAYIDX3_1]], align 1
		; CHECK-NEXT: [[CONV4_1:%.*]] = zext i8 [[TMP18]] to i32
; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 4
		; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[ARRAYIDX5_1]], align 1
		; CHECK-NEXT: [[CONV6_1:%.*]] = zext i8 [[TMP19]] to i32
		; CHECK-NEXT: [[SUB7_1:%.*]] = sub nsw i32 [[CONV4_1]], [[CONV6_1]]
		; CHECK-NEXT: [[SHL_1:%.*]] = shl nsw i32 [[SUB7_1]], 16
		; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[SHL_1]], [[SUB_1]]
		; CHECK-NEXT: [[ARRAYIDX8_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 1
		; CHECK-NEXT: [[TMP20:%.*]] = load i8, i8* [[ARRAYIDX8_1]], align 1
		; CHECK-NEXT: [[CONV9_1:%.*]] = zext i8 [[TMP20]] to i32
		; CHECK-NEXT: [[ARRAYIDX10_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 1
		; CHECK-NEXT: [[TMP21:%.*]] = load i8, i8* [[ARRAYIDX10_1]], align 1
		; CHECK-NEXT: [[CONV11_1:%.*]] = zext i8 [[TMP21]] to i32
		; CHECK-NEXT: [[SUB12_1:%.*]] = sub nsw i32 [[CONV9_1]], [[CONV11_1]]
		; CHECK-NEXT: [[ARRAYIDX13_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 5
		; CHECK-NEXT: [[TMP22:%.*]] = load i8, i8* [[ARRAYIDX13_1]], align 1
		; CHECK-NEXT: [[CONV14_1:%.*]] = zext i8 [[TMP22]] to i32
		; CHECK-NEXT: [[ARRAYIDX15_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 5
		; CHECK-NEXT: [[TMP23:%.*]] = load i8, i8* [[ARRAYIDX15_1]], align 1
		; CHECK-NEXT: [[CONV16_1:%.*]] = zext i8 [[TMP23]] to i32
		; CHECK-NEXT: [[SUB17_1:%.*]] = sub nsw i32 [[CONV14_1]], [[CONV16_1]]
		; CHECK-NEXT: [[SHL18_1:%.*]] = shl nsw i32 [[SUB17_1]], 16
		; CHECK-NEXT: [[ADD19_1:%.*]] = add nsw i32 [[SHL18_1]], [[SUB12_1]]
		; CHECK-NEXT: [[ARRAYIDX20_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 2
		; CHECK-NEXT: [[TMP24:%.*]] = load i8, i8* [[ARRAYIDX20_1]], align 1
		; CHECK-NEXT: [[CONV21_1:%.*]] = zext i8 [[TMP24]] to i32
		; CHECK-NEXT: [[ARRAYIDX22_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 2
		; CHECK-NEXT: [[TMP25:%.*]] = load i8, i8* [[ARRAYIDX22_1]], align 1
		; CHECK-NEXT: [[CONV23_1:%.*]] = zext i8 [[TMP25]] to i32
		; CHECK-NEXT: [[SUB24_1:%.*]] = sub nsw i32 [[CONV21_1]], [[CONV23_1]]
		; CHECK-NEXT: [[ARRAYIDX25_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 6
		; CHECK-NEXT: [[TMP26:%.*]] = load i8, i8* [[ARRAYIDX25_1]], align 1
		; CHECK-NEXT: [[CONV26_1:%.*]] = zext i8 [[TMP26]] to i32
		; CHECK-NEXT: [[ARRAYIDX27_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 6
		; CHECK-NEXT: [[TMP27:%.*]] = load i8, i8* [[ARRAYIDX27_1]], align 1
		; CHECK-NEXT: [[CONV28_1:%.*]] = zext i8 [[TMP27]] to i32
		; CHECK-NEXT: [[SUB29_1:%.*]] = sub nsw i32 [[CONV26_1]], [[CONV28_1]]
		; CHECK-NEXT: [[SHL30_1:%.*]] = shl nsw i32 [[SUB29_1]], 16
		; CHECK-NEXT: [[ADD31_1:%.*]] = add nsw i32 [[SHL30_1]], [[SUB24_1]]
		; CHECK-NEXT: [[ARRAYIDX32_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 3
		; CHECK-NEXT: [[TMP28:%.*]] = load i8, i8* [[ARRAYIDX32_1]], align 1
		; CHECK-NEXT: [[CONV33_1:%.*]] = zext i8 [[TMP28]] to i32
		; CHECK-NEXT: [[ARRAYIDX34_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 3
		; CHECK-NEXT: [[TMP29:%.*]] = load i8, i8* [[ARRAYIDX34_1]], align 1
		; CHECK-NEXT: [[CONV35_1:%.*]] = zext i8 [[TMP29]] to i32
		; CHECK-NEXT: [[SUB36_1:%.*]] = sub nsw i32 [[CONV33_1]], [[CONV35_1]]
		; CHECK-NEXT: [[ARRAYIDX37_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 7
		; CHECK-NEXT: [[TMP30:%.*]] = load i8, i8* [[ARRAYIDX37_1]], align 1
		; CHECK-NEXT: [[CONV38_1:%.*]] = zext i8 [[TMP30]] to i32
		; CHECK-NEXT: [[ARRAYIDX39_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 7
		; CHECK-NEXT: [[TMP31:%.*]] = load i8, i8* [[ARRAYIDX39_1]], align 1
		; CHECK-NEXT: [[CONV40_1:%.*]] = zext i8 [[TMP31]] to i32
		; CHECK-NEXT: [[SUB41_1:%.*]] = sub nsw i32 [[CONV38_1]], [[CONV40_1]]
		; CHECK-NEXT: [[SHL42_1:%.*]] = shl nsw i32 [[SUB41_1]], 16
		; CHECK-NEXT: [[ADD43_1:%.*]] = add nsw i32 [[SHL42_1]], [[SUB36_1]]
		; CHECK-NEXT: [[ADD44_1:%.*]] = add nsw i32 [[ADD19_1]], [[ADD_1]]
		; CHECK-NEXT: [[SUB45_1:%.*]] = sub nsw i32 [[ADD_1]], [[ADD19_1]]
		; CHECK-NEXT: [[ADD46_1:%.*]] = add nsw i32 [[ADD43_1]], [[ADD31_1]]
		; CHECK-NEXT: [[SUB47_1:%.*]] = sub nsw i32 [[ADD31_1]], [[ADD43_1]]
		; CHECK-NEXT: [[ADD48_1:%.*]] = add nsw i32 [[ADD46_1]], [[ADD44_1]]
		; CHECK-NEXT: [[SUB51_1:%.*]] = sub nsw i32 [[ADD44_1]], [[ADD46_1]]
		; CHECK-NEXT: [[ADD55_1:%.*]] = add nsw i32 [[SUB47_1]], [[SUB45_1]]
		; CHECK-NEXT: [[SUB59_1:%.*]] = sub nsw i32 [[SUB45_1]], [[SUB47_1]]
; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP32:%.*]] = load i8, i8* [[ADD_PTR_1]], align 1
		; CHECK-NEXT: [[CONV_2:%.*]] = zext i8 [[TMP32]] to i32
		; CHECK-NEXT: [[TMP33:%.*]] = load i8, i8* [[ADD_PTR64_1]], align 1
		; CHECK-NEXT: [[CONV2_2:%.*]] = zext i8 [[TMP33]] to i32
		; CHECK-NEXT: [[SUB_2:%.*]] = sub nsw i32 [[CONV_2]], [[CONV2_2]]
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 4
		; CHECK-NEXT: [[TMP34:%.*]] = load i8, i8* [[ARRAYIDX3_2]], align 1
		; CHECK-NEXT: [[CONV4_2:%.*]] = zext i8 [[TMP34]] to i32
; CHECK-NEXT: [[ARRAYIDX5_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 4
		; CHECK-NEXT: [[TMP35:%.*]] = load i8, i8* [[ARRAYIDX5_2]], align 1
		; CHECK-NEXT: [[CONV6_2:%.*]] = zext i8 [[TMP35]] to i32
		; CHECK-NEXT: [[SUB7_2:%.*]] = sub nsw i32 [[CONV4_2]], [[CONV6_2]]
		; CHECK-NEXT: [[SHL_2:%.*]] = shl nsw i32 [[SUB7_2]], 16
		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[SHL_2]], [[SUB_2]]
		; CHECK-NEXT: [[ARRAYIDX8_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 1
		; CHECK-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX8_2]], align 1
		; CHECK-NEXT: [[CONV9_2:%.*]] = zext i8 [[TMP36]] to i32
		; CHECK-NEXT: [[ARRAYIDX10_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 1
		; CHECK-NEXT: [[TMP37:%.*]] = load i8, i8* [[ARRAYIDX10_2]], align 1
		; CHECK-NEXT: [[CONV11_2:%.*]] = zext i8 [[TMP37]] to i32
		; CHECK-NEXT: [[SUB12_2:%.*]] = sub nsw i32 [[CONV9_2]], [[CONV11_2]]
		; CHECK-NEXT: [[ARRAYIDX13_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 5
		; CHECK-NEXT: [[TMP38:%.*]] = load i8, i8* [[ARRAYIDX13_2]], align 1
		; CHECK-NEXT: [[CONV14_2:%.*]] = zext i8 [[TMP38]] to i32
		; CHECK-NEXT: [[ARRAYIDX15_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 5
		; CHECK-NEXT: [[TMP39:%.*]] = load i8, i8* [[ARRAYIDX15_2]], align 1
		; CHECK-NEXT: [[CONV16_2:%.*]] = zext i8 [[TMP39]] to i32
		; CHECK-NEXT: [[SUB17_2:%.*]] = sub nsw i32 [[CONV14_2]], [[CONV16_2]]
		; CHECK-NEXT: [[SHL18_2:%.*]] = shl nsw i32 [[SUB17_2]], 16
		; CHECK-NEXT: [[ADD19_2:%.*]] = add nsw i32 [[SHL18_2]], [[SUB12_2]]
		; CHECK-NEXT: [[ARRAYIDX20_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 2
		; CHECK-NEXT: [[TMP40:%.*]] = load i8, i8* [[ARRAYIDX20_2]], align 1
		; CHECK-NEXT: [[CONV21_2:%.*]] = zext i8 [[TMP40]] to i32
		; CHECK-NEXT: [[ARRAYIDX22_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 2
		; CHECK-NEXT: [[TMP41:%.*]] = load i8, i8* [[ARRAYIDX22_2]], align 1
		; CHECK-NEXT: [[CONV23_2:%.*]] = zext i8 [[TMP41]] to i32
		; CHECK-NEXT: [[SUB24_2:%.*]] = sub nsw i32 [[CONV21_2]], [[CONV23_2]]
		; CHECK-NEXT: [[ARRAYIDX25_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 6
		; CHECK-NEXT: [[TMP42:%.*]] = load i8, i8* [[ARRAYIDX25_2]], align 1
		; CHECK-NEXT: [[CONV26_2:%.*]] = zext i8 [[TMP42]] to i32
		; CHECK-NEXT: [[ARRAYIDX27_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 6
		; CHECK-NEXT: [[TMP43:%.*]] = load i8, i8* [[ARRAYIDX27_2]], align 1
		; CHECK-NEXT: [[CONV28_2:%.*]] = zext i8 [[TMP43]] to i32
		; CHECK-NEXT: [[SUB29_2:%.*]] = sub nsw i32 [[CONV26_2]], [[CONV28_2]]
		; CHECK-NEXT: [[SHL30_2:%.*]] = shl nsw i32 [[SUB29_2]], 16
		; CHECK-NEXT: [[ADD31_2:%.*]] = add nsw i32 [[SHL30_2]], [[SUB24_2]]
		; CHECK-NEXT: [[ARRAYIDX32_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 3
		; CHECK-NEXT: [[TMP44:%.*]] = load i8, i8* [[ARRAYIDX32_2]], align 1
		; CHECK-NEXT: [[CONV33_2:%.*]] = zext i8 [[TMP44]] to i32
		; CHECK-NEXT: [[ARRAYIDX34_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 3
		; CHECK-NEXT: [[TMP45:%.*]] = load i8, i8* [[ARRAYIDX34_2]], align 1
		; CHECK-NEXT: [[CONV35_2:%.*]] = zext i8 [[TMP45]] to i32
		; CHECK-NEXT: [[SUB36_2:%.*]] = sub nsw i32 [[CONV33_2]], [[CONV35_2]]
		; CHECK-NEXT: [[ARRAYIDX37_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 7
		; CHECK-NEXT: [[TMP46:%.*]] = load i8, i8* [[ARRAYIDX37_2]], align 1
		; CHECK-NEXT: [[CONV38_2:%.*]] = zext i8 [[TMP46]] to i32
		; CHECK-NEXT: [[ARRAYIDX39_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 7
		; CHECK-NEXT: [[TMP47:%.*]] = load i8, i8* [[ARRAYIDX39_2]], align 1
		; CHECK-NEXT: [[CONV40_2:%.*]] = zext i8 [[TMP47]] to i32
		; CHECK-NEXT: [[SUB41_2:%.*]] = sub nsw i32 [[CONV38_2]], [[CONV40_2]]
		; CHECK-NEXT: [[SHL42_2:%.*]] = shl nsw i32 [[SUB41_2]], 16
		; CHECK-NEXT: [[ADD43_2:%.*]] = add nsw i32 [[SHL42_2]], [[SUB36_2]]
		; CHECK-NEXT: [[ADD44_2:%.*]] = add nsw i32 [[ADD19_2]], [[ADD_2]]
		; CHECK-NEXT: [[SUB45_2:%.*]] = sub nsw i32 [[ADD_2]], [[ADD19_2]]
		; CHECK-NEXT: [[ADD46_2:%.*]] = add nsw i32 [[ADD43_2]], [[ADD31_2]]
		; CHECK-NEXT: [[SUB47_2:%.*]] = sub nsw i32 [[ADD31_2]], [[ADD43_2]]
		; CHECK-NEXT: [[ADD48_2:%.*]] = add nsw i32 [[ADD46_2]], [[ADD44_2]]
		; CHECK-NEXT: [[SUB51_2:%.*]] = sub nsw i32 [[ADD44_2]], [[ADD46_2]]
		; CHECK-NEXT: [[ADD55_2:%.*]] = add nsw i32 [[SUB47_2]], [[SUB45_2]]
		; CHECK-NEXT: [[SUB59_2:%.*]] = sub nsw i32 [[SUB45_2]], [[SUB47_2]]
; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 [[IDX_EXT]]		; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_1]], i64 [[IDX_EXT]]
; CHECK-NEXT: [[ADD_PTR64_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 [[IDX_EXT63]]		; CHECK-NEXT: [[ADD_PTR64_2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_1]], i64 [[IDX_EXT63]]
		; CHECK-NEXT: [[TMP48:%.*]] = load i8, i8* [[ADD_PTR_2]], align 1
		; CHECK-NEXT: [[CONV_3:%.*]] = zext i8 [[TMP48]] to i32
		; CHECK-NEXT: [[TMP49:%.*]] = load i8, i8* [[ADD_PTR64_2]], align 1
		; CHECK-NEXT: [[CONV2_3:%.*]] = zext i8 [[TMP49]] to i32
		; CHECK-NEXT: [[SUB_3:%.*]] = sub nsw i32 [[CONV_3]], [[CONV2_3]]
; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 4
		; CHECK-NEXT: [[TMP50:%.*]] = load i8, i8* [[ARRAYIDX3_3]], align 1
		; CHECK-NEXT: [[CONV4_3:%.*]] = zext i8 [[TMP50]] to i32
; CHECK-NEXT: [[ARRAYIDX5_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 4		; CHECK-NEXT: [[ARRAYIDX5_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 4
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1]] to <4 x i8>*		; CHECK-NEXT: [[TMP51:%.*]] = load i8, i8* [[ARRAYIDX5_3]], align 1
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1		; CHECK-NEXT: [[CONV6_3:%.*]] = zext i8 [[TMP51]] to i32
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[P2]] to <4 x i8>*		; CHECK-NEXT: [[SUB7_3:%.*]] = sub nsw i32 [[CONV4_3]], [[CONV6_3]]
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* [[TMP2]], align 1		; CHECK-NEXT: [[SHL_3:%.*]] = shl nsw i32 [[SUB7_3]], 16
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[ARRAYIDX3]] to <4 x i8>*		; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 [[SHL_3]], [[SUB_3]]
; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1		; CHECK-NEXT: [[ARRAYIDX8_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 1
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[ARRAYIDX5]] to <4 x i8>*		; CHECK-NEXT: [[TMP52:%.*]] = load i8, i8* [[ARRAYIDX8_3]], align 1
; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* [[TMP6]], align 1		; CHECK-NEXT: [[CONV9_3:%.*]] = zext i8 [[TMP52]] to i32
; CHECK-NEXT: [[TMP8:%.*]] = bitcast i8* [[ADD_PTR]] to <4 x i8>*		; CHECK-NEXT: [[ARRAYIDX10_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, <4 x i8>* [[TMP8]], align 1		; CHECK-NEXT: [[TMP53:%.*]] = load i8, i8* [[ARRAYIDX10_3]], align 1
; CHECK-NEXT: [[TMP10:%.*]] = bitcast i8* [[ADD_PTR64]] to <4 x i8>*		; CHECK-NEXT: [[CONV11_3:%.*]] = zext i8 [[TMP53]] to i32
; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1		; CHECK-NEXT: [[SUB12_3:%.*]] = sub nsw i32 [[CONV9_3]], [[CONV11_3]]
; CHECK-NEXT: [[TMP12:%.*]] = bitcast i8* [[ARRAYIDX3_1]] to <4 x i8>*		; CHECK-NEXT: [[ARRAYIDX13_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 5
; CHECK-NEXT: [[TMP13:%.*]] = load <4 x i8>, <4 x i8>* [[TMP12]], align 1		; CHECK-NEXT: [[TMP54:%.*]] = load i8, i8* [[ARRAYIDX13_3]], align 1
; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[ARRAYIDX5_1]] to <4 x i8>*		; CHECK-NEXT: [[CONV14_3:%.*]] = zext i8 [[TMP54]] to i32
; CHECK-NEXT: [[TMP15:%.*]] = load <4 x i8>, <4 x i8>* [[TMP14]], align 1		; CHECK-NEXT: [[ARRAYIDX15_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 5
; CHECK-NEXT: [[TMP16:%.*]] = bitcast i8* [[ADD_PTR_1]] to <4 x i8>*		; CHECK-NEXT: [[TMP55:%.*]] = load i8, i8* [[ARRAYIDX15_3]], align 1
; CHECK-NEXT: [[TMP17:%.*]] = load <4 x i8>, <4 x i8>* [[TMP16]], align 1		; CHECK-NEXT: [[CONV16_3:%.*]] = zext i8 [[TMP55]] to i32
; CHECK-NEXT: [[TMP18:%.*]] = bitcast i8* [[ADD_PTR64_1]] to <4 x i8>*		; CHECK-NEXT: [[SUB17_3:%.*]] = sub nsw i32 [[CONV14_3]], [[CONV16_3]]
; CHECK-NEXT: [[TMP19:%.*]] = load <4 x i8>, <4 x i8>* [[TMP18]], align 1		; CHECK-NEXT: [[SHL18_3:%.*]] = shl nsw i32 [[SUB17_3]], 16
; CHECK-NEXT: [[TMP20:%.*]] = bitcast i8* [[ARRAYIDX3_2]] to <4 x i8>*		; CHECK-NEXT: [[ADD19_3:%.*]] = add nsw i32 [[SHL18_3]], [[SUB12_3]]
; CHECK-NEXT: [[TMP21:%.*]] = load <4 x i8>, <4 x i8>* [[TMP20]], align 1		; CHECK-NEXT: [[ARRAYIDX20_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 2
; CHECK-NEXT: [[TMP22:%.*]] = bitcast i8* [[ARRAYIDX5_2]] to <4 x i8>*		; CHECK-NEXT: [[TMP56:%.*]] = load i8, i8* [[ARRAYIDX20_3]], align 1
; CHECK-NEXT: [[TMP23:%.*]] = load <4 x i8>, <4 x i8>* [[TMP22]], align 1		; CHECK-NEXT: [[CONV21_3:%.*]] = zext i8 [[TMP56]] to i32
; CHECK-NEXT: [[TMP24:%.*]] = bitcast i8* [[ADD_PTR_2]] to <4 x i8>*		; CHECK-NEXT: [[ARRAYIDX22_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 2
; CHECK-NEXT: [[TMP25:%.*]] = load <4 x i8>, <4 x i8>* [[TMP24]], align 1		; CHECK-NEXT: [[TMP57:%.*]] = load i8, i8* [[ARRAYIDX22_3]], align 1
; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <4 x i8> [[TMP25]], <4 x i8> [[TMP17]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[CONV23_3:%.*]] = zext i8 [[TMP57]] to i32
; CHECK-NEXT: [[TMP27:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SUB24_3:%.*]] = sub nsw i32 [[CONV21_3]], [[CONV23_3]]
; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <16 x i8> [[TMP26]], <16 x i8> [[TMP27]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX25_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 6
; CHECK-NEXT: [[TMP29:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP58:%.*]] = load i8, i8* [[ARRAYIDX25_3]], align 1
; CHECK-NEXT: [[TMP30:%.*]] = shufflevector <16 x i8> [[TMP28]], <16 x i8> [[TMP29]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[CONV26_3:%.*]] = zext i8 [[TMP58]] to i32
; CHECK-NEXT: [[TMP31:%.*]] = zext <16 x i8> [[TMP30]] to <16 x i32>		; CHECK-NEXT: [[ARRAYIDX27_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 6
; CHECK-NEXT: [[TMP32:%.*]] = bitcast i8* [[ADD_PTR64_2]] to <4 x i8>*		; CHECK-NEXT: [[TMP59:%.*]] = load i8, i8* [[ARRAYIDX27_3]], align 1
; CHECK-NEXT: [[TMP33:%.*]] = load <4 x i8>, <4 x i8>* [[TMP32]], align 1		; CHECK-NEXT: [[CONV28_3:%.*]] = zext i8 [[TMP59]] to i32
; CHECK-NEXT: [[TMP34:%.*]] = shufflevector <4 x i8> [[TMP33]], <4 x i8> [[TMP19]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SUB29_3:%.*]] = sub nsw i32 [[CONV26_3]], [[CONV28_3]]
; CHECK-NEXT: [[TMP35:%.*]] = shufflevector <4 x i8> [[TMP11]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SHL30_3:%.*]] = shl nsw i32 [[SUB29_3]], 16
; CHECK-NEXT: [[TMP36:%.*]] = shufflevector <16 x i8> [[TMP34]], <16 x i8> [[TMP35]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ADD31_3:%.*]] = add nsw i32 [[SHL30_3]], [[SUB24_3]]
; CHECK-NEXT: [[TMP37:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX32_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 3
; CHECK-NEXT: [[TMP38:%.*]] = shufflevector <16 x i8> [[TMP36]], <16 x i8> [[TMP37]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP60:%.*]] = load i8, i8* [[ARRAYIDX32_3]], align 1
; CHECK-NEXT: [[TMP39:%.*]] = zext <16 x i8> [[TMP38]] to <16 x i32>		; CHECK-NEXT: [[CONV33_3:%.*]] = zext i8 [[TMP60]] to i32
; CHECK-NEXT: [[TMP40:%.*]] = sub nsw <16 x i32> [[TMP31]], [[TMP39]]		; CHECK-NEXT: [[ARRAYIDX34_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 3
; CHECK-NEXT: [[TMP41:%.*]] = bitcast i8* [[ARRAYIDX3_3]] to <4 x i8>*		; CHECK-NEXT: [[TMP61:%.*]] = load i8, i8* [[ARRAYIDX34_3]], align 1
; CHECK-NEXT: [[TMP42:%.*]] = load <4 x i8>, <4 x i8>* [[TMP41]], align 1		; CHECK-NEXT: [[CONV35_3:%.*]] = zext i8 [[TMP61]] to i32
; CHECK-NEXT: [[TMP43:%.*]] = shufflevector <4 x i8> [[TMP42]], <4 x i8> [[TMP21]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SUB36_3:%.*]] = sub nsw i32 [[CONV33_3]], [[CONV35_3]]
; CHECK-NEXT: [[TMP44:%.*]] = shufflevector <4 x i8> [[TMP13]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ARRAYIDX37_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR_2]], i64 7
; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <16 x i8> [[TMP43]], <16 x i8> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP62:%.*]] = load i8, i8* [[ARRAYIDX37_3]], align 1
; CHECK-NEXT: [[TMP46:%.*]] = shufflevector <4 x i8> [[TMP5]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[CONV38_3:%.*]] = zext i8 [[TMP62]] to i32
; CHECK-NEXT: [[TMP47:%.*]] = shufflevector <16 x i8> [[TMP45]], <16 x i8> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[ARRAYIDX39_3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR64_2]], i64 7
; CHECK-NEXT: [[TMP48:%.*]] = zext <16 x i8> [[TMP47]] to <16 x i32>		; CHECK-NEXT: [[TMP63:%.*]] = load i8, i8* [[ARRAYIDX39_3]], align 1
; CHECK-NEXT: [[TMP49:%.*]] = bitcast i8* [[ARRAYIDX5_3]] to <4 x i8>*		; CHECK-NEXT: [[CONV40_3:%.*]] = zext i8 [[TMP63]] to i32
; CHECK-NEXT: [[TMP50:%.*]] = load <4 x i8>, <4 x i8>* [[TMP49]], align 1		; CHECK-NEXT: [[SUB41_3:%.*]] = sub nsw i32 [[CONV38_3]], [[CONV40_3]]
; CHECK-NEXT: [[TMP51:%.*]] = shufflevector <4 x i8> [[TMP50]], <4 x i8> [[TMP23]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SHL42_3:%.*]] = shl nsw i32 [[SUB41_3]], 16
; CHECK-NEXT: [[TMP52:%.*]] = shufflevector <4 x i8> [[TMP15]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ADD43_3:%.*]] = add nsw i32 [[SHL42_3]], [[SUB36_3]]
; CHECK-NEXT: [[TMP53:%.*]] = shufflevector <16 x i8> [[TMP51]], <16 x i8> [[TMP52]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[ADD44_3:%.*]] = add nsw i32 [[ADD19_3]], [[ADD_3]]
; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SUB45_3:%.*]] = sub nsw i32 [[ADD_3]], [[ADD19_3]]
; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i8> [[TMP53]], <16 x i8> [[TMP54]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[ADD46_3:%.*]] = add nsw i32 [[ADD43_3]], [[ADD31_3]]
; CHECK-NEXT: [[TMP56:%.*]] = zext <16 x i8> [[TMP55]] to <16 x i32>		; CHECK-NEXT: [[SUB47_3:%.*]] = sub nsw i32 [[ADD31_3]], [[ADD43_3]]
; CHECK-NEXT: [[TMP57:%.*]] = sub nsw <16 x i32> [[TMP48]], [[TMP56]]		; CHECK-NEXT: [[ADD48_3:%.*]] = add nsw i32 [[ADD46_3]], [[ADD44_3]]
; CHECK-NEXT: [[TMP58:%.*]] = shl nsw <16 x i32> [[TMP57]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: [[SUB51_3:%.*]] = sub nsw i32 [[ADD44_3]], [[ADD46_3]]
; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i32> [[TMP58]], [[TMP40]]		; CHECK-NEXT: [[ADD55_3:%.*]] = add nsw i32 [[SUB47_3]], [[SUB45_3]]
; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[SUB59_3:%.*]] = sub nsw i32 [[SUB45_3]], [[SUB47_3]]
; CHECK-NEXT: [[TMP61:%.*]] = add nsw <16 x i32> [[TMP59]], [[TMP60]]		; CHECK-NEXT: [[ADD78:%.*]] = add nsw i32 [[ADD48_1]], [[ADD48]]
; CHECK-NEXT: [[TMP62:%.*]] = sub nsw <16 x i32> [[TMP59]], [[TMP60]]		; CHECK-NEXT: [[SUB86:%.*]] = sub nsw i32 [[ADD48]], [[ADD48_1]]
; CHECK-NEXT: [[TMP63:%.*]] = shufflevector <16 x i32> [[TMP61]], <16 x i32> [[TMP62]], <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 22, i32 18, i32 26, i32 30, i32 5, i32 1, i32 9, i32 13, i32 20, i32 16, i32 24, i32 28>		; CHECK-NEXT: [[ADD94:%.*]] = add nsw i32 [[ADD48_3]], [[ADD48_2]]
; CHECK-NEXT: [[TMP64:%.*]] = shufflevector <16 x i32> [[TMP63]], <16 x i32> poison, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 1, i32 0, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[SUB102:%.*]] = sub nsw i32 [[ADD48_2]], [[ADD48_3]]
; CHECK-NEXT: [[TMP65:%.*]] = add nsw <16 x i32> [[TMP63]], [[TMP64]]		; CHECK-NEXT: [[ADD103:%.*]] = add nsw i32 [[ADD94]], [[ADD78]]
; CHECK-NEXT: [[TMP66:%.*]] = sub nsw <16 x i32> [[TMP63]], [[TMP64]]		; CHECK-NEXT: [[SUB104:%.*]] = sub nsw i32 [[ADD78]], [[ADD94]]
; CHECK-NEXT: [[TMP67:%.*]] = shufflevector <16 x i32> [[TMP65]], <16 x i32> [[TMP66]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; CHECK-NEXT: [[ADD105:%.*]] = add nsw i32 [[SUB102]], [[SUB86]]
; CHECK-NEXT: [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP67]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[SUB106:%.*]] = sub nsw i32 [[SUB86]], [[SUB102]]
; CHECK-NEXT: [[TMP69:%.*]] = add nsw <16 x i32> [[TMP67]], [[TMP68]]		; CHECK-NEXT: [[SHR_I:%.*]] = lshr i32 [[ADD103]], 15
; CHECK-NEXT: [[TMP70:%.*]] = sub nsw <16 x i32> [[TMP67]], [[TMP68]]		; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[SHR_I]], 65537
; CHECK-NEXT: [[TMP71:%.*]] = shufflevector <16 x i32> [[TMP69]], <16 x i32> [[TMP70]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 20, i32 5, i32 6, i32 23, i32 24, i32 9, i32 10, i32 27, i32 28, i32 13, i32 14, i32 31>		; CHECK-NEXT: [[MUL_I:%.*]] = mul nuw i32 [[AND_I]], 65535
; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP71]], <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>		; CHECK-NEXT: [[ADD_I:%.*]] = add i32 [[MUL_I]], [[ADD103]]
; CHECK-NEXT: [[TMP73:%.*]] = add nsw <16 x i32> [[TMP71]], [[TMP72]]		; CHECK-NEXT: [[XOR_I:%.*]] = xor i32 [[ADD_I]], [[MUL_I]]
; CHECK-NEXT: [[TMP74:%.*]] = sub nsw <16 x i32> [[TMP71]], [[TMP72]]		; CHECK-NEXT: [[SHR_I184:%.*]] = lshr i32 [[ADD105]], 15
; CHECK-NEXT: [[TMP75:%.*]] = shufflevector <16 x i32> [[TMP73]], <16 x i32> [[TMP74]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>		; CHECK-NEXT: [[AND_I185:%.*]] = and i32 [[SHR_I184]], 65537
; CHECK-NEXT: [[TMP76:%.*]] = lshr <16 x i32> [[TMP75]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[MUL_I186:%.*]] = mul nuw i32 [[AND_I185]], 65535
; CHECK-NEXT: [[TMP77:%.*]] = and <16 x i32> [[TMP76]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[ADD_I187:%.*]] = add i32 [[MUL_I186]], [[ADD105]]
; CHECK-NEXT: [[TMP78:%.*]] = mul nuw <16 x i32> [[TMP77]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[XOR_I188:%.*]] = xor i32 [[ADD_I187]], [[MUL_I186]]
; CHECK-NEXT: [[TMP79:%.*]] = add <16 x i32> [[TMP78]], [[TMP75]]		; CHECK-NEXT: [[SHR_I189:%.*]] = lshr i32 [[SUB104]], 15
; CHECK-NEXT: [[TMP80:%.*]] = xor <16 x i32> [[TMP79]], [[TMP78]]		; CHECK-NEXT: [[AND_I190:%.*]] = and i32 [[SHR_I189]], 65537
; CHECK-NEXT: [[TMP81:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP80]])		; CHECK-NEXT: [[MUL_I191:%.*]] = mul nuw i32 [[AND_I190]], 65535
; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP81]], 65535		; CHECK-NEXT: [[ADD_I192:%.*]] = add i32 [[MUL_I191]], [[SUB104]]
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP81]], 16		; CHECK-NEXT: [[XOR_I193:%.*]] = xor i32 [[ADD_I192]], [[MUL_I191]]
		; CHECK-NEXT: [[SHR_I194:%.*]] = lshr i32 [[SUB106]], 15
		; CHECK-NEXT: [[AND_I195:%.*]] = and i32 [[SHR_I194]], 65537
		; CHECK-NEXT: [[MUL_I196:%.*]] = mul nuw i32 [[AND_I195]], 65535
		; CHECK-NEXT: [[ADD_I197:%.*]] = add i32 [[MUL_I196]], [[SUB106]]
		; CHECK-NEXT: [[XOR_I198:%.*]] = xor i32 [[ADD_I197]], [[MUL_I196]]
		; CHECK-NEXT: [[ADD110:%.*]] = add i32 [[XOR_I188]], [[XOR_I]]
		; CHECK-NEXT: [[ADD112:%.*]] = add i32 [[ADD110]], [[XOR_I193]]
		; CHECK-NEXT: [[ADD113:%.*]] = add i32 [[ADD112]], [[XOR_I198]]
		; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 [[ADD55_1]], [[ADD55]]
		; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 [[ADD55]], [[ADD55_1]]
		; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 [[ADD55_3]], [[ADD55_2]]
		; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 [[ADD55_2]], [[ADD55_3]]
		; CHECK-NEXT: [[ADD103_1:%.*]] = add nsw i32 [[ADD94_1]], [[ADD78_1]]
		; CHECK-NEXT: [[SUB104_1:%.*]] = sub nsw i32 [[ADD78_1]], [[ADD94_1]]
		; CHECK-NEXT: [[ADD105_1:%.*]] = add nsw i32 [[SUB102_1]], [[SUB86_1]]
		; CHECK-NEXT: [[SUB106_1:%.*]] = sub nsw i32 [[SUB86_1]], [[SUB102_1]]
		; CHECK-NEXT: [[SHR_I_1:%.*]] = lshr i32 [[ADD103_1]], 15
		; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[SHR_I_1]], 65537
		; CHECK-NEXT: [[MUL_I_1:%.*]] = mul nuw i32 [[AND_I_1]], 65535
		; CHECK-NEXT: [[ADD_I_1:%.*]] = add i32 [[MUL_I_1]], [[ADD103_1]]
		; CHECK-NEXT: [[XOR_I_1:%.*]] = xor i32 [[ADD_I_1]], [[MUL_I_1]]
		; CHECK-NEXT: [[SHR_I184_1:%.*]] = lshr i32 [[ADD105_1]], 15
		; CHECK-NEXT: [[AND_I185_1:%.*]] = and i32 [[SHR_I184_1]], 65537
		; CHECK-NEXT: [[MUL_I186_1:%.*]] = mul nuw i32 [[AND_I185_1]], 65535
		; CHECK-NEXT: [[ADD_I187_1:%.*]] = add i32 [[MUL_I186_1]], [[ADD105_1]]
		; CHECK-NEXT: [[XOR_I188_1:%.*]] = xor i32 [[ADD_I187_1]], [[MUL_I186_1]]
		; CHECK-NEXT: [[SHR_I189_1:%.*]] = lshr i32 [[SUB104_1]], 15
		; CHECK-NEXT: [[AND_I190_1:%.*]] = and i32 [[SHR_I189_1]], 65537
		; CHECK-NEXT: [[MUL_I191_1:%.*]] = mul nuw i32 [[AND_I190_1]], 65535
		; CHECK-NEXT: [[ADD_I192_1:%.*]] = add i32 [[MUL_I191_1]], [[SUB104_1]]
		; CHECK-NEXT: [[XOR_I193_1:%.*]] = xor i32 [[ADD_I192_1]], [[MUL_I191_1]]
		; CHECK-NEXT: [[SHR_I194_1:%.*]] = lshr i32 [[SUB106_1]], 15
		; CHECK-NEXT: [[AND_I195_1:%.*]] = and i32 [[SHR_I194_1]], 65537
		; CHECK-NEXT: [[MUL_I196_1:%.*]] = mul nuw i32 [[AND_I195_1]], 65535
		; CHECK-NEXT: [[ADD_I197_1:%.*]] = add i32 [[MUL_I196_1]], [[SUB106_1]]
		; CHECK-NEXT: [[XOR_I198_1:%.*]] = xor i32 [[ADD_I197_1]], [[MUL_I196_1]]
		; CHECK-NEXT: [[ADD108_1:%.*]] = add i32 [[XOR_I188_1]], [[ADD113]]
		; CHECK-NEXT: [[ADD110_1:%.*]] = add i32 [[ADD108_1]], [[XOR_I_1]]
		; CHECK-NEXT: [[ADD112_1:%.*]] = add i32 [[ADD110_1]], [[XOR_I193_1]]
		; CHECK-NEXT: [[ADD113_1:%.*]] = add i32 [[ADD112_1]], [[XOR_I198_1]]
		; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 [[SUB51_1]], [[SUB51]]
		; CHECK-NEXT: [[SUB86_2:%.*]] = sub nsw i32 [[SUB51]], [[SUB51_1]]
		; CHECK-NEXT: [[ADD94_2:%.*]] = add nsw i32 [[SUB51_3]], [[SUB51_2]]
		; CHECK-NEXT: [[SUB102_2:%.*]] = sub nsw i32 [[SUB51_2]], [[SUB51_3]]
		; CHECK-NEXT: [[ADD103_2:%.*]] = add nsw i32 [[ADD94_2]], [[ADD78_2]]
		; CHECK-NEXT: [[SUB104_2:%.*]] = sub nsw i32 [[ADD78_2]], [[ADD94_2]]
		; CHECK-NEXT: [[ADD105_2:%.*]] = add nsw i32 [[SUB102_2]], [[SUB86_2]]
		; CHECK-NEXT: [[SUB106_2:%.*]] = sub nsw i32 [[SUB86_2]], [[SUB102_2]]
		; CHECK-NEXT: [[SHR_I_2:%.*]] = lshr i32 [[ADD103_2]], 15
		; CHECK-NEXT: [[AND_I_2:%.*]] = and i32 [[SHR_I_2]], 65537
		; CHECK-NEXT: [[MUL_I_2:%.*]] = mul nuw i32 [[AND_I_2]], 65535
		; CHECK-NEXT: [[ADD_I_2:%.*]] = add i32 [[MUL_I_2]], [[ADD103_2]]
		; CHECK-NEXT: [[XOR_I_2:%.*]] = xor i32 [[ADD_I_2]], [[MUL_I_2]]
		; CHECK-NEXT: [[SHR_I184_2:%.*]] = lshr i32 [[ADD105_2]], 15
		; CHECK-NEXT: [[AND_I185_2:%.*]] = and i32 [[SHR_I184_2]], 65537
		; CHECK-NEXT: [[MUL_I186_2:%.*]] = mul nuw i32 [[AND_I185_2]], 65535
		; CHECK-NEXT: [[ADD_I187_2:%.*]] = add i32 [[MUL_I186_2]], [[ADD105_2]]
		; CHECK-NEXT: [[XOR_I188_2:%.*]] = xor i32 [[ADD_I187_2]], [[MUL_I186_2]]
		; CHECK-NEXT: [[SHR_I189_2:%.*]] = lshr i32 [[SUB104_2]], 15
		; CHECK-NEXT: [[AND_I190_2:%.*]] = and i32 [[SHR_I189_2]], 65537
		; CHECK-NEXT: [[MUL_I191_2:%.*]] = mul nuw i32 [[AND_I190_2]], 65535
		; CHECK-NEXT: [[ADD_I192_2:%.*]] = add i32 [[MUL_I191_2]], [[SUB104_2]]
		; CHECK-NEXT: [[XOR_I193_2:%.*]] = xor i32 [[ADD_I192_2]], [[MUL_I191_2]]
		; CHECK-NEXT: [[SHR_I194_2:%.*]] = lshr i32 [[SUB106_2]], 15
		; CHECK-NEXT: [[AND_I195_2:%.*]] = and i32 [[SHR_I194_2]], 65537
		; CHECK-NEXT: [[MUL_I196_2:%.*]] = mul nuw i32 [[AND_I195_2]], 65535
		; CHECK-NEXT: [[ADD_I197_2:%.*]] = add i32 [[MUL_I196_2]], [[SUB106_2]]
		; CHECK-NEXT: [[XOR_I198_2:%.*]] = xor i32 [[ADD_I197_2]], [[MUL_I196_2]]
		; CHECK-NEXT: [[ADD108_2:%.*]] = add i32 [[XOR_I188_2]], [[ADD113_1]]
		; CHECK-NEXT: [[ADD110_2:%.*]] = add i32 [[ADD108_2]], [[XOR_I_2]]
		; CHECK-NEXT: [[ADD112_2:%.*]] = add i32 [[ADD110_2]], [[XOR_I193_2]]
		; CHECK-NEXT: [[ADD113_2:%.*]] = add i32 [[ADD112_2]], [[XOR_I198_2]]
		; CHECK-NEXT: [[ADD78_3:%.*]] = add nsw i32 [[SUB59_1]], [[SUB59]]
		; CHECK-NEXT: [[SUB86_3:%.*]] = sub nsw i32 [[SUB59]], [[SUB59_1]]
		; CHECK-NEXT: [[ADD94_3:%.*]] = add nsw i32 [[SUB59_3]], [[SUB59_2]]
		; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 [[SUB59_2]], [[SUB59_3]]
		; CHECK-NEXT: [[ADD103_3:%.*]] = add nsw i32 [[ADD94_3]], [[ADD78_3]]
		; CHECK-NEXT: [[SUB104_3:%.*]] = sub nsw i32 [[ADD78_3]], [[ADD94_3]]
		; CHECK-NEXT: [[ADD105_3:%.*]] = add nsw i32 [[SUB102_3]], [[SUB86_3]]
		; CHECK-NEXT: [[SUB106_3:%.*]] = sub nsw i32 [[SUB86_3]], [[SUB102_3]]
		; CHECK-NEXT: [[SHR_I_3:%.*]] = lshr i32 [[ADD103_3]], 15
		; CHECK-NEXT: [[AND_I_3:%.*]] = and i32 [[SHR_I_3]], 65537
		; CHECK-NEXT: [[MUL_I_3:%.*]] = mul nuw i32 [[AND_I_3]], 65535
		; CHECK-NEXT: [[ADD_I_3:%.*]] = add i32 [[MUL_I_3]], [[ADD103_3]]
		; CHECK-NEXT: [[XOR_I_3:%.*]] = xor i32 [[ADD_I_3]], [[MUL_I_3]]
		; CHECK-NEXT: [[SHR_I184_3:%.*]] = lshr i32 [[ADD105_3]], 15
		; CHECK-NEXT: [[AND_I185_3:%.*]] = and i32 [[SHR_I184_3]], 65537
		; CHECK-NEXT: [[MUL_I186_3:%.*]] = mul nuw i32 [[AND_I185_3]], 65535
		; CHECK-NEXT: [[ADD_I187_3:%.*]] = add i32 [[MUL_I186_3]], [[ADD105_3]]
		; CHECK-NEXT: [[XOR_I188_3:%.*]] = xor i32 [[ADD_I187_3]], [[MUL_I186_3]]
		; CHECK-NEXT: [[SHR_I189_3:%.*]] = lshr i32 [[SUB104_3]], 15
		; CHECK-NEXT: [[AND_I190_3:%.*]] = and i32 [[SHR_I189_3]], 65537
		; CHECK-NEXT: [[MUL_I191_3:%.*]] = mul nuw i32 [[AND_I190_3]], 65535
		; CHECK-NEXT: [[ADD_I192_3:%.*]] = add i32 [[MUL_I191_3]], [[SUB104_3]]
		; CHECK-NEXT: [[XOR_I193_3:%.*]] = xor i32 [[ADD_I192_3]], [[MUL_I191_3]]
		; CHECK-NEXT: [[SHR_I194_3:%.*]] = lshr i32 [[SUB106_3]], 15
		; CHECK-NEXT: [[AND_I195_3:%.*]] = and i32 [[SHR_I194_3]], 65537
		; CHECK-NEXT: [[MUL_I196_3:%.*]] = mul nuw i32 [[AND_I195_3]], 65535
		; CHECK-NEXT: [[ADD_I197_3:%.*]] = add i32 [[MUL_I196_3]], [[SUB106_3]]
		; CHECK-NEXT: [[XOR_I198_3:%.*]] = xor i32 [[ADD_I197_3]], [[MUL_I196_3]]
		; CHECK-NEXT: [[ADD108_3:%.*]] = add i32 [[XOR_I188_3]], [[ADD113_2]]
		; CHECK-NEXT: [[ADD110_3:%.*]] = add i32 [[ADD108_3]], [[XOR_I_3]]
		; CHECK-NEXT: [[ADD112_3:%.*]] = add i32 [[ADD110_3]], [[XOR_I193_3]]
		; CHECK-NEXT: [[ADD113_3:%.*]] = add i32 [[ADD112_3]], [[XOR_I198_3]]
		; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[ADD113_3]], 65535
		; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD113_3]], 16
; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]		; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]
; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1		; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
; CHECK-NEXT: ret i32 [[SHR120]]		; CHECK-NEXT: ret i32 [[SHR120]]
;		;
entry:		entry:
%idx.ext = sext i32 %st1 to i64		%idx.ext = sext i32 %st1 to i64
%idx.ext63 = sext i32 %st2 to i64		%idx.ext63 = sext i32 %st2 to i64
%0 = load i8, i8* %p1, align 1		%0 = load i8, i8* %p1, align 1

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -passes=slp-vectorizer -mtriple=arm64-apple-ios -S %s | FileCheck %s		; RUN: opt -passes=slp-vectorizer -mtriple=arm64-apple-ios -S %s | FileCheck %s

; Test case where not vectorizing is more profitable because multiple		; Test case where not vectorizing is more profitable because multiple
; fmul/{fadd,fsub} pairs can be lowered to fma instructions.		; fmul/{fadd,fsub} pairs can be lowered to fma instructions.
define void @slp_not_profitable_with_fast_fmf(ptr %A, ptr %B) {		define void @slp_not_profitable_with_fast_fmf(ptr %A, ptr %B) {
; CHECK-LABEL: @slp_not_profitable_with_fast_fmf(		; CHECK-LABEL: @slp_not_profitable_with_fast_fmf(
; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1		; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
		; CHECK-NEXT: [[B_1:%.*]] = load float, ptr [[GEP_B_1]], align 4
		; CHECK-NEXT: [[MUL_0:%.*]] = fmul fast float [[B_1]], [[A_0]]
; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4		; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4		; CHECK-NEXT: [[GEP_B_2:%.*]] = getelementptr inbounds float, ptr [[B]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0		; CHECK-NEXT: [[B_2:%.*]] = load float, ptr [[GEP_B_2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1		; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[B_2]], [[B_0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[SUB:%.*]] = fsub fast float [[MUL_0]], [[MUL_1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[B_0]], [[B_1]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0		; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[B_2]], [[A_0]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1		; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL_3]], [[MUL_2]]
; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP6]]		; CHECK-NEXT: store float [[SUB]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: store float [[ADD]], ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store float [[B_2]], ptr [[B]], align 4
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1		%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
%A.0 = load float, ptr %A, align 4		%A.0 = load float, ptr %A, align 4
%B.1 = load float, ptr %gep.B.1, align 4		%B.1 = load float, ptr %gep.B.1, align 4
%mul.0 = fmul fast float %B.1, %A.0		%mul.0 = fmul fast float %B.1, %A.0
%B.0 = load float, ptr %B, align 4		%B.0 = load float, ptr %B, align 4
%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2		%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
Show All 9 Lines	;
store float %B.2, ptr %B, align 4		store float %B.2, ptr %B, align 4
ret void		ret void
}		}

define void @slp_not_profitable_with_reassoc_fmf(ptr %A, ptr %B) {		define void @slp_not_profitable_with_reassoc_fmf(ptr %A, ptr %B) {
; CHECK-LABEL: @slp_not_profitable_with_reassoc_fmf(		; CHECK-LABEL: @slp_not_profitable_with_reassoc_fmf(
; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1		; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
		; CHECK-NEXT: [[B_1:%.*]] = load float, ptr [[GEP_B_1]], align 4
		; CHECK-NEXT: [[MUL_0:%.*]] = fmul reassoc float [[B_1]], [[A_0]]
; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4		; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4		; CHECK-NEXT: [[GEP_B_2:%.*]] = getelementptr inbounds float, ptr [[B]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0		; CHECK-NEXT: [[B_2:%.*]] = load float, ptr [[GEP_B_2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1		; CHECK-NEXT: [[MUL_1:%.*]] = fmul float [[B_2]], [[B_0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[SUB:%.*]] = fsub reassoc float [[MUL_0]], [[MUL_1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[MUL_2:%.*]] = fmul float [[B_0]], [[B_1]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0		; CHECK-NEXT: [[MUL_3:%.*]] = fmul reassoc float [[B_2]], [[A_0]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1		; CHECK-NEXT: [[ADD:%.*]] = fadd reassoc float [[MUL_3]], [[MUL_2]]
; CHECK-NEXT: [[TMP7:%.*]] = fmul reassoc <2 x float> [[TMP1]], [[TMP6]]		; CHECK-NEXT: store float [[SUB]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = fsub reassoc <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = fadd reassoc <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: store float [[ADD]], ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store float [[B_2]], ptr [[B]], align 4
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1		%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
%A.0 = load float, ptr %A, align 4		%A.0 = load float, ptr %A, align 4
%B.1 = load float, ptr %gep.B.1, align 4		%B.1 = load float, ptr %gep.B.1, align 4
%mul.0 = fmul reassoc float %B.1, %A.0		%mul.0 = fmul reassoc float %B.1, %A.0
%B.0 = load float, ptr %B, align 4		%B.0 = load float, ptr %B, align 4
%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2		%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
Show All 10 Lines	;
ret void		ret void
}		}

; FMA cannot be used due to missing fast-math flags, so SLP should kick in.		; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
define void @slp_profitable_missing_fmf_on_fadd_fsub(ptr %A, ptr %B) {		define void @slp_profitable_missing_fmf_on_fadd_fsub(ptr %A, ptr %B) {
; CHECK-LABEL: @slp_profitable_missing_fmf_on_fadd_fsub(		; CHECK-LABEL: @slp_profitable_missing_fmf_on_fadd_fsub(
; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1		; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
		; CHECK-NEXT: [[B_1:%.*]] = load float, ptr [[GEP_B_1]], align 4
		; CHECK-NEXT: [[MUL_0:%.*]] = fmul fast float [[B_1]], [[A_0]]
; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4		; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4		; CHECK-NEXT: [[GEP_B_2:%.*]] = getelementptr inbounds float, ptr [[B]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0		; CHECK-NEXT: [[B_2:%.*]] = load float, ptr [[GEP_B_2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1		; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[B_2]], [[B_0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[SUB:%.*]] = fsub float [[MUL_0]], [[MUL_1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[B_0]], [[B_1]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0		; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[B_2]], [[A_0]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1		; CHECK-NEXT: [[ADD:%.*]] = fadd float [[MUL_3]], [[MUL_2]]
; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP6]]		; CHECK-NEXT: store float [[SUB]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: store float [[ADD]], ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store float [[B_2]], ptr [[B]], align 4
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1		%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
%A.0 = load float, ptr %A, align 4		%A.0 = load float, ptr %A, align 4
%B.1 = load float, ptr %gep.B.1, align 4		%B.1 = load float, ptr %gep.B.1, align 4
%mul.0 = fmul fast float %B.1, %A.0		%mul.0 = fmul fast float %B.1, %A.0
%B.0 = load float, ptr %B, align 4		%B.0 = load float, ptr %B, align 4
%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2		%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
Show All 10 Lines	;
ret void		ret void
}		}

; FMA cannot be used due to missing fast-math flags, so SLP should kick in.		; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
define void @slp_profitable_missing_fmf_on_fmul_fadd_fsub(ptr %A, ptr %B) {		define void @slp_profitable_missing_fmf_on_fmul_fadd_fsub(ptr %A, ptr %B) {
; CHECK-LABEL: @slp_profitable_missing_fmf_on_fmul_fadd_fsub(		; CHECK-LABEL: @slp_profitable_missing_fmf_on_fmul_fadd_fsub(
; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1		; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
		; CHECK-NEXT: [[B_1:%.*]] = load float, ptr [[GEP_B_1]], align 4
		; CHECK-NEXT: [[MUL_0:%.*]] = fmul float [[B_1]], [[A_0]]
; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4		; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4		; CHECK-NEXT: [[GEP_B_2:%.*]] = getelementptr inbounds float, ptr [[B]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0		; CHECK-NEXT: [[B_2:%.*]] = load float, ptr [[GEP_B_2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1		; CHECK-NEXT: [[MUL_1:%.*]] = fmul float [[B_2]], [[B_0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[SUB:%.*]] = fsub float [[MUL_0]], [[MUL_1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[MUL_2:%.*]] = fmul float [[B_0]], [[B_1]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0		; CHECK-NEXT: [[MUL_3:%.*]] = fmul float [[B_2]], [[A_0]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1		; CHECK-NEXT: [[ADD:%.*]] = fadd float [[MUL_3]], [[MUL_2]]
; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP1]], [[TMP6]]		; CHECK-NEXT: store float [[SUB]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: store float [[ADD]], ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store float [[B_2]], ptr [[B]], align 4
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1		%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
%A.0 = load float, ptr %A, align 4		%A.0 = load float, ptr %A, align 4
%B.1 = load float, ptr %gep.B.1, align 4		%B.1 = load float, ptr %gep.B.1, align 4
%mul.0 = fmul float %B.1, %A.0		%mul.0 = fmul float %B.1, %A.0
%B.0 = load float, ptr %B, align 4		%B.0 = load float, ptr %B, align 4
%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2		%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
Show All 10 Lines	;
ret void		ret void
}		}

; FMA cannot be used due to missing fast-math flags, so SLP should kick in.		; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
define void @slp_profitable_missing_fmf_nnans_only(ptr %A, ptr %B) {		define void @slp_profitable_missing_fmf_nnans_only(ptr %A, ptr %B) {
; CHECK-LABEL: @slp_profitable_missing_fmf_nnans_only(		; CHECK-LABEL: @slp_profitable_missing_fmf_nnans_only(
; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1		; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4		; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
		; CHECK-NEXT: [[B_1:%.*]] = load float, ptr [[GEP_B_1]], align 4
		; CHECK-NEXT: [[MUL_0:%.*]] = fmul nnan float [[B_1]], [[A_0]]
; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4		; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4		; CHECK-NEXT: [[GEP_B_2:%.*]] = getelementptr inbounds float, ptr [[B]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0		; CHECK-NEXT: [[B_2:%.*]] = load float, ptr [[GEP_B_2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1		; CHECK-NEXT: [[MUL_1:%.*]] = fmul nnan float [[B_2]], [[B_0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul nnan <2 x float> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[SUB:%.*]] = fsub nnan float [[MUL_0]], [[MUL_1]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[MUL_2:%.*]] = fmul nnan float [[B_0]], [[B_1]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0		; CHECK-NEXT: [[MUL_3:%.*]] = fmul nnan float [[B_2]], [[A_0]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1		; CHECK-NEXT: [[ADD:%.*]] = fadd nnan float [[MUL_3]], [[MUL_2]]
; CHECK-NEXT: [[TMP7:%.*]] = fmul nnan <2 x float> [[TMP1]], [[TMP6]]		; CHECK-NEXT: store float [[SUB]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP8:%.*]] = fsub nnan <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
; CHECK-NEXT: [[TMP9:%.*]] = fadd nnan <2 x float> [[TMP7]], [[SHUFFLE]]		; CHECK-NEXT: store float [[ADD]], ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store float [[B_2]], ptr [[B]], align 4
; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1		%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
%A.0 = load float, ptr %A, align 4		%A.0 = load float, ptr %A, align 4
%B.1 = load float, ptr %gep.B.1, align 4		%B.1 = load float, ptr %gep.B.1, align 4
%mul.0 = fmul nnan float %B.1, %A.0		%mul.0 = fmul nnan float %B.1, %A.0
%B.0 = load float, ptr %B, align 4		%B.0 = load float, ptr %B, align 4
%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2		%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S %s | FileCheck %s

	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
	target triple = "arm64-apple-darwin"			target triple = "arm64-apple-darwin"

	declare void @use(double)			declare void @use(double)

	; The extracts %v1.lane.0 and %v1.lane.1 should be considered free during SLP,			; The extracts %v1.lane.0 and %v1.lane.1 should be considered free during SLP,
	; because they will be directly in a vector register on AArch64.			; because they will be directly in a vector register on AArch64.
	define void @noop_extracts_first_2_lanes(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extracts_first_2_lanes(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extracts_first_2_lanes(			; CHECK-LABEL: @noop_extracts_first_2_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[V2_LANE_3:%.*]] = extractelement <4 x double> [[V_2]], i32 3			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[V_1]], [[TMP0]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_3]], i32 1			; CHECK-NEXT: call void @use(double [[TMP2]])
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[V_1]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: call void @use(double [[TMP3]])			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	Show All 17 Lines
	define void @extracts_first_2_lanes_different_vectors(<2 x double>* %ptr.1, <4 x double>* %ptr.2, <2 x double>* %ptr.3) {			define void @extracts_first_2_lanes_different_vectors(<2 x double>* %ptr.1, <4 x double>* %ptr.2, <2 x double>* %ptr.3) {
	; CHECK-LABEL: @extracts_first_2_lanes_different_vectors(			; CHECK-LABEL: @extracts_first_2_lanes_different_vectors(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8			; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8
	; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1			; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x double> [[V_1]], <2 x double> [[V_3]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V3_LANE_1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V3_LANE_1]])			; CHECK-NEXT: call void @use(double [[V3_LANE_1]])
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8			%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8
	%v3.lane.1 = extractelement <2 x double> %v.3, i32 1			%v3.lane.1 = extractelement <2 x double> %v.3, i32 1

	Show All 17 Lines
	; because they will be directly in a vector register on AArch64.			; because they will be directly in a vector register on AArch64.
	define void @noop_extract_second_2_lanes(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extract_second_2_lanes(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extract_second_2_lanes(			; CHECK-LABEL: @noop_extract_second_2_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_1]], <4 x double> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <4 x double> %v.1, i32 3			%v1.lane.3 = extractelement <4 x double> %v.1, i32 3

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	Show All 13 Lines

	; %v1.lane.0 and %v1.lane.1 are used in reverse-order, so they won't be			; %v1.lane.0 and %v1.lane.1 are used in reverse-order, so they won't be
	; directly in a vector register on AArch64.			; directly in a vector register on AArch64.
	define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extract_reverse_order(			; CHECK-LABEL: @extract_reverse_order(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[V_1]], [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[V_1]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: call void @use(double [[TMP4]])			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[V_1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[TMP5]])
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	Show All 15 Lines
	; %v1.lane.1 and %v1.lane.2 are extracted from different vector registers on AArch64.			; %v1.lane.1 and %v1.lane.2 are extracted from different vector registers on AArch64.
	define void @extract_lanes_1_and_2(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extract_lanes_1_and_2(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extract_lanes_1_and_2(			; CHECK-LABEL: @extract_lanes_1_and_2(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_1]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_2]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.1 = extractelement <4 x double> %v.1, i32 1			%v1.lane.1 = extractelement <4 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	Show All 19 Lines
	; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(			; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_0]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_1]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_0]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP5]], <4 x double> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x double> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x double> [[TMP6]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP7]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP3]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	Show All 23 Lines
	; CHECK-LABEL: @extracts_jumbled_4_lanes(			; CHECK-LABEL: @extracts_jumbled_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_2]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x double> [[TMP6]], double [[V2_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x double> [[TMP3]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x double> [[TMP8]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP9]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP3]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	Show All 20 Lines

	; Even more complex case where the extracted lanes are directly from a vector			; Even more complex case where the extracted lanes are directly from a vector
	; register on AArch64 and should be considered free, because we can			; register on AArch64 and should be considered free, because we can
	; directly use the source vector register.			; directly use the source vector register.
	define void @noop_extracts_9_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extracts_9_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extracts_9_lanes(			; CHECK-LABEL: @noop_extracts_9_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5			; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 0, i32 1>
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 0, i32 2, i32 1, i32 0, i32 2, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_3]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_4]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_7]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_8]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_0]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_1]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 2, i32 1, i32 0, i32 2, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = fmul <8 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 1>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[TMP6]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP25]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP7]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	}			}

	; Extracted lanes used in first fmul chain are not used in the right order, so			; Extracted lanes used in first fmul chain are not used in the right order, so
	; we cannot reuse the source vector registers directly.			; we cannot reuse the source vector registers directly.
	define void @first_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @first_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @first_mul_chain_jumbled(			; CHECK-LABEL: @first_mul_chain_jumbled(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5			; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 4, i32 3, i32 6, i32 5, i32 8, i32 7, i32 1, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_6]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_5]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_1]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_0]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_2]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = fmul <8 x double> [[TMP4]], [[TMP1]]
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE1]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x double> [[TMP5]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP22]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP6]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	}			}

	; Extracted lanes used in both fmul chain are not used in the right order, so			; Extracted lanes used in both fmul chain are not used in the right order, so
	; we cannot reuse the source vector registers directly.			; we cannot reuse the source vector registers directly.
	define void @first_and_second_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @first_and_second_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @first_and_second_mul_chain_jumbled(			; CHECK-LABEL: @first_and_second_mul_chain_jumbled(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4			; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 4, i32 3, i32 5, i32 6, i32 8, i32 7, i32 1, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 0, i32 2, i32 1, i32 2, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_7]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 7, i32 6, i32 8, i32 1, i32 0, i32 3, i32 2, i32 5>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_6]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 2, i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = fmul <8 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_1]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_0]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_3]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_2]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_5]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_4]], [[V2_LANE_2]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_4]], [[V2_LANE_2]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[TMP6]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP25]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP7]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

	Show First 20 Lines • Show All 238 Lines • ▼ Show 20 Lines
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	Show All 29 Lines
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])
	; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_31]]			; GFX8-NEXT: ret <4 x i16> [[INS_31]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	(27 lines not shown)

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

	(238 lines not shown)
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	(29 lines not shown)
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])
	; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_31]]			; GFX8-NEXT: ret <4 x i16> [[INS_31]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	(27 lines not shown)

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=slp-vectorizer %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=slp-vectorizer %s | FileCheck %s

	define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {			define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {
	; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(			; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[ARG0_1:%.*]] = extractelement <9 x i16> undef, i64 7			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x i16> [[ARG0:%.*]], <9 x i16> poison, <2 x i32> <i32 undef, i32 8>
	; CHECK-NEXT: [[ARG0_2:%.*]] = extractelement <9 x i16> [[ARG0:%.*]], i64 8			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <9 x i16> [[ARG1:%.*]], <9 x i16> poison, <2 x i32> <i32 7, i32 8>
	; CHECK-NEXT: [[ARG1_1:%.*]] = extractelement <9 x i16> [[ARG1:%.*]], i64 7			; CHECK-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; CHECK-NEXT: [[ARG1_2:%.*]] = extractelement <9 x i16> [[ARG1]], i64 8			; CHECK-NEXT: ret <2 x i16> [[TMP2]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> poison, i16 [[ARG0_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 [[ARG0_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i16> poison, i16 [[ARG1_1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i16> [[TMP2]], i16 [[ARG1_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP3]])
	; CHECK-NEXT: ret <2 x i16> [[TMP4]]
	;			;
	bb:			bb:
	%arg0.1 = extractelement <9 x i16> undef, i64 7			%arg0.1 = extractelement <9 x i16> undef, i64 7
	%arg0.2 = extractelement <9 x i16> %arg0, i64 8			%arg0.2 = extractelement <9 x i16> %arg0, i64 8
	%arg1.1 = extractelement <9 x i16> %arg1, i64 7			%arg1.1 = extractelement <9 x i16> %arg1, i64 7
	%arg1.2 = extractelement <9 x i16> %arg1, i64 8			%arg1.2 = extractelement <9 x i16> %arg1, i64 8
	%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)			%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)
	%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)			%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)
	%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0			%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0
	%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1			%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1
	ret <2 x i16> %ins.2			ret <2 x i16> %ins.2
	}			}

	declare i16 @llvm.uadd.sat.i16(i16, i16) #0			declare i16 @llvm.uadd.sat.i16(i16, i16) #0
	attributes #0 = { nounwind readnone speculatable willreturn }			attributes #0 = { nounwind readnone speculatable willreturn }

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -passes=slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -passes=slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 0, i32 1, i32 14, i32 15>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=5 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[TMP3]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP14:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x i32> [ [[TMP14:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE6]])			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE7]])			; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP6]])
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])			; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP9]]
	; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP0]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP0]]
	; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]			; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i32> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i32> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP9]], [[TMP11]]
	; CHECK-NEXT: [[TMP14]] = shufflevector <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP14]] = shufflevector <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
	; FORCE_REDUCTION-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; FORCE_REDUCTION-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[TMP3]], <16 x i32> poison, <16 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = phi <2 x i32> [ [[TMP21:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 1240, i32 285, i32 55, i32 0>
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE6]])			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE7]])			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; FORCE_REDUCTION-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]			; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])			; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP10]], <i32 1496, i32 12529>
	; FORCE_REDUCTION-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]			; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP10]], <i32 8555, i32 13685>
	; FORCE_REDUCTION-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP13:%.*]] = and <2 x i32> [[TMP11]], [[TMP12]]
	; FORCE_REDUCTION-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
	; FORCE_REDUCTION-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]			; FORCE_REDUCTION-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
	; FORCE_REDUCTION-NEXT: [[OP_RDX5:%.*]] = and i32 [[OP_RDX4]], [[TMP4]]			; FORCE_REDUCTION-NEXT: [[OP_RDX9:%.*]] = and i32 [[TMP14]], [[TMP15]]
	; FORCE_REDUCTION-NEXT: [[VAL_43:%.*]] = add i32 [[TMP4]], 14910			; FORCE_REDUCTION-NEXT: [[TMP16:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP6]])
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX5]], i32 0			; FORCE_REDUCTION-NEXT: [[OP_RDX13:%.*]] = and i32 [[TMP16]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[TMP10]] = insertelement <2 x i32> [[TMP9]], i32 [[VAL_43]], i32 1			; FORCE_REDUCTION-NEXT: [[OP_RDX14:%.*]] = and i32 [[TMP0]], [[TMP0]]
				; FORCE_REDUCTION-NEXT: [[OP_RDX15:%.*]] = and i32 [[TMP8]], [[TMP9]]
				; FORCE_REDUCTION-NEXT: [[OP_RDX16:%.*]] = and i32 [[OP_RDX13]], [[OP_RDX14]]
				; FORCE_REDUCTION-NEXT: [[OP_RDX17:%.*]] = and i32 [[OP_RDX16]], [[OP_RDX15]]
				; FORCE_REDUCTION-NEXT: [[OP_RDX11:%.*]] = and i32 [[OP_RDX9]], [[TMP7]]
				; FORCE_REDUCTION-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[OP_RDX17]], i32 1
				; FORCE_REDUCTION-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> <i32 14910, i32 poison>, i32 [[OP_RDX11]], i32 1
				; FORCE_REDUCTION-NEXT: [[TMP19:%.*]] = add <2 x i32> [[TMP17]], [[TMP18]]
				; FORCE_REDUCTION-NEXT: [[TMP20:%.*]] = and <2 x i32> [[TMP17]], [[TMP18]]
				; FORCE_REDUCTION-NEXT: [[TMP21]] = shufflevector <2 x i32> [[TMP19]], <2 x i32> [[TMP20]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]			%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]
	%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]			%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]
	(46 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R71]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R71]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

;
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>
; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R71]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

;
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>
; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R71]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer -S | FileCheck %s

	define i16 @test(i16 %call37) {			define i16 @test(i16 %call37) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2			; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 poison>, i16 [[CALL37:%.*]], i32 3			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 poison>, i16 [[CALL37:%.*]], i32 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 4, i32 3, i32 5>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 4, i32 3, i32 5>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i16> [[SHUFFLE]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <8 x i16> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <8 x i16> [[SHUFFLE]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <8 x i16> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i1> [[TMP3]], <8 x i1> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i1> [[TMP4]] to <8 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = zext <8 x i1> [[TMP5]] to <8 x i16>
	; CHECK-NEXT: [[TMP6:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP5]])			; CHECK-NEXT: [[TMP7:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP6]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP6]], 0			; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP7]], 0
	; CHECK-NEXT: ret i16 [[OP_RDX]]			; CHECK-NEXT: ret i16 [[OP_RDX]]
	;			;
	entry:			entry:
	%call = load i16, i16* undef, align 2			%call = load i16, i16* undef, align 2
	%0 = icmp slt i16 %call, 0			%0 = icmp slt i16 %call, 0
	%cond = zext i1 %0 to i16			%cond = zext i1 %0 to i16
	%1 = add i16 %cond, 0			%1 = add i16 %cond, 0
	%2 = icmp slt i16 0, 0			%2 = icmp slt i16 0, 0
	... (22 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

	... (92 lines not shown) ...
	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>			; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	... (21 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

	... (92 lines not shown) ...
	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>			; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	... (21 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

... (164 lines not shown) ...
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SLM-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32_const(		; AVX1-LABEL: @ashr_shl_v8i32_const(
; AVX1-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
... (39 lines not shown) ...
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R71]]		; AVX1-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
... (125 lines not shown) ...
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1		; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5		; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i64 1		; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i64 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1		; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5		; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i64 1		; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i64 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
... (23 lines not shown) ...
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @add_sub_v8i32_splat(<8 x i32> %a, i32 %b) {		define <8 x i32> @add_sub_v8i32_splat(<8 x i32> %a, i32 %b) {
; CHECK-LABEL: @add_sub_v8i32_splat(		; CHECK-LABEL: @add_sub_v8i32_splat(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[B:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[B:%.*]], i64 0
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i32> [[SHUFFLE]], [[A:%.*]]		; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP2]], [[A:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[SHUFFLE]], [[A]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <8 x i32> [[TMP2]], [[A]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: ret <8 x i32> [[TMP4]]		; CHECK-NEXT: ret <8 x i32> [[TMP5]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SLM-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32_const(		; AVX1-LABEL: @ashr_shl_v8i32_const(
; AVX1-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i64 6
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R71]]		; AVX1-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i64 6
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1		; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5		; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i64 1		; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i64 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1		; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i64 1
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5		; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i64 5
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i64 1		; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i64 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i64 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @add_sub_v8i32_splat(<8 x i32> %a, i32 %b) {		define <8 x i32> @add_sub_v8i32_splat(<8 x i32> %a, i32 %b) {
; CHECK-LABEL: @add_sub_v8i32_splat(		; CHECK-LABEL: @add_sub_v8i32_splat(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[B:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[B:%.*]], i64 0
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i32> [[SHUFFLE]], [[A:%.*]]		; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP2]], [[A:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[SHUFFLE]], [[A]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <8 x i32> [[TMP2]], [[A]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: ret <8 x i32> [[TMP4]]		; CHECK-NEXT: ret <8 x i32> [[TMP5]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1			; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]			; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1			; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]			; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i64 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i64 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i64 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i64 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i64 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i64 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i64 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i64 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i64 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i64 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i64 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i64 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll

	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '2'			; YAML-NEXT: - TreeSize: '2'

	define void @bcast_long(i32 *%A, i32 *%S) {			define void @bcast_long(i32 *%A, i32 *%S) {
	; CHECK-LABEL: @bcast_long(			; CHECK-LABEL: @bcast_long(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8			; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[A0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* [[TMP1]], align 8			; CHECK-NEXT: store <8 x i32> [[TMP1]], <8 x i32>* [[TMP2]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%A0 = load i32, i32 *%A, align 8			%A0 = load i32, i32 *%A, align 8

	%idxS0 = getelementptr inbounds i32, i32* %S, i64 0			%idxS0 = getelementptr inbounds i32, i32* %S, i64 0
	%idxS1 = getelementptr inbounds i32, i32* %S, i64 1			%idxS1 = getelementptr inbounds i32, i32* %S, i64 1
	%idxS2 = getelementptr inbounds i32, i32* %S, i64 2			%idxS2 = getelementptr inbounds i32, i32* %S, i64 2

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP25]], 0.000000e+00			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP25]], 0.000000e+00
	; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CLEANUP:%.*]], label [[IF_END:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[CALL:%.*]] = tail call double @sqrt(double noundef [[TMP25]])			; CHECK-NEXT: [[CALL:%.*]] = tail call double @sqrt(double noundef [[TMP25]])
	; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]			; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]
	; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00			; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00
	; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0			; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <2 x double> [[TMP27]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]			; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0			; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0
	; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x double> [[TMP31]], double [[MUL88]], i32 1			; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x double> [[TMP31]], double [[MUL88]], i32 1
	; CHECK-NEXT: [[TMP33:%.*]] = fdiv <2 x double> [[TMP30]], [[TMP32]]			; CHECK-NEXT: [[TMP33:%.*]] = fdiv <2 x double> [[TMP30]], [[TMP32]]
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP33]], i32 1			; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP33]], i32 1
	; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP34]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP34]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP35:%.*]] = extractelement <2 x double> [[TMP33]], i32 0			; CHECK-NEXT: [[TMP35:%.*]] = extractelement <2 x double> [[TMP33]], i32 0

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

@cle32 = external unnamed_addr global [32 x i32], align 16		@cle32 = external unnamed_addr global [32 x i32], align 16


; Check that we correctly detect a splat/broadcast by leveraging the		; Check that we correctly detect a splat/broadcast by leveraging the
; commutativity property of `xor`.		; commutativity property of `xor`.

define void @splat(i8 %a, i8 %b, i8 %c) {		define void @splat(i8 %a, i8 %b, i8 %c) {
; SSE-LABEL: @splat(		; SSE-LABEL: @splat(
; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0		; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[A:%.*]], i32 0
; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1		; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <2 x i8> [[TMP2]], <2 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
; SSE-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0		; SSE-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer
; SSE-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]		; SSE-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP3]], [[TMP5]]
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @splat(		; AVX-LABEL: @splat(
; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0		; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[A:%.*]], i32 0
; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1		; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>		; AVX-NEXT: [[TMP3:%.*]] = shufflevector <2 x i8> [[TMP2]], <2 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer		; AVX-NEXT: [[TMP5:%.*]] = shufflevector <16 x i8> [[TMP4]], <16 x i8> poison, <16 x i32> zeroinitializer
; AVX-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]		; AVX-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP3]], [[TMP5]]
; AVX-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16		; AVX-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%1 = xor i8 %c, %a		%1 = xor i8 %c, %a
store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16		store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16
%2 = xor i8 %a, %c		%2 = xor i8 %a, %c
store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)		store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)
%3 = xor i8 %a, %c		%3 = xor i8 %a, %c
store i8 %3, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 2)		store i8 %3, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 2)
; ... (26 lines not shown)		; ... (26 lines not shown)
ret void		ret void
}		}

; Check that we correctly detect that we can have the same opcode on one side by		; Check that we correctly detect that we can have the same opcode on one side by
; leveraging the commutativity property of `xor`.		; leveraging the commutativity property of `xor`.

define void @same_opcode_on_one_side(i32 %a, i32 %b, i32 %c) {		define void @same_opcode_on_one_side(i32 %a, i32 %b, i32 %c) {
; SSE-LABEL: @same_opcode_on_one_side(		; SSE-LABEL: @same_opcode_on_one_side(
; SSE-NEXT: [[ADD1:%.*]] = add i32 [[C:%.*]], [[A:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
; SSE-NEXT: [[ADD2:%.*]] = add i32 [[C]], [[A]]		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; SSE-NEXT: [[ADD3:%.*]] = add i32 [[A]], [[C]]		; SSE-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
; SSE-NEXT: [[ADD4:%.*]] = add i32 [[C]], [[A]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
; SSE-NEXT: [[TMP1:%.*]] = xor i32 [[ADD1]], [[A]]		; SSE-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 undef, i32 4, i32 0>
; SSE-NEXT: [[TMP2:%.*]] = xor i32 [[B:%.*]], [[ADD2]]		; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[B:%.*]], i32 1
; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1), align 4		; SSE-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP5]], [[TMP7]]
; SSE-NEXT: [[TMP3:%.*]] = xor i32 [[C]], [[ADD3]]		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; SSE-NEXT: store i32 [[TMP3]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2), align 4
; SSE-NEXT: [[TMP4:%.*]] = xor i32 [[A]], [[ADD4]]
; SSE-NEXT: store i32 [[TMP4]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @same_opcode_on_one_side(		; AVX-LABEL: @same_opcode_on_one_side(
; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0		; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0		; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer		; AVX-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1		; AVX-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 undef, i32 4, i32 0>
; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2		; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[B:%.*]], i32 1
; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 3		; AVX-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP5]], [[TMP7]]
; AVX-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[TMP3]], [[TMP6]]		; AVX-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; AVX-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%add1 = add i32 %c, %a		%add1 = add i32 %c, %a
%add2 = add i32 %c, %a		%add2 = add i32 %c, %a
%add3 = add i32 %a, %c		%add3 = add i32 %a, %c
%add4 = add i32 %c, %a		%add4 = add i32 %c, %a
%1 = xor i32 %add1, %a		%1 = xor i32 %add1, %a
store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16		store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16
%2 = xor i32 %b, %add2		%2 = xor i32 %b, %add2
store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)		store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)
%3 = xor i32 %c, %add3		%3 = xor i32 %c, %add3
store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)		store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)
%4 = xor i32 %a, %add4		%4 = xor i32 %a, %add4
store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)		store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP5]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP9]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP9]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> [[TMP5]], <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP13]], <2 x double> [[TMP6]], <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <2 x double> [[TMP13]], undef
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]
	; CHECK: label:			; CHECK: label:
	; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x double> [ [[TMP12]], [[BB1]] ], [ [[TMP15]], [[BB2]] ]			; CHECK-NEXT: [[TMP15:%.*]] = phi <2 x double> [ [[TMP12]], [[BB1]] ], [ [[TMP14]], [[BB2]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%i10 = fdiv fast double %0, %1			%i10 = fdiv fast double %0, %1
	%ix = fmul double %i10, undef			%ix = fmul double %i10, undef
	%ixx0 = fsub double undef, undef			%ixx0 = fsub double undef, undef
	%ixx1 = fsub double undef, undef			%ixx1 = fsub double undef, undef
	%ixx2 = fsub double undef, undef			%ixx2 = fsub double undef, undef

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	; CHECK: cond.true48.us:			; CHECK: cond.true48.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]
	; CHECK: cond.false66.us:			; CHECK: cond.false66.us:
	; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef			; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double 0xBFA5CC2D1960285F>, double [[ADD_I276_US]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double 0xBFA5CC2D1960285F>, double [[ADD_I276_US]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 0.000000e+00, double undef>, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 0.000000e+00, double undef>, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP4]], i32 0			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true63.us:			; CHECK: cond.true63.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.body42.lr.ph.us:			; CHECK: for.body42.lr.ph.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]
	; CHECK: _Z5clampd.exit.1:			; CHECK: _Z5clampd.exit.1:
	; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]			; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]
	;			;
	%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }			%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }

	define void @_Z8radianceRK3RayiPt() #0 {			define void @_Z8radianceRK3RayiPt() #0 {
	; CHECK-LABEL: @_Z8radianceRK3RayiPt(			; CHECK-LABEL: @_Z8radianceRK3RayiPt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]			; CHECK-NEXT: store <2 x double> undef, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: br label [[RETURN:%.*]]			; CHECK-NEXT: br label [[RETURN:%.*]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.then78, label %if.then38			br i1 undef, label %if.then78, label %if.then38

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP5]], 4.000000e+00
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP2]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8
	define i32 @foo(double* nocapture %A, i32 %n) {			define i32 @foo(double* nocapture %A, i32 %n) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <4 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <4 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 7.900000e+00, double 7.700000e+00, double 7.600000e+00, double 7.400000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 7.900000e+00, double 7.700000e+00, double 7.600000e+00, double 7.400000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> poison, double [[CONV]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> poison, double [[CONV]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <4 x double> [[SHUFFLE]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x double> [[TMP4]], <double 6.000000e+00, double 2.000000e+00, double 3.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x double> [[TMP5]], <double 6.000000e+00, double 2.000000e+00, double 3.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[A]] to <4 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[A]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <4 x double> [[TMP6]], <4 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load double, double* %A, align 8			%0 = load double, double* %A, align 8
	%mul = fmul double %0, 7.900000e+00			%mul = fmul double %0, 7.900000e+00
	%conv = sitofp i32 %n to double			%conv = sitofp i32 %n to double
	%mul1 = fmul double %conv, %mul			%mul1 = fmul double %conv, %mul
	%add = fadd double %mul1, 6.000000e+00			%add = fadd double %mul1, 6.000000e+00
	define i32 @foo4(double* nocapture %A, i32 %n) {			define i32 @foo4(double* nocapture %A, i32 %n) {
	; CHECK-LABEL: @foo4(			; CHECK-LABEL: @foo4(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <4 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <4 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 7.900000e+00, double 7.900000e+00, double 7.900000e+00, double 7.900000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 7.900000e+00, double 7.900000e+00, double 7.900000e+00, double 7.900000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> poison, double [[CONV]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> poison, double [[CONV]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <4 x double> [[SHUFFLE]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <4 x double> [[TMP4]], <double 6.000000e+00, double 6.000000e+00, double 6.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x double> [[TMP5]], <double 6.000000e+00, double 6.000000e+00, double 6.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[A]] to <4 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[A]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <4 x double> [[TMP6]], <4 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load double, double* %A, align 8			%0 = load double, double* %A, align 8
	%mul = fmul double %0, 7.900000e+00			%mul = fmul double %0, 7.900000e+00
	%conv = sitofp i32 %n to double			%conv = sitofp i32 %n to double
	%mul1 = fmul double %conv, %mul			%mul1 = fmul double %conv, %mul
	%add = fadd double %mul1, 6.000000e+00			%add = fadd double %mul1, 6.000000e+00

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s

	define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast(			; CHECK-LABEL: @diamond_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i32> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1
	store i32 %mul8, i32* %arrayidx9, align 4			store i32 %mul8, i32* %arrayidx9, align 4
	%mul14 = mul i32 %ld, %ld			%mul14 = mul i32 %ld, %ld
	%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2			%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2
	store i32 %mul14, i32* %arrayidx15, align 4			store i32 %mul14, i32* %arrayidx15, align 4
	%mul20 = mul i32 %ld, undef			%mul20 = mul i32 %ld, undef
	%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3			%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3
	store i32 %mul20, i32* %arrayidx21, align 4			store i32 %mul20, i32* %arrayidx21, align 4
	ret i32 0			ret i32 0
	}			}

	define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast2(			; CHECK-LABEL: @diamond_broadcast2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 undef, i32 poison, i32 poison>, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i32> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1
	store i32 %mul8, i32* %arrayidx9, align 4			store i32 %mul8, i32* %arrayidx9, align 4
	%mul14 = mul i32 %ld, %ld			%mul14 = mul i32 %ld, %ld
	%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2			%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2
	store i32 %mul14, i32* %arrayidx15, align 4			store i32 %mul14, i32* %arrayidx15, align 4
	%mul20 = mul i32 undef, %ld			%mul20 = mul i32 undef, %ld
	%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3			%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3
	store i32 %mul20, i32* %arrayidx21, align 4			store i32 %mul20, i32* %arrayidx21, align 4
	ret i32 0			ret i32 0
	}			}

	define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast3(			; CHECK-LABEL: @diamond_broadcast3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 undef, i32 poison, i32 poison>, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i32> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[B:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1

llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s

	define i64 @foo(i32 %tmp7) {			define i64 @foo(i32 %tmp7) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2
	; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 undef, i32 4			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 4, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> [[TMP3]], <8 x i32> <i32 8, i32 9, i32 2, i32 11, i32 12, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
	; CHECK-NEXT: [[TMP7:%.*]] = add <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = add <8 x i32> zeroinitializer, [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = xor <8 x i32> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = xor <8 x i32> [[TMP8]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP9]], [[TMP10]]			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64			; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64
	; CHECK-NEXT: ret i64 [[TMP64]]			; CHECK-NEXT: ret i64 [[TMP64]]
	;			;
	bb:			bb:
	%tmp = sub i32 0, 0			%tmp = sub i32 0, 0
	%tmp2 = sub nsw i32 0, %tmp			%tmp2 = sub nsw i32 0, %tmp
	%tmp3 = add i32 0, %tmp2			%tmp3 = add i32 0, %tmp2
	%tmp4 = xor i32 %tmp3, 0			%tmp4 = xor i32 %tmp3, 0

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s

	define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {			define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x i8> [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i8> [[X:%.*]], <2 x i8> [[Y:%.*]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x i8> [[Y:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i8> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[X0]], i32 0			; CHECK-NEXT: ret <2 x i8> [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[Y1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i8> [[TMP2]], [[TMP2]]
	; CHECK-NEXT: ret <2 x i8> [[TMP3]]
	;			;
	%x0 = extractelement <2 x i8> %x, i32 0			%x0 = extractelement <2 x i8> %x, i32 0
	%y1 = extractelement <2 x i8> %y, i32 1			%y1 = extractelement <2 x i8> %y, i32 1
	%x0x0 = mul i8 %x0, %x0			%x0x0 = mul i8 %x0, %x0
	%y1y1 = mul i8 %y1, %y1			%y1y1 = mul i8 %y1, %y1
	%ins1 = insertelement <2 x i8> poison, i8 %x0x0, i32 0			%ins1 = insertelement <2 x i8> poison, i8 %x0x0, i32 0
	%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1			%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
	ret <2 x i8> %ins2			ret <2 x i8> %ins2
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s

	define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {			define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x i8> [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i8> [[X:%.*]], <2 x i8> [[Y:%.*]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x i8> [[Y:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i8> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[X0]], i32 0			; CHECK-NEXT: ret <2 x i8> [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[Y1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i8> [[TMP2]], [[TMP2]]
	; CHECK-NEXT: ret <2 x i8> [[TMP3]]
	;			;
	%x0 = extractelement <2 x i8> %x, i32 0			%x0 = extractelement <2 x i8> %x, i32 0
	%y1 = extractelement <2 x i8> %y, i32 1			%y1 = extractelement <2 x i8> %y, i32 1
	%x0x0 = mul i8 %x0, %x0			%x0x0 = mul i8 %x0, %x0
	%y1y1 = mul i8 %y1, %y1			%y1y1 = mul i8 %y1, %y1
	%ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0			%ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
	%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1			%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
	ret <2 x i8> %ins2			ret <2 x i8> %ins2
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

define void @fextr2(double* %ptr) {		define void @fextr2(double* %ptr) {
; CHECK-LABEL: @fextr2(		; CHECK-LABEL: @fextr2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32		; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32
; CHECK-NEXT: [[V0:%.*]] = extractelement <4 x double> [[LD]], i32 0
; CHECK-NEXT: [[V1:%.*]] = extractelement <4 x double> [[LD]], i32 1
; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0		; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[LD]], <4 x double> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[TMP0]], <double 5.500000e+00, double 6.600000e+00>
; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 5.500000e+00, double 6.600000e+00>		; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[P0]] to <2 x double>*		; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[TMP2]], align 4
; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%LD = load <4 x double>, <4 x double>* undef		%LD = load <4 x double>, <4 x double>* undef
%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.		%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.
%V1 = extractelement <4 x double> %LD, i32 1		%V1 = extractelement <4 x double> %LD, i32 1
%P0 = getelementptr inbounds double, double* %ptr, i64 0		%P0 = getelementptr inbounds double, double* %ptr, i64 0
%P1 = getelementptr inbounds double, double* %ptr, i64 1		%P1 = getelementptr inbounds double, double* %ptr, i64 1
%A0 = fadd double %V0, 5.5		%A0 = fadd double %V0, 5.5
%A1 = fadd double %V1, 6.6		%A1 = fadd double %V1, 6.6
store double %A0, double* %P0, align 4		store double %A0, double* %P0, align 4
store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+avx2 -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+avx2 -pass-remarks-output=%t | FileCheck %s
	; RUN: FileCheck %s --input-file=%t --check-prefix=YAML			; RUN: FileCheck %s --input-file=%t --check-prefix=YAML

	; YAML: --- !Passed			; YAML: --- !Passed
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML: Name: VectorizedList			; YAML: Name: VectorizedList
	; YAML: Function: multi_uses			; YAML: Function: multi_uses
	; YAML: Args:			; YAML: Args:
	; YAML: - String: 'SLP vectorized with cost '			; YAML: - String: 'SLP vectorized with cost '
	; YAML: - Cost: '-1'			; YAML: - Cost: '-1'
	; YAML: - String: ' and with tree size '			; YAML: - String: ' and with tree size '
	; YAML: - TreeSize: '3'			; YAML: - TreeSize: '3'

	define float @multi_uses(<2 x float> %x, <2 x float> %y) {			define float @multi_uses(<2 x float> %x, <2 x float> %y) {
	; CHECK-LABEL: @multi_uses(			; CHECK-LABEL: @multi_uses(
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[Y:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Y1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Y1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%y1 = extractelement <2 x float> %y, i32 1			%y1 = extractelement <2 x float> %y, i32 1
	%x0x0 = fmul float %x0, %y1			%x0x0 = fmul float %x0, %y1
	%x1x1 = fmul float %x1, %y1			%x1x1 = fmul float %x1, %y1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0			; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
	; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
	; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]			; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], [[X]]
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH1-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], [[X]]
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/gather-extractelements-different-bbs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux -mattr="-avx512pf,+avx512f,+avx512bw" -slp-threshold=-100 -slp-min-tree-size=0 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux -mattr="-avx512pf,+avx512f,+avx512bw" -slp-threshold=-100 -slp-min-tree-size=0 < %s | FileCheck %s

	define i32 @foo(i32 %a) {			define i32 @foo(i32 %a) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 0, i32 poison>, i32 [[A:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 0, i32 poison>, i32 [[A:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <2 x i32> zeroinitializer, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <2 x i32> zeroinitializer, [[TMP0]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: br i1 false, label [[BB5:%.*]], label [[BB1:%.*]]			; CHECK-NEXT: br i1 false, label [[BB5:%.*]], label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[SHUFFLE]])
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[OP_RDX14:%.*]] = add i32 [[TMP3]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP3]], i32 1			; CHECK-NEXT: [[OP_RDX15:%.*]] = add i32 [[OP_RDX14]], 0
	; CHECK-NEXT: [[SHUFFLE15:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[SHUFFLE15]])
	; CHECK-NEXT: [[OP_RDX16:%.*]] = add i32 [[TMP6]], 0
	; CHECK-NEXT: [[OP_RDX17:%.*]] = add i32 [[OP_RDX16]], 0
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[P1:%.*]] = phi i32 [ [[OP_RDX17]], [[BB1]] ], [ 0, [[BB2:%.*]] ]			; CHECK-NEXT: [[P1:%.*]] = phi i32 [ [[OP_RDX15]], [[BB1]] ], [ 0, [[BB2:%.*]] ]
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[SHUFFLE10:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE]], [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE10]]			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP9]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP9]], zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP12]], zeroinitializer			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1			; CHECK-NEXT: [[OP_RDX12:%.*]] = add i32 [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[OP_RDX13:%.*]] = add i32 [[TMP14]], [[TMP15]]			; CHECK-NEXT: [[OP_RDX13:%.*]] = add i32 [[OP_RDX12]], [[TMP2]]
	; CHECK-NEXT: [[OP_RDX14:%.*]] = add i32 [[OP_RDX13]], [[TMP2]]			; CHECK-NEXT: ret i32 [[OP_RDX13]]
	; CHECK-NEXT: ret i32 [[OP_RDX14]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br label [[BB4:%.*]]			; CHECK-NEXT: br label [[BB4:%.*]]
	;			;
	entry:			entry:
	%0 = sub nsw i32 0, %a			%0 = sub nsw i32 0, %a
	%local = sub nsw i32 0, 0			%local = sub nsw i32 0, 0
	br i1 false, label %bb5, label %bb1			br i1 false, label %bb5, label %bb1


llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6		%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6
%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7		%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> undef, <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%add01 = fadd float %x0, %x1		%add01 = fadd float %x0, %x1
define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64_partial_swizzle(		; CHECK-LABEL: @test_v4f64_partial_swizzle(
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2		; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3		; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]		; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
; CHECK-NEXT: [[R021:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R021]], double [[R3]], i64 3		; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[TMP4]], double [[R3]], i64 3
; CHECK-NEXT: ret <4 x double> [[R03]]		; CHECK-NEXT: ret <4 x double> [[R03]]
;		;
%a0 = extractelement <4 x double> %a, i64 0		%a0 = extractelement <4 x double> %a, i64 0
%a1 = extractelement <4 x double> %a, i64 1		%a1 = extractelement <4 x double> %a, i64 1
%b0 = extractelement <4 x double> %b, i64 0		%b0 = extractelement <4 x double> %b, i64 0
%b1 = extractelement <4 x double> %b, i64 1		%b1 = extractelement <4 x double> %b, i64 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6		%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6
%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7		%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> undef, <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%add01 = fadd float %x0, %x1		%add01 = fadd float %x0, %x1
define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64_partial_swizzle(		; CHECK-LABEL: @test_v4f64_partial_swizzle(
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2		; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3		; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]		; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
; CHECK-NEXT: [[R021:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R021]], double [[R3]], i64 3		; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[TMP4]], double [[R3]], i64 3
; CHECK-NEXT: ret <4 x double> [[R03]]		; CHECK-NEXT: ret <4 x double> [[R03]]
;		;
%a0 = extractelement <4 x double> %a, i64 0		%a0 = extractelement <4 x double> %a, i64 0
%a1 = extractelement <4 x double> %a, i64 1		%a1 = extractelement <4 x double> %a, i64 1
%b0 = extractelement <4 x double> %b, i64 0		%b0 = extractelement <4 x double> %b, i64 0
%b1 = extractelement <4 x double> %b, i64 1		%b1 = extractelement <4 x double> %b, i64 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

	; A[i+2] += n;			; A[i+2] += n;
	; A[i+3] += k;			; A[i+3] += k;
	; }			; }
	;}			;}

	define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {			define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[N:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[K:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[K:%.*]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP2]], 5.000000e+00		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP2]], 5.000000e+00
; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[OP_RDX]], i32 0		; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[OP_RDX]], i32 0
; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[CONV]], i32 1		; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[CONV]], i32 1
; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x float> <float 8.000000e+00, float poison>, float [[CONV]], i32 1		; THRESHOLD-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> <float 8.000000e+00, float poison>, <2 x i32> <i32 2, i32 1>
; THRESHOLD-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP4]], [[TMP5]]		; THRESHOLD-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP4]], [[TMP5]]
; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0		; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1		; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP7]], [[TMP8]]		; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP7]], [[TMP8]]
; THRESHOLD-NEXT: ret float [[OP_RDX3]]		; THRESHOLD-NEXT: ret float [[OP_RDX3]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%add5 = fadd fast float %add4.2, %a		%add5 = fadd fast float %add4.2, %a
ret float %add5		ret float %add5
}		}

define i32 @wobble(i32 %arg, i32 %bar) {		define i32 @wobble(i32 %arg, i32 %bar) {
; CHECK-LABEL: @wobble(		; CHECK-LABEL: @wobble(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0
; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; CHECK-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[TMP6:%.*]] = icmp eq <4 x i32> [[TMP4]], zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>		; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i1> [[TMP6]] to <4 x i32>
; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])
; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP6]], [[TMP3]]		; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[ARG]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], [[ARG]]
; CHECK-NEXT: ret i32 [[OP_RDX2]]		; CHECK-NEXT: ret i32 [[OP_RDX1]]
;		;
; THRESHOLD-LABEL: @wobble(		; THRESHOLD-LABEL: @wobble(
; THRESHOLD-NEXT: bb:		; THRESHOLD-NEXT: bb:
; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0
; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer		; THRESHOLD-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0
; THRESHOLD-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; THRESHOLD-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
; THRESHOLD-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; THRESHOLD-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[TMP1]], [[TMP3]]
; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3		; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; THRESHOLD-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer		; THRESHOLD-NEXT: [[TMP6:%.*]] = icmp eq <4 x i32> [[TMP4]], zeroinitializer
; THRESHOLD-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>		; THRESHOLD-NEXT: [[TMP7:%.*]] = sext <4 x i1> [[TMP6]] to <4 x i32>
; THRESHOLD-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])		; THRESHOLD-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])
; THRESHOLD-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP6]], [[TMP3]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP8]], [[TMP5]]
; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[ARG]]		; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], [[ARG]]
; THRESHOLD-NEXT: ret i32 [[OP_RDX2]]		; THRESHOLD-NEXT: ret i32 [[OP_RDX1]]
;		;
bb:		bb:
%x1 = xor i32 %arg, %bar		%x1 = xor i32 %arg, %bar
%i1 = icmp eq i32 %x1, 0		%i1 = icmp eq i32 %x1, 0
%s1 = sext i1 %i1 to i32		%s1 = sext i1 %i1 to i32
%x2 = xor i32 %arg, %bar		%x2 = xor i32 %arg, %bar
%i2 = icmp eq i32 %x2, 0		%i2 = icmp eq i32 %x2, 0
%s2 = sext i1 %i2 to i32		%s2 = sext i1 %i2 to i32

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-const-undef.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -slp-threshold=0 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -slp-threshold=0 < %s | FileCheck %s

	define <4 x float> @simple_select(<4 x float> %a, <4 x float> %b, <4 x i32> %c) {			define <4 x float> @simple_select(<4 x float> %a, <4 x float> %b, <4 x i32> %c) {
	; CHECK-LABEL: @simple_select(			; CHECK-LABEL: @simple_select(
	; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[TMP3]], <2 x float> [[TMP4]]
	; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0			; CHECK-NEXT: ret <4 x float> [[TMP6]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: ret <4 x float> [[TMP9]]
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%cmp0 = icmp ne i32 %c0, 0			%cmp0 = icmp ne i32 %c0, 0
	%cmp1 = icmp ne i32 %c1, 0			%cmp1 = icmp ne i32 %c1, 0
	%s0 = select i1 %cmp0, float %a0, float %b0			%s0 = select i1 %cmp0, float %a0, float %b0
	%s1 = select i1 %cmp1, float %a1, float %b1			%s1 = select i1 %cmp1, float %a1, float %b1
	%ra = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float %s0, i32 0			%ra = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float %s0, i32 0
	%rb = insertelement <4 x float> %ra, float %s1, i32 1			%rb = insertelement <4 x float> %ra, float %s1, i32 1
	ret <4 x float> %rb			ret <4 x float> %rb
	}			}

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]		; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]		; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[S0]], i32 0		; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[S0]], i32 0
; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1		; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2		; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3		; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0		; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1		; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q1]], i32 1
; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2		; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3		; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0		; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]		; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]		; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0		; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]		; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]		; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])		; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
; MINTREESIZE-NEXT: ret <4 x float> undef		; MINTREESIZE-NEXT: ret <4 x float> undef
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP1]], zeroinitializer
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[TMP3]], <2 x float> [[TMP4]]
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP7:%.*]] = icmp ne <2 x i32> [[TMP6]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP9]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[C3]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
; CHECK-NEXT: ret <4 x float> [[RD1]]		; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]		; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]		; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0		; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0
; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1		; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2		; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3		; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0		; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1		; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q1]], i32 1
; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2		; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3		; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0		; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]		; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]		; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0		; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]		; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]		; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])		; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
; MINTREESIZE-NEXT: ret <4 x float> undef		; MINTREESIZE-NEXT: ret <4 x float> undef
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP1]], zeroinitializer
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[TMP3]], <2 x float> [[TMP4]]
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP7:%.*]] = icmp ne <2 x i32> [[TMP6]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP9]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[C3]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
; CHECK-NEXT: ret <4 x float> [[RD1]]		; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float undef, float poison, float poison, float undef>, float [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]

llvm/test/Transforms/SLPVectorizer/X86/insertelement-postpone.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple x86_64-unknown-linux-gnu < %s | FileCheck %s

	define <4 x double> @test(double* %p2, double %i1754, double %i1781, double %i1778) {			define <4 x double> @test(double* %p2, double %i1754, double %i1781, double %i1778) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[I1771:%.*]] = getelementptr inbounds double, double* [[P2:%.*]], i64 54			; CHECK-NEXT: [[I1771:%.*]] = getelementptr inbounds double, double* [[P2:%.*]], i64 54
	; CHECK-NEXT: [[I1772:%.*]] = load double, double* [[I1771]], align 8			; CHECK-NEXT: [[I1772:%.*]] = load double, double* [[I1771]], align 8
	; CHECK-NEXT: [[I1795:%.*]] = getelementptr inbounds double, double* [[P2]], i64 55			; CHECK-NEXT: [[I1795:%.*]] = getelementptr inbounds double, double* [[P2]], i64 55
	; CHECK-NEXT: [[I1796:%.*]] = load double, double* [[I1795]], align 8			; CHECK-NEXT: [[I1796:%.*]] = load double, double* [[I1795]], align 8
	; CHECK-NEXT: [[I1797:%.*]] = fmul fast double [[I1796]], [[I1781:%.*]]			; CHECK-NEXT: [[I1797:%.*]] = fmul fast double [[I1796]], [[I1781:%.*]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[I1754:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[I1754:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[I1778:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[I1778:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[I1781]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[I1781]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[I1772]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[I1772]], i32 3
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP0]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[TMP3]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <4 x double> [[TMP3]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x double> [[TMP3]], [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double poison>, double [[I1797]], i32 3			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double poison>, double [[I1797]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <4 x double> [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <4 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: ret <4 x double> [[TMP6]]			; CHECK-NEXT: ret <4 x double> [[TMP7]]
	;			;
	entry:			entry:
	%i1771 = getelementptr inbounds double, double* %p2, i64 54			%i1771 = getelementptr inbounds double, double* %p2, i64 54
	%i1772 = load double, double* %i1771, align 8			%i1772 = load double, double* %i1771, align 8
	%i1773 = fmul fast double %i1772, %i1754			%i1773 = fmul fast double %i1772, %i1754
	%i1782 = fmul fast double %i1754, %i1754			%i1782 = fmul fast double %i1754, %i1754
	%i1783 = fadd fast double %i1782, 1.000000e+00			%i1783 = fadd fast double %i1782, 1.000000e+00
	%i1787 = fmul fast double %i1778, %i1754			%i1787 = fmul fast double %i1778, %i1754

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP11]], i32 2			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP11]], i32 2
	; CHECK-NEXT: store float [[TMP12]], float* @c, align 4			; CHECK-NEXT: store float [[TMP12]], float* @c, align 4
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 0
	; CHECK-NEXT: store float [[TMP13]], float* @d, align 4			; CHECK-NEXT: store float [[TMP13]], float* @d, align 4
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP11]], i32 3			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP11]], i32 3
	; CHECK-NEXT: store float [[TMP14]], float* @e, align 4			; CHECK-NEXT: store float [[TMP14]], float* @e, align 4
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP11]], i32 1
	; CHECK-NEXT: store float [[TMP15]], float* @f, align 4			; CHECK-NEXT: store float [[TMP15]], float* @f, align 4
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <4 x float> [[SHUFFLE]], <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, <4 x i32> <i32 undef, i32 5, i32 0, i32 7>
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP16]], <4 x float> [[SHUFFLE]], <4 x i32> <i32 0, i32 1, i32 4, i32 3>			; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float [[CONV19]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = fsub <4 x float> [[TMP11]], [[TMP17]]			; CHECK-NEXT: [[TMP18:%.*]] = fsub <4 x float> [[TMP11]], [[TMP17]]
	; CHECK-NEXT: [[TMP19:%.*]] = fadd <4 x float> [[TMP11]], [[TMP17]]			; CHECK-NEXT: [[TMP19:%.*]] = fadd <4 x float> [[TMP11]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> [[TMP19]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> [[TMP19]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP21:%.*]] = fptosi <4 x float> [[TMP20]] to <4 x i32>			; CHECK-NEXT: [[TMP21:%.*]] = fptosi <4 x float> [[TMP20]] to <4 x i32>
	; CHECK-NEXT: [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*			; CHECK-NEXT: [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP22]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP22]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @foo() personality i32* ()* @bar {			define void @foo() personality i32* ()* @bar {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2.loopexit:			; CHECK: bb2.loopexit:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[SHUFFLE:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[TMP8:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP3:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]			; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]			; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br label [[BB7:%.*]]			; CHECK-NEXT: br label [[BB7:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]			; CHECK-NEXT: [[TMP3]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 1
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]			; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP5:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]			; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]
	; CHECK: bb8:			; CHECK: bb8:
	; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]			; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]
	; CHECK: bb9:			; CHECK: bb9:
	; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]			; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[SHUFFLE1:%.*]], [[BB10]] ], [ [[TMP11:%.*]], [[BB12]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i32> [ [[SHUFFLE:%.*]], [[BB10]] ], [ [[TMP10:%.*]], [[BB12]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[SHUFFLE]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[TMP8]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	; CHECK: bb10:			; CHECK: bb10:
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]			; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]
	; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { i8*, i32 }			; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { i8*, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: [[SHUFFLE1]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb12:			; CHECK: bb12:
	; CHECK-NEXT: [[TMP11]] = phi <2 x i32> [ [[TMP6]], [[BB7]] ]			; CHECK-NEXT: [[TMP10]] = phi <2 x i32> [ [[TMP4]], [[BB7]] ]
	; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { i8*, i32 }			; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { i8*, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	;			;
	bb1:			bb1:
	br label %bb3			br label %bb3

	bb2.loopexit:			bb2.loopexit:
	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

Show First 20 Lines • Show All 572 Lines • ▼ Show 20 Lines	;
%add79.i181 = fadd float 2.0, %add78.i180		%add79.i181 = fadd float 2.0, %add78.i180
%mul123.i184 = fmul float %add36.i173, %add79.i181		%mul123.i184 = fmul float %add36.i173, %add79.i181
%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00		%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}


define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; SSE-LABEL: @foo(		; CHECK-LABEL: @foo(
; SSE-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; SSE-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; SSE-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[VEC]], <4 x float> poison, <2 x i32> <i32 undef, i32 1>
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[SUB14_I167]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; SSE-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; SSE-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: ret i1 [[CMP_I185]]
; SSE-NEXT: ret i1 [[CMP_I185]]
;
; AVX-LABEL: @foo(
; AVX-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; AVX-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; AVX-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]
; AVX-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]
; AVX-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1
; AVX-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01
; AVX-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]
; AVX-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01
; AVX-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]
; AVX-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]
; AVX-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; AVX-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1
%add36.i173 = fadd float %sub25.i168, 10.0		%add36.i173 = fadd float %sub25.i168, 10.0
%mul72.i179 = fmul float %c, %vecext.i276.i169		%mul72.i179 = fmul float %c, %vecext.i276.i169
%add78.i180 = fsub float %mul72.i179, 30.0		%add78.i180 = fsub float %mul72.i179, 30.0
%add79.i181 = fadd float 2.0, %add78.i180		%add79.i181 = fadd float 2.0, %add78.i180
%mul123.i184 = fmul float %add36.i173, %add79.i181		%mul123.i184 = fmul float %add36.i173, %add79.i181
%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00		%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}

; Same as @ChecksExtractScores, but the extratelement vector operands do not match.		; Same as @ChecksExtractScores, but the extratelement vector operands do not match.
define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double> %vecPtr1, <2 x double> %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {		define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double> %vecPtr1, <2 x double> %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
;		;
; SSE-LABEL: @ChecksExtractScores_different_vectors(		; SSE-LABEL: @ChecksExtractScores_different_vectors(
; SSE-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0		; SSE-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
; SSE-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4		; SSE-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
; SSE-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4		; SSE-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
; SSE-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
; SSE-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
; SSE-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4		; SSE-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
; SSE-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4		; SSE-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
; SSE-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
; SSE-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
; SSE-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0		; SSE-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
; SSE-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*		; SSE-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[LOADVEC2]], <2 x double> [[LOADVEC3]], <2 x i32> <i32 1, i32 2>
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1		; SSE-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[LOADVEC]], <2 x double> [[LOADVEC4]], <2 x i32> <i32 0, i32 3>
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0		; SSE-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], [[TMP2]]
; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1		; SSE-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP6]]
; SSE-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]		; SSE-NEXT: [[TMP8:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; SSE-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP8]]		; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
; SSE-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ChecksExtractScores_different_vectors(		; AVX-LABEL: @ChecksExtractScores_different_vectors(
; AVX-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0		; AVX-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
; AVX-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1		; AVX-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
; AVX-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4		; AVX-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4
; AVX-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4		; AVX-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4
; AVX-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4		; AVX-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
; AVX-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4		; AVX-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
; AVX-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
; AVX-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
; AVX-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4		; AVX-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
; AVX-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4		; AVX-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
; AVX-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
; AVX-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
; AVX-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0		; AVX-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[LOADVEC]], <2 x double> [[LOADVEC2]], <2 x i32> <i32 0, i32 3>
; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[EXTRA1]], i32 1		; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0
; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0		; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[LOADA0]], i32 1
; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[LOADA0]], i32 1		; AVX-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[LOADVEC3]], <2 x double> [[LOADVEC4]], <2 x i32> <i32 0, i32 3>
; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0		; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0
; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1		; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[LOADA1]], i32 1
; AVX-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0		; AVX-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP5]], [[TMP7]]
; AVX-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[LOADA1]], i32 1		; AVX-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP4]], [[TMP8]]
; AVX-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP7]], [[TMP9]]		; AVX-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; AVX-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP5]], [[TMP10]]		; AVX-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
; AVX-NEXT: [[TMP12:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; AVX-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%idx0 = getelementptr inbounds double, double* %array, i64 0		%idx0 = getelementptr inbounds double, double* %array, i64 0
%idx1 = getelementptr inbounds double, double* %array, i64 1		%idx1 = getelementptr inbounds double, double* %array, i64 1
%loadA0 = load double, double* %idx0, align 4		%loadA0 = load double, double* %idx0, align 4
%loadA1 = load double, double* %idx1, align 4		%loadA1 = load double, double* %idx1, align 4

%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4		%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4
▲ Show 20 Lines • Show All 159 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/malformed_phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s
	; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Make sure we do not generate malformed phis not in the beginning of block.			; Make sure we do not generate malformed phis not in the beginning of block.
	define void @test() #0 {			define void @test() #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP:%.*]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB:%.*]] ]			; CHECK-NEXT: [[TMP:%.*]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[BB1]] ], [ undef, [[BB]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[BB1]] ], [ undef, [[BB]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[TMP]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[TMP]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[SHUFFLE]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[TMP1]])
	; CHECK-NEXT: [[OP_RDX]] = mul i32 [[TMP1]], undef			; CHECK-NEXT: [[OP_RDX]] = mul i32 [[TMP2]], undef
	; CHECK-NEXT: br label [[BB1]]			; CHECK-NEXT: br label [[BB1]]
	;			;
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb1, %bb			bb1: ; preds = %bb1, %bb
	%tmp = phi i32 [ undef, %bb1 ], [ undef, %bb ]			%tmp = phi i32 [ undef, %bb1 ], [ undef, %bb ]
	%tmp2 = phi i32 [ %tmp18, %bb1 ], [ undef, %bb ]			%tmp2 = phi i32 [ %tmp18, %bb1 ], [ undef, %bb ]
	Show All 19 Lines
	define void @test_2(i8 addrspace(1)* %arg, i32 %arg1) #0 {			define void @test_2(i8 addrspace(1)* %arg, i32 %arg1) #0 {
	; CHECK-LABEL: @test_2(			; CHECK-LABEL: @test_2(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP:%.*]] = phi i32 [ undef, [[BB:%.*]] ], [ undef, [[BB2]] ]			; CHECK-NEXT: [[TMP:%.*]] = phi i32 [ undef, [[BB:%.*]] ], [ undef, [[BB2]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ 0, [[BB]] ], [ undef, [[BB2]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ 0, [[BB]] ], [ undef, [[BB2]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[SHUFFLE]])			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP1]], [[TMP2]]			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], undef			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], undef
	; CHECK-NEXT: call void @use(i32 [[OP_RDX1]])			; CHECK-NEXT: call void @use(i32 [[OP_RDX1]])
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	bb:			bb:
	br label %bb2			br label %bb2

	bb2: ; preds = %bb2, %bb			bb2: ; preds = %bb2, %bb
	Show All 24 Lines
	; CHECK-LABEL: @test_3(			; CHECK-LABEL: @test_3(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ undef, [[BB1]] ], [ poison, [[BB2:%.*]] ]			; CHECK-NEXT: [[VAL:%.*]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB2:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[VAL4:%.*]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB2]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[VAL4]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <32 x i32> poison, i32 [[VAL4]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[TMP1]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> poison, <32 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[TMP1]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.mul.v32i32(<32 x i32> [[TMP3]])
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[TMP1]], i32 4			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[TMP1]])
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[TMP1]], i32 5			; CHECK-NEXT: [[OP_RDX:%.*]] = mul i32 [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[TMP1]], i32 6			; CHECK-NEXT: [[OP_RDX1:%.*]] = mul i32 [[OP_RDX]], [[VAL4]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[TMP1]], i32 7			; CHECK-NEXT: [[OP_RDX2:%.*]] = mul i32 [[VAL4]], [[VAL4]]
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP1]], i32 8			; CHECK-NEXT: [[OP_RDX3:%.*]] = mul i32 [[VAL4]], [[VAL4]]
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP1]], i32 9			; CHECK-NEXT: [[OP_RDX4:%.*]] = mul i32 [[VAL4]], [[VAL4]]
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP1]], i32 10			; CHECK-NEXT: [[OP_RDX5:%.*]] = mul i32 [[VAL4]], [[VAL4]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP1]], i32 11			; CHECK-NEXT: [[OP_RDX6:%.*]] = mul i32 [[VAL4]], [[VAL4]]
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP1]], i32 12			; CHECK-NEXT: [[OP_RDX7:%.*]] = mul i32 [[OP_RDX1]], [[OP_RDX2]]
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP1]], i32 13			; CHECK-NEXT: [[OP_RDX8:%.*]] = mul i32 [[OP_RDX3]], [[OP_RDX4]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP1]], i32 14			; CHECK-NEXT: [[OP_RDX9:%.*]] = mul i32 [[OP_RDX5]], [[OP_RDX6]]
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP1]], i32 15			; CHECK-NEXT: [[OP_RDX10:%.*]] = mul i32 [[OP_RDX7]], [[OP_RDX8]]
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <32 x i32> poison, i32 [[TMP1]], i32 0			; CHECK-NEXT: [[OP_RDX11:%.*]] = mul i32 [[OP_RDX9]], [[VAL]]
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <32 x i32> [[TMP18]], i32 [[TMP1]], i32 1			; CHECK-NEXT: [[OP_RDX12:%.*]] = mul i32 [[OP_RDX10]], [[OP_RDX11]]
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <32 x i32> [[TMP19]], i32 [[TMP1]], i32 2			; CHECK-NEXT: [[VAL64:%.*]] = add i32 undef, [[OP_RDX12]]
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <32 x i32> [[TMP20]], i32 [[TMP1]], i32 3
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <32 x i32> [[TMP21]], i32 [[TMP1]], i32 4
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <32 x i32> [[TMP22]], i32 [[TMP1]], i32 5
	; CHECK-NEXT: [[TMP24:%.*]] = insertelement <32 x i32> [[TMP23]], i32 [[TMP1]], i32 6
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <32 x i32> [[TMP24]], i32 [[TMP1]], i32 7
	; CHECK-NEXT: [[TMP26:%.*]] = insertelement <32 x i32> [[TMP25]], i32 [[TMP1]], i32 8
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <32 x i32> [[TMP26]], i32 [[TMP1]], i32 9
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <32 x i32> [[TMP27]], i32 [[TMP1]], i32 10
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <32 x i32> [[TMP28]], i32 [[TMP1]], i32 11
	; CHECK-NEXT: [[TMP30:%.*]] = insertelement <32 x i32> [[TMP29]], i32 [[TMP1]], i32 12
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <32 x i32> [[TMP30]], i32 [[TMP1]], i32 13
	; CHECK-NEXT: [[TMP32:%.*]] = insertelement <32 x i32> [[TMP31]], i32 [[TMP1]], i32 14
	; CHECK-NEXT: [[TMP33:%.*]] = insertelement <32 x i32> [[TMP32]], i32 [[TMP1]], i32 15
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <32 x i32> [[TMP33]], i32 [[TMP1]], i32 16
	; CHECK-NEXT: [[TMP35:%.*]] = insertelement <32 x i32> [[TMP34]], i32 [[TMP1]], i32 17
	; CHECK-NEXT: [[TMP36:%.*]] = insertelement <32 x i32> [[TMP35]], i32 [[TMP1]], i32 18
	; CHECK-NEXT: [[TMP37:%.*]] = insertelement <32 x i32> [[TMP36]], i32 [[TMP1]], i32 19
	; CHECK-NEXT: [[TMP38:%.*]] = insertelement <32 x i32> [[TMP37]], i32 [[TMP1]], i32 20
	; CHECK-NEXT: [[TMP39:%.*]] = insertelement <32 x i32> [[TMP38]], i32 [[TMP1]], i32 21
	; CHECK-NEXT: [[TMP40:%.*]] = insertelement <32 x i32> [[TMP39]], i32 [[TMP1]], i32 22
	; CHECK-NEXT: [[TMP41:%.*]] = insertelement <32 x i32> [[TMP40]], i32 [[TMP1]], i32 23
	; CHECK-NEXT: [[TMP42:%.*]] = insertelement <32 x i32> [[TMP41]], i32 [[TMP1]], i32 24
	; CHECK-NEXT: [[TMP43:%.*]] = insertelement <32 x i32> [[TMP42]], i32 [[TMP1]], i32 25
	; CHECK-NEXT: [[TMP44:%.*]] = insertelement <32 x i32> [[TMP43]], i32 [[TMP1]], i32 26
	; CHECK-NEXT: [[TMP45:%.*]] = insertelement <32 x i32> [[TMP44]], i32 [[TMP1]], i32 27
	; CHECK-NEXT: [[TMP46:%.*]] = insertelement <32 x i32> [[TMP45]], i32 [[TMP1]], i32 28
	; CHECK-NEXT: [[TMP47:%.*]] = insertelement <32 x i32> [[TMP46]], i32 [[TMP1]], i32 29
	; CHECK-NEXT: [[TMP48:%.*]] = insertelement <32 x i32> [[TMP47]], i32 [[TMP1]], i32 30
	; CHECK-NEXT: [[TMP49:%.*]] = insertelement <32 x i32> [[TMP48]], i32 [[TMP1]], i32 31
	; CHECK-NEXT: [[TMP50:%.*]] = call i32 @llvm.vector.reduce.mul.v32i32(<32 x i32> [[TMP49]])
	; CHECK-NEXT: [[TMP51:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[TMP17]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = mul i32 [[TMP50]], [[TMP51]]
	; CHECK-NEXT: [[TMP52:%.*]] = call i32 @llvm.vector.reduce.mul.v8i32(<8 x i32> [[SHUFFLE]])
	; CHECK-NEXT: [[OP_RDX1:%.*]] = mul i32 [[OP_RDX]], [[TMP52]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = mul i32 [[OP_RDX1]], [[TMP1]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = mul i32 [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[OP_RDX4:%.*]] = mul i32 [[OP_RDX2]], [[OP_RDX3]]
	; CHECK-NEXT: [[OP_RDX5:%.*]] = mul i32 [[OP_RDX4]], [[TMP1]]
	; CHECK-NEXT: [[VAL64:%.*]] = add i32 undef, [[OP_RDX5]]
	; CHECK-NEXT: [[VAL65:%.*]] = sext i32 [[VAL64]] to i64			; CHECK-NEXT: [[VAL65:%.*]] = sext i32 [[VAL64]] to i64
	; CHECK-NEXT: ret i64 [[VAL65]]			; CHECK-NEXT: ret i64 [[VAL65]]
	;			;
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb			bb1: ; preds = %bb
	br label %bb3			br label %bb3

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s

	define i32 @bar() local_unnamed_addr {			define i32 @bar() local_unnamed_addr {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_1]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[SUB102_1]], i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 6
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 7
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 4			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 9
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 4, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[ADD78_2]], i32 10
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 7, i32 6, i32 5, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[SUB102_3]], i32 12
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_3]], i32 15
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 3			; CHECK-NEXT: [[TMP9:%.*]] = add nsw <16 x i32> [[TMP5]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 4			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <16 x i32> [[TMP5]], [[TMP8]]
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 4>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> [[TMP10]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP12:%.*]] = lshr <16 x i32> [[TMP11]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP13:%.*]] = and <16 x i32> [[TMP12]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[TMP14:%.*]] = mul nuw <16 x i32> [[TMP13]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: [[TMP15:%.*]] = add <16 x i32> [[TMP14]], [[TMP11]]
	; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>			; CHECK-NEXT: [[TMP16:%.*]] = xor <16 x i32> [[TMP15]], [[TMP14]]
	; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>			; CHECK-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP16]])
	; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP17]], 16
	; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]
	; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP17]])
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP18]], 16
	; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]			; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]
	; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1			; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
	; CHECK-NEXT: ret i32 [[SHR120]]			; CHECK-NEXT: ret i32 [[SHR120]]
	;			;
	entry:			entry:
	%add103 = add nsw i32 undef, undef			%add103 = add nsw i32 undef, undef
	%sub104 = sub nsw i32 undef, undef			%sub104 = sub nsw i32 undef, undef
	%add105 = add nsw i32 undef, undef			%add105 = add nsw i32 undef, undef

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

	}			}

	define void @gather_sequence_crash(<2 x float> %arg, float* %arg1, float %arg2, float* %arg3, float* %arg4, float* %arg5, i1 %c.1, i1 %c.2) {			define void @gather_sequence_crash(<2 x float> %arg, float* %arg1, float %arg2, float* %arg3, float* %arg4, float* %arg5, i1 %c.1, i1 %c.2) {
	; CHECK-LABEL: @gather_sequence_crash(			; CHECK-LABEL: @gather_sequence_crash(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]			; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[ARG1:%.*]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[ARG1:%.*]], i32 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, float [[ARG2:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, <4 x i32> <i32 undef, i32 1, i32 2, i32 7>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float [[ARG2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP8]] to <4 x float>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP8]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP3]], <4 x float>* [[TMP4]], align 4			; CHECK-NEXT: store <4 x float> [[TMP3]], <4 x float>* [[TMP4]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb16:			; CHECK: bb16:
	; CHECK-NEXT: br label [[BB17:%.*]]			; CHECK-NEXT: br label [[BB17:%.*]]
	; CHECK: bb17:			; CHECK: bb17:
	; CHECK-NEXT: br label [[BB18:%.*]]			; CHECK-NEXT: br label [[BB18:%.*]]

llvm/test/Transforms/SLPVectorizer/X86/odd_store.ll

;
%20 = getelementptr inbounds i8, i8* %A, i64 2		%20 = getelementptr inbounds i8, i8* %A, i64 2
store i8 %19, i8* %20, align 1		store i8 %19, i8* %20, align 1
ret i32 undef		ret i32 undef
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[X0]], i32 0		; CHECK-NEXT: store <2 x float> [[TMP1]], <2 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[X1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
store float %x0, float* %p, align 4		store float %x0, float* %p, align 4
store float %x1, float* %p1, align 4		store float %x1, float* %p1, align 4
ret void		ret void
;
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
store float %x0, float* %p, align 4		store float %x0, float* %p, align 4
store float %x0, float* %p1, align 4		store float %x0, float* %p1, align 4
ret void		ret void
}		}

define void @test_v4f32_v3f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v3f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v3f32_store(		; CHECK-LABEL: @test_v4f32_v3f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F:%.*]], i64 2
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F]], i64 2
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 2		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[X0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[X1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[P]] to <2 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P]] to <2 x float>*		; CHECK-NEXT: store <2 x float> [[TMP1]], <2 x float>* [[TMP2]], align 4
; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: store float [[X2]], float* [[P2]], align 4		; CHECK-NEXT: store float [[X2]], float* [[P2]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%x2 = extractelement <4 x float> %f, i64 2		%x2 = extractelement <4 x float> %f, i64 2
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
%p2 = getelementptr inbounds float, float* %p, i64 2		%p2 = getelementptr inbounds float, float* %p, i64 2
;
store float %x1, float* %p1, align 4		store float %x1, float* %p1, align 4
store float %x2, float* %p2, align 4		store float %x2, float* %p2, align 4
store float %x3, float* %p3, align 4		store float %x3, float* %p3, align 4
ret void		ret void
}		}

define void @test_v4f32_v4f32_splat_store(<4 x float> %f, float* %p){		define void @test_v4f32_v4f32_splat_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v4f32_splat_store(		; CHECK-LABEL: @test_v4f32_v4f32_splat_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> poison, float [[X0]], i32 0
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*
; CHECK-NEXT: store <4 x float> [[SHUFFLE]], <4 x float>* [[TMP2]], align 4		; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
%p2 = getelementptr inbounds float, float* %p, i64 2		%p2 = getelementptr inbounds float, float* %p, i64 2
%p3 = getelementptr inbounds float, float* %p, i64 3		%p3 = getelementptr inbounds float, float* %p, i64 3
store float %x0, float* %p, align 4		store float %x0, float* %p, align 4
store float %x0, float* %p1, align 4		store float %x0, float* %p1, align 4
store float %x0, float* %p2, align 4		store float %x0, float* %p2, align 4
store float %x0, float* %p3, align 4		store float %x0, float* %p3, align 4
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

	define void @vecload_vs_broadcast(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast(			; CHECK-LABEL: @vecload_vs_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast(			; SSE2-LABEL: @vecload_vs_broadcast(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	define void @vecload_vs_broadcast2(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast2(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast2(			; CHECK-LABEL: @vecload_vs_broadcast2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast2(			; SSE2-LABEL: @vecload_vs_broadcast2(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]
	; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	define void @vecload_vs_broadcast3(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast3(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast3(			; CHECK-LABEL: @vecload_vs_broadcast3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; SSE2-LABEL: @vecload_vs_broadcast3(			; SSE2-LABEL: @vecload_vs_broadcast3(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: br label [[LP:%.*]]			; SSE2-NEXT: br label [[LP:%.*]]
	; SSE2: lp:			; SSE2: lp:
	; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; SSE2-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; SSE2-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i64 0			; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; SSE2-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; SSE2-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 0
	; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]			; SSE2-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], [[TMP1]]
	; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; SSE2-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; SSE2-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; SSE2-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; SSE2: ext:			; SSE2: ext:
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	entry:			entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4			; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i64 0			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP9]], float [[TMP1]], i64 0
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]
	; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4			; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3
	; SSE2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]			; SSE2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]
	; SSE2-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; SSE2-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; SSE2-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]			; SSE2-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]
	; SSE2-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; SSE2-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; SSE2-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4			; SSE2-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; SSE2-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]			; SSE2-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]
	; SSE2-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*			; SSE2-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; SSE2-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; SSE2-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; SSE2-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i64 0			; SSE2-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 2>
	; SSE2-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>			; SSE2-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP9]], float [[TMP1]], i64 0
	; SSE2-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]			; SSE2-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]
	; SSE2-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*			; SSE2-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*
	; SSE2-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4			; SSE2-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; SSE2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; SSE2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; SSE2-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; SSE2-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; SSE2-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]			; SSE2-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]
	; SSE2-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4			; SSE2-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4
	; SSE2-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3			; SSE2-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i64 3

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @get_block(i32 %y_pos) local_unnamed_addr #0 {			define void @get_block(i32 %y_pos) local_unnamed_addr #0 {
	; CHECK-LABEL: @get_block(			; CHECK-LABEL: @get_block(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[SUB14]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP0]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[SHUFFLE]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> undef			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> undef
	; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>			; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>
	; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1

llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s

	; The inputs to vector phi should remain undef.			; The inputs to vector phi should remain undef.

	define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {			define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {
	; CHECK-LABEL: @phi3UndefInput(			; CHECK-LABEL: @phi3UndefInput(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 undef, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:

llvm/test/Transforms/SLPVectorizer/X86/phi.ll


	define float @foo3(float* nocapture readonly %A) #0 {			define float @foo3(float* nocapture readonly %A) #0 {
	; CHECK-LABEL: @foo3(			; CHECK-LABEL: @foo3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP16:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP15:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP4]], [[ENTRY]] ], [ [[TMP11:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP7]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]			; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP8:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP11]] = load <2 x float>, <2 x float>* [[TMP10]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP12]], <4 x i32> <i32 1, i32 undef, i32 2, i32 3>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP11]], <4 x i32> <i32 1, i32 undef, i32 2, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x float> [[TMP12]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = fmul <4 x float> [[TMP14]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[TMP14:%.*]] = fmul <4 x float> [[TMP13]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
	; CHECK-NEXT: [[TMP16]] = fadd <4 x float> [[TMP6]], [[TMP15]]			; CHECK-NEXT: [[TMP15]] = fadd <4 x float> [[TMP5]], [[TMP14]]
	; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP17]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP16]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP16]], i32 0			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP15]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP18]]			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP17]]
	; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x float> [[TMP16]], i32 1			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP15]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP19]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP18]]
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP16]], i32 2			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x float> [[TMP15]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP20]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP19]]
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP16]], i32 3			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP15]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP21]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP20]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef			; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1			; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
	; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4			; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; SSE-NEXT: [[TMP5:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; SSE-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 1
	; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 1			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[ADD]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; SSE-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP7]], i64 [[ADD]], i32 1			; SSE-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP8]], <i64 2, i64 2>			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], <i64 20, i64 20>			; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; SSE-NEXT: [[TMP10:%.*]] = add nuw nsw <2 x i64> [[SHUFFLE]], [[TMP9]]
	; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]			; SSE-NEXT: [[TMP11:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* [[TMP11]], align 1
	; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @pr35497(			; AVX-LABEL: @pr35497(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1			; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
	; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef			; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1			; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
	; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4			; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; AVX-NEXT: [[TMP5:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; AVX-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 1
	; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 1			; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP4]], i64 [[ADD]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; AVX-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP7]], i64 [[ADD]], i32 1			; AVX-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>
	; AVX-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP8]], <i64 2, i64 2>			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
	; AVX-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], <i64 20, i64 20>			; AVX-NEXT: [[TMP9:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; AVX-NEXT: [[TMP10:%.*]] = add nuw nsw <2 x i64> [[SHUFFLE]], [[TMP9]]
	; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]			; AVX-NEXT: [[TMP11:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; AVX-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* [[TMP11]], align 1
	; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, i64* undef, align 1			%0 = load i64, i64* undef, align 1
	%and = shl i64 %0, 2			%and = shl i64 %0, 2
	%shl = and i64 %and, 20			%shl = and i64 %and, 20
	%add = add i64 undef, undef			%add = add i64 undef, undef
	store i64 %add, i64* undef, align 1			store i64 %add, i64* undef, align 1

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP1]], i64 0
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP2]], i64 1
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX512-LABEL: @foo(			; AVX512-LABEL: @foo(
	; AVX512-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; AVX512-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	; AVX512-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; AVX512-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i64 0			; AVX512-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP1]], i64 0
	; AVX512-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i64 1			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP2]], i64 1
	; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX512-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX512-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16
	%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}
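In the AVX and AVX512 checks for @foo above, the two columns differ only in the width of the vector that receives the two loaded scalars: the left-hand column inserts them into an <8 x i32> and then shuffles, while the right-hand column inserts them into a <2 x i32> and lets the shufflevector widen the result to eight lanes. The following is a minimal Python sketch of the shufflevector mask semantics, assuming lists stand in for vectors and None for a poison lane. The helper name `shufflevector` and the sample values are illustrative only and are not part of the patch. Under those assumptions both forms should yield the same lanes:

```python
POISON = None  # stand-in for an LLVM poison lane (assumption for this sketch)

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indices select from v1 followed by v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]

b0, b2 = 10, 20  # hypothetical values loaded from @b[0] and @b[2]
mask = [0, 1, 0, 1, 0, 1, 0, 1]

# Left-hand column: insert both scalars into an 8-wide vector, then shuffle.
wide = [b0, b2] + [POISON] * 6
old_result = shufflevector(wide, [POISON] * 8, mask)

# Right-hand column: insert into a 2-wide vector; the shuffle widens it.
narrow = [b0, b2]
new_result = shufflevector(narrow, [POISON] * 2, mask)

print(old_result == new_result)
```

Only lanes 0 and 1 of the first operand are ever selected by this mask, which is why the narrower insertelement chain is sufficient.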

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=x86_64-- -slp-vectorizer -instcombine -S < %s | FileCheck %s			; RUN: opt -mtriple=x86_64-- -slp-vectorizer -instcombine -S < %s | FileCheck %s

	; These conversions should be vectorized by reviews.llvm.org/D57059			; These conversions should be vectorized by reviews.llvm.org/D57059

	define dso_local <4 x float> @foo(<4 x i32> %0) {			define dso_local <4 x float> @foo(<4 x i32> %0) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i64 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i64 1
	; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float			; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i64 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>			; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[TMP9]]			; CHECK-NEXT: ret <4 x float> [[TMP9]]
	;			;
	%2 = extractelement <4 x i32> %0, i32 1			%2 = extractelement <4 x i32> %0, i32 1
	%3 = sitofp i32 %2 to float			%3 = sitofp i32 %2 to float
	%4 = insertelement <4 x float> undef, float %3, i32 0			%4 = insertelement <4 x float> undef, float %3, i32 0
	Show All 10 Lines

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

Show First 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	;
%s2 = select i1 %s1, i1 true, i1 %c2		%s2 = select i1 %s1, i1 true, i1 %c2
%s3 = select i1 %s2, i1 true, i1 %c3		%s3 = select i1 %s2, i1 true, i1 %c3
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_and_icmp_diff_preds(<4 x i32> %x) {		define i1 @logical_and_icmp_diff_preds(<4 x i32> %x) {
; SSE-LABEL: @logical_and_icmp_diff_preds(		; SSE-LABEL: @logical_and_icmp_diff_preds(
; SSE-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; SSE-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
; SSE-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; SSE-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; SSE-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; SSE-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; SSE-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0		; SSE-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0
; SSE-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0		; SSE-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[X3]], i32 0		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <2 x i32> <i32 3, i32 1>
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[X1]], i32 1		; SSE-NEXT: [[TMP2:%.*]] = icmp slt <2 x i32> [[TMP1]], zeroinitializer
; SSE-NEXT: [[TMP3:%.*]] = icmp slt <2 x i32> [[TMP2]], zeroinitializer		; SSE-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1		; SSE-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[TMP3]], i1 false
; SSE-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[TMP4]], i1 false
; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0		; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0
; SSE-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[TMP5]], i1 false		; SSE-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[TMP4]], i1 false
; SSE-NEXT: ret i1 [[S3]]		; SSE-NEXT: ret i1 [[S3]]
;		;
; AVX-LABEL: @logical_and_icmp_diff_preds(		; AVX-LABEL: @logical_and_icmp_diff_preds(
; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; AVX-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0		; AVX-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	;
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 true, i1 %c2		%s2 = select i1 %s1, i1 true, i1 %c2
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_and_icmp_subvec(<4 x i32> %x) {		define i1 @logical_and_icmp_subvec(<4 x i32> %x) {
; SSE-LABEL: @logical_and_icmp_subvec(		; SSE-LABEL: @logical_and_icmp_subvec(
; SSE-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; SSE-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; SSE-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <2 x i32> <i32 1, i32 0>
; SSE-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; SSE-NEXT: [[TMP2:%.*]] = icmp slt <2 x i32> [[TMP1]], zeroinitializer
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[X1]], i32 0
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[X0]], i32 1
; SSE-NEXT: [[TMP3:%.*]] = icmp slt <2 x i32> [[TMP2]], zeroinitializer
; SSE-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 0		; SSE-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 0
; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0		; SSE-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0
; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1		; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
; SSE-NEXT: [[S1:%.*]] = select i1 [[TMP5]], i1 [[TMP4]], i1 false		; SSE-NEXT: [[S1:%.*]] = select i1 [[TMP4]], i1 [[TMP3]], i1 false
; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; SSE-NEXT: ret i1 [[S2]]		; SSE-NEXT: ret i1 [[S2]]
;		;
; AVX-LABEL: @logical_and_icmp_subvec(		; AVX-LABEL: @logical_and_icmp_subvec(
; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 0		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 0
▲ Show 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	;
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {		define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(		; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(
; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[X:%.*]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[Y:%.*]], <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <8 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = freeze <8 x i1> [[TMP3]]
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP4]])
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; CHECK-NEXT: ret i1 [[TMP5]]
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X3]], i32 3
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[Y1]], i32 5
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y2]], i32 6
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y3]], i32 7
; CHECK-NEXT: [[TMP9:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], [[TMP8]]
; CHECK-NEXT: [[TMP10:%.*]] = freeze <8 x i1> [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP10]])
; CHECK-NEXT: ret i1 [[TMP11]]
;		;
%x0 = extractelement <8 x i32> %x, i32 0		%x0 = extractelement <8 x i32> %x, i32 0
%x1 = extractelement <8 x i32> %x, i32 1		%x1 = extractelement <8 x i32> %x, i32 1
%x2 = extractelement <8 x i32> %x, i32 2		%x2 = extractelement <8 x i32> %x, i32 2
%x3 = extractelement <8 x i32> %x, i32 3		%x3 = extractelement <8 x i32> %x, i32 3
%y0 = extractelement <8 x i32> %y, i32 0		%y0 = extractelement <8 x i32> %y, i32 0
%y1 = extractelement <8 x i32> %y, i32 1		%y1 = extractelement <8 x i32> %y, i32 1
%y2 = extractelement <8 x i32> %y, i32 2		%y2 = extractelement <8 x i32> %y, i32 2
Show All 14 Lines	;
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {
; SSE-LABEL: @logical_and_icmp_clamp_partial(		; SSE-LABEL: @logical_and_icmp_clamp_partial(
; SSE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2		; SSE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; SSE-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <2 x i32> <i32 1, i32 0>
; SSE-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0		; SSE-NEXT: [[TMP3:%.*]] = icmp slt <2 x i32> [[TMP2]], <i32 42, i32 42>
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0
; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP3]], i32 1
; SSE-NEXT: [[TMP6:%.*]] = icmp slt <2 x i32> [[TMP5]], <i32 42, i32 42>
; SSE-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42		; SSE-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42
; SSE-NEXT: [[TMP7:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; SSE-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; SSE-NEXT: [[TMP8:%.*]] = freeze <4 x i1> [[TMP7]]		; SSE-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]
; SSE-NEXT: [[TMP9:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP8]])		; SSE-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP6]], i32 0		; SSE-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP9]], i1 [[TMP10]], i1 false		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP6]], i1 [[TMP7]], i1 false
; SSE-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP6]], i32 1		; SSE-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
; SSE-NEXT: [[OP_RDX1:%.*]] = select i1 [[TMP11]], i1 [[C2]], i1 false		; SSE-NEXT: [[OP_RDX1:%.*]] = select i1 [[TMP8]], i1 [[C2]], i1 false
; SSE-NEXT: [[TMP12:%.*]] = freeze i1 [[OP_RDX]]		; SSE-NEXT: [[TMP9:%.*]] = freeze i1 [[OP_RDX]]
; SSE-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP12]], i1 [[OP_RDX1]], i1 false		; SSE-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP9]], i1 [[OP_RDX1]], i1 false
; SSE-NEXT: ret i1 [[OP_RDX2]]		; SSE-NEXT: ret i1 [[OP_RDX2]]
;		;
; AVX-LABEL: @logical_and_icmp_clamp_partial(		; AVX-LABEL: @logical_and_icmp_clamp_partial(
; AVX-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2		; AVX-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; AVX-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1		; AVX-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1
; AVX-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0		; AVX-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0
; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP3]], 42		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP3]], 42
; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP2]], 42		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP2]], 42
▲ Show 20 Lines • Show All 154 Lines • Show Last 20 Lines
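In the @logical_and_icmp_clamp_v8i32 checks above, the left-hand column builds the second compare operand with four extractelement/insertelement pairs on top of a constant vector, and the right-hand column builds it with a single two-source shufflevector. The sketch below models both in Python to illustrate why the mask <8, 9, 10, 11, 0, 1, 2, 3> should select the same lanes. It assumes lists stand in for vectors and None for poison; the helper names and the sample values for %y are hypothetical.

```python
POISON = None  # stand-in for an LLVM poison lane (assumption for this sketch)

def shufflevector(v1, v2, mask):
    """Mask indices 0..len(v1)-1 pick from v1; higher indices pick from v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]

def insertelement(vec, value, index):
    """Return a copy of vec with one lane replaced."""
    out = list(vec)
    out[index] = value
    return out

y = [100, 101, 102, 103, 104, 105, 106, 107]  # hypothetical lanes of %y
consts = [42, 42, 42, 42] + [POISON] * 4      # constant operand from the test

# Left-hand column: extract y[0..3] and insert them into lanes 4..7.
old_operand = consts
for lane in range(4):
    old_operand = insertelement(old_operand, y[lane], 4 + lane)

# Right-hand column: one shuffle of %y with the constant vector.
new_operand = shufflevector(y, consts, [8, 9, 10, 11, 0, 1, 2, 3])

print(old_operand == new_operand)
```

Indices 8 through 11 address the first four lanes of the second operand (the constants), and indices 0 through 3 address the low half of %y, so no scalar extracts are needed.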

llvm/test/Transforms/SLPVectorizer/X86/reduction-same-vals.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer < %s | FileCheck %s

	define i64 @test() {			define i64 @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP:%.*]] = phi i32 [ 0, [[BB2:%.*]] ], [ 0, [[BB1:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ poison, [[BB2:%.*]] ], [ zeroinitializer, [[BB1:%.*]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi i32 [ 0, [[BB2]] ], [ 0, [[BB1]] ]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP4]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[TMP4]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP4]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.mul.v8i32(<8 x i32> [[TMP2]])
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP4]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> [[SHUFFLE]])
	; CHECK-NEXT: [[TMP44:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP4]], i32 4			; CHECK-NEXT: [[OP_RDX:%.*]] = mul i32 [[TMP3]], [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP44]], i32 [[TMP4]], i32 5			; CHECK-NEXT: [[TMP65:%.*]] = sext i32 [[OP_RDX]] to i64
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP4]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP4]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.mul.v8i32(<8 x i32> [[TMP7]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = mul i32 [[TMP8]], [[TMP4]]
	; CHECK-NEXT: [[OP_RDX1:%.*]] = mul i32 [[TMP4]], [[TMP4]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = mul i32 [[OP_RDX]], [[OP_RDX1]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = mul i32 [[OP_RDX2]], [[TMP]]
	; CHECK-NEXT: [[TMP65:%.*]] = sext i32 [[OP_RDX3]] to i64
	; CHECK-NEXT: ret i64 [[TMP65]]			; CHECK-NEXT: ret i64 [[TMP65]]
	;			;
	bb1:			bb1:
	br label %bb3			br label %bb3

	bb2:			bb2:
	br label %bb3			br label %bb3

	Show All 17 Lines

llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll

	Show All 12 Lines
	; acc &= v3[0] & v3[1] & v3[2] & v3[3];			; acc &= v3[0] & v3[1] & v3[2] & v3[3];
	; acc &= v4[0] & v4[1] & v4[2] & v4[3];			; acc &= v4[0] & v4[1] & v4[2] & v4[3];
	; return acc;			; return acc;
	; }			; }

	define i32 @reduce_and4(i32 %acc, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4) {			define i32 @reduce_and4(i32 %acc, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4) {
	; SSE2-LABEL: @reduce_and4(			; SSE2-LABEL: @reduce_and4(
	; SSE2-NEXT: entry:			; SSE2-NEXT: entry:
	; SSE2-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[V1:%.*]], i64 0			; SSE2-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; SSE2-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[V1]], i64 1			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; SSE2-NEXT: [[VECEXT2:%.*]] = extractelement <4 x i32> [[V1]], i64 2			; SSE2-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE2-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[V1]], i64 3			; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])
	; SSE2-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[V2:%.*]], i64 0			; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]
	; SSE2-NEXT: [[VECEXT8:%.*]] = extractelement <4 x i32> [[V2]], i64 1
	; SSE2-NEXT: [[VECEXT10:%.*]] = extractelement <4 x i32> [[V2]], i64 2
	; SSE2-NEXT: [[VECEXT12:%.*]] = extractelement <4 x i32> [[V2]], i64 3
	; SSE2-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT8]], i32 0
	; SSE2-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[VECEXT7]], i32 1
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[VECEXT10]], i32 2
	; SSE2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[VECEXT12]], i32 3
	; SSE2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[VECEXT1]], i32 4
	; SSE2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[VECEXT]], i32 5
	; SSE2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[VECEXT2]], i32 6
	; SSE2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[VECEXT4]], i32 7
	; SSE2-NEXT: [[VECEXT15:%.*]] = extractelement <4 x i32> [[V3:%.*]], i64 0
	; SSE2-NEXT: [[VECEXT16:%.*]] = extractelement <4 x i32> [[V3]], i64 1
	; SSE2-NEXT: [[VECEXT18:%.*]] = extractelement <4 x i32> [[V3]], i64 2
	; SSE2-NEXT: [[VECEXT20:%.*]] = extractelement <4 x i32> [[V3]], i64 3
	; SSE2-NEXT: [[VECEXT23:%.*]] = extractelement <4 x i32> [[V4:%.*]], i64 0
	; SSE2-NEXT: [[VECEXT24:%.*]] = extractelement <4 x i32> [[V4]], i64 1
	; SSE2-NEXT: [[VECEXT26:%.*]] = extractelement <4 x i32> [[V4]], i64 2
	; SSE2-NEXT: [[VECEXT28:%.*]] = extractelement <4 x i32> [[V4]], i64 3
	; SSE2-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT24]], i32 0
	; SSE2-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[VECEXT23]], i32 1
	; SSE2-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[VECEXT26]], i32 2
	; SSE2-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[VECEXT28]], i32 3
	; SSE2-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[VECEXT16]], i32 4
	; SSE2-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[VECEXT15]], i32 5
	; SSE2-NEXT: [[TMP14:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[VECEXT18]], i32 6
	; SSE2-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP14]], i32 [[VECEXT20]], i32 7
	; SSE2-NEXT: [[TMP16:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP15]])
	; SSE2-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP7]])
	; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP16]], [[TMP17]]
	; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE2-NEXT: ret i32 [[OP_RDX1]]			; SSE2-NEXT: ret i32 [[OP_RDX1]]
	;			;
	; SSE42-LABEL: @reduce_and4(			; SSE42-LABEL: @reduce_and4(
	; SSE42-NEXT: entry:			; SSE42-NEXT: entry:
	; SSE42-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])			; SSE42-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])
	; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])			; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])
	; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP0]], [[TMP1]]			; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP0]], [[TMP1]]
	; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])			; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])
	; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP2]]			; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP2]]
	; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])			; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])
	; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP3]]			; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP3]]
	; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]			; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]
	; SSE42-NEXT: ret i32 [[OP_RDX3]]			; SSE42-NEXT: ret i32 [[OP_RDX3]]
	;			;
	; AVX-LABEL: @reduce_and4(			; AVX-LABEL: @reduce_and4(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[V1:%.*]], i64 0			; AVX-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; AVX-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[V1]], i64 1			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; AVX-NEXT: [[VECEXT2:%.*]] = extractelement <4 x i32> [[V1]], i64 2			; AVX-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; AVX-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[V1]], i64 3			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])
	; AVX-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[V2:%.*]], i64 0			; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]
	; AVX-NEXT: [[VECEXT8:%.*]] = extractelement <4 x i32> [[V2]], i64 1
	; AVX-NEXT: [[VECEXT10:%.*]] = extractelement <4 x i32> [[V2]], i64 2
	; AVX-NEXT: [[VECEXT12:%.*]] = extractelement <4 x i32> [[V2]], i64 3
	; AVX-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT8]], i32 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[VECEXT7]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[VECEXT10]], i32 2
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[VECEXT12]], i32 3
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[VECEXT1]], i32 4
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[VECEXT]], i32 5
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[VECEXT2]], i32 6
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[VECEXT4]], i32 7
	; AVX-NEXT: [[VECEXT15:%.*]] = extractelement <4 x i32> [[V3:%.*]], i64 0
	; AVX-NEXT: [[VECEXT16:%.*]] = extractelement <4 x i32> [[V3]], i64 1
	; AVX-NEXT: [[VECEXT18:%.*]] = extractelement <4 x i32> [[V3]], i64 2
	; AVX-NEXT: [[VECEXT20:%.*]] = extractelement <4 x i32> [[V3]], i64 3
	; AVX-NEXT: [[VECEXT23:%.*]] = extractelement <4 x i32> [[V4:%.*]], i64 0
	; AVX-NEXT: [[VECEXT24:%.*]] = extractelement <4 x i32> [[V4]], i64 1
	; AVX-NEXT: [[VECEXT26:%.*]] = extractelement <4 x i32> [[V4]], i64 2
	; AVX-NEXT: [[VECEXT28:%.*]] = extractelement <4 x i32> [[V4]], i64 3
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT24]], i32 0
	; AVX-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[VECEXT23]], i32 1
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[VECEXT26]], i32 2
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[VECEXT28]], i32 3
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[VECEXT16]], i32 4
	; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[VECEXT15]], i32 5
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[VECEXT18]], i32 6
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP14]], i32 [[VECEXT20]], i32 7
	; AVX-NEXT: [[TMP16:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP15]])
	; AVX-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP7]])
	; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP16]], [[TMP17]]
	; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; AVX-NEXT: ret i32 [[OP_RDX1]]			; AVX-NEXT: ret i32 [[OP_RDX1]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %v1, i64 0			%vecext = extractelement <4 x i32> %v1, i64 0
	%vecext1 = extractelement <4 x i32> %v1, i64 1			%vecext1 = extractelement <4 x i32> %v1, i64 1
	%vecext2 = extractelement <4 x i32> %v1, i64 2			%vecext2 = extractelement <4 x i32> %v1, i64 2
	%vecext4 = extractelement <4 x i32> %v1, i64 3			%vecext4 = extractelement <4 x i32> %v1, i64 3
	Show All 33 Lines
	; acc &= v1[1] & v2[1] & v3[1] & v4[1];			; acc &= v1[1] & v2[1] & v3[1] & v4[1];
	; acc &= v1[2] & v2[2] & v3[2] & v4[2];			; acc &= v1[2] & v2[2] & v3[2] & v4[2];
	; acc &= v1[3] & v2[3] & v3[3] & v4[3];			; acc &= v1[3] & v2[3] & v3[3] & v4[3];
	; return acc;			; return acc;
	; }			; }

	define i32 @reduce_and4_transpose(i32 %acc, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4) {			define i32 @reduce_and4_transpose(i32 %acc, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4) {
	; SSE2-LABEL: @reduce_and4_transpose(			; SSE2-LABEL: @reduce_and4_transpose(
	; SSE2-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[V1:%.*]], i64 0			; SSE2-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; SSE2-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[V2:%.*]], i64 0			; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; SSE2-NEXT: [[VECEXT2:%.*]] = extractelement <4 x i32> [[V3:%.*]], i64 0			; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; SSE2-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[V4:%.*]], i64 0			; SSE2-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE2-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[V1]], i64 1			; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]
	; SSE2-NEXT: [[VECEXT8:%.*]] = extractelement <4 x i32> [[V2]], i64 1
	; SSE2-NEXT: [[VECEXT10:%.*]] = extractelement <4 x i32> [[V3]], i64 1
	; SSE2-NEXT: [[VECEXT12:%.*]] = extractelement <4 x i32> [[V4]], i64 1
	; SSE2-NEXT: [[VECEXT15:%.*]] = extractelement <4 x i32> [[V1]], i64 2
	; SSE2-NEXT: [[VECEXT16:%.*]] = extractelement <4 x i32> [[V2]], i64 2
	; SSE2-NEXT: [[VECEXT18:%.*]] = extractelement <4 x i32> [[V3]], i64 2
	; SSE2-NEXT: [[VECEXT20:%.*]] = extractelement <4 x i32> [[V4]], i64 2
	; SSE2-NEXT: [[VECEXT23:%.*]] = extractelement <4 x i32> [[V1]], i64 3
	; SSE2-NEXT: [[VECEXT24:%.*]] = extractelement <4 x i32> [[V2]], i64 3
	; SSE2-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT24]], i32 0
	; SSE2-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[VECEXT16]], i32 1
	; SSE2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[VECEXT8]], i32 2
	; SSE2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[VECEXT1]], i32 3
	; SSE2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[VECEXT23]], i32 4
	; SSE2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[VECEXT15]], i32 5
	; SSE2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[VECEXT7]], i32 6
	; SSE2-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[VECEXT]], i32 7
	; SSE2-NEXT: [[VECEXT26:%.*]] = extractelement <4 x i32> [[V3]], i64 3
	; SSE2-NEXT: [[VECEXT28:%.*]] = extractelement <4 x i32> [[V4]], i64 3
	; SSE2-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT28]], i32 0
	; SSE2-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[VECEXT20]], i32 1
	; SSE2-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[VECEXT12]], i32 2
	; SSE2-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[VECEXT4]], i32 3
	; SSE2-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[VECEXT26]], i32 4
	; SSE2-NEXT: [[TMP14:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[VECEXT18]], i32 5
	; SSE2-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP14]], i32 [[VECEXT10]], i32 6
	; SSE2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[VECEXT2]], i32 7
	; SSE2-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP16]])
	; SSE2-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])
	; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP17]], [[TMP18]]
	; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE2-NEXT: ret i32 [[OP_RDX1]]			; SSE2-NEXT: ret i32 [[OP_RDX1]]
	;			;
	; SSE42-LABEL: @reduce_and4_transpose(			; SSE42-LABEL: @reduce_and4_transpose(
	; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])			; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])
	; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])			; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])
	; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP1]], [[TMP2]]			; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP1]], [[TMP2]]
	; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])			; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])
	; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP3]]			; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP3]]
	; SSE42-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])			; SSE42-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])
	; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP4]]			; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP4]]
	; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]			; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]
	; SSE42-NEXT: ret i32 [[OP_RDX3]]			; SSE42-NEXT: ret i32 [[OP_RDX3]]
	;			;
	; AVX-LABEL: @reduce_and4_transpose(			; AVX-LABEL: @reduce_and4_transpose(
	; AVX-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[V1:%.*]], i64 0			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; AVX-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[V2:%.*]], i64 0			; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; AVX-NEXT: [[VECEXT2:%.*]] = extractelement <4 x i32> [[V3:%.*]], i64 0			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; AVX-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[V4:%.*]], i64 0			; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; AVX-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[V1]], i64 1			; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]
	; AVX-NEXT: [[VECEXT8:%.*]] = extractelement <4 x i32> [[V2]], i64 1
	; AVX-NEXT: [[VECEXT10:%.*]] = extractelement <4 x i32> [[V3]], i64 1
	; AVX-NEXT: [[VECEXT12:%.*]] = extractelement <4 x i32> [[V4]], i64 1
	; AVX-NEXT: [[VECEXT15:%.*]] = extractelement <4 x i32> [[V1]], i64 2
	; AVX-NEXT: [[VECEXT16:%.*]] = extractelement <4 x i32> [[V2]], i64 2
	; AVX-NEXT: [[VECEXT18:%.*]] = extractelement <4 x i32> [[V3]], i64 2
	; AVX-NEXT: [[VECEXT20:%.*]] = extractelement <4 x i32> [[V4]], i64 2
	; AVX-NEXT: [[VECEXT23:%.*]] = extractelement <4 x i32> [[V1]], i64 3
	; AVX-NEXT: [[VECEXT24:%.*]] = extractelement <4 x i32> [[V2]], i64 3
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT24]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[VECEXT16]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[VECEXT8]], i32 2
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[VECEXT1]], i32 3
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[VECEXT23]], i32 4
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[VECEXT15]], i32 5
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[VECEXT7]], i32 6
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[VECEXT]], i32 7
	; AVX-NEXT: [[VECEXT26:%.*]] = extractelement <4 x i32> [[V3]], i64 3
	; AVX-NEXT: [[VECEXT28:%.*]] = extractelement <4 x i32> [[V4]], i64 3
	; AVX-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> poison, i32 [[VECEXT28]], i32 0
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[VECEXT20]], i32 1
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[VECEXT12]], i32 2
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[VECEXT4]], i32 3
	; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[VECEXT26]], i32 4
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[VECEXT18]], i32 5
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP14]], i32 [[VECEXT10]], i32 6
	; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[VECEXT2]], i32 7
	; AVX-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP16]])
	; AVX-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])
	; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP17]], [[TMP18]]
	; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; AVX-NEXT: ret i32 [[OP_RDX1]]			; AVX-NEXT: ret i32 [[OP_RDX1]]
	;			;
	%vecext = extractelement <4 x i32> %v1, i64 0			%vecext = extractelement <4 x i32> %v1, i64 0
	%vecext1 = extractelement <4 x i32> %v2, i64 0			%vecext1 = extractelement <4 x i32> %v2, i64 0
	%vecext2 = extractelement <4 x i32> %v3, i64 0			%vecext2 = extractelement <4 x i32> %v3, i64 0
	%vecext4 = extractelement <4 x i32> %v4, i64 0			%vecext4 = extractelement <4 x i32> %v4, i64 0
	%vecext7 = extractelement <4 x i32> %v1, i64 1			%vecext7 = extractelement <4 x i32> %v1, i64 1

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll


	define i1 @fcmp_lt_gt(double %a, double %b, double %c) {			define i1 @fcmp_lt_gt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt_gt(			; CHECK-LABEL: @fcmp_lt_gt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	}			}

	define i1 @fcmp_lt(double %a, double %b, double %c) {			define i1 @fcmp_lt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt(			; CHECK-LABEL: @fcmp_lt(
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>			; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1

llvm/test/Transforms/SLPVectorizer/X86/redux-feed-buildvector.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64 -passes=slp-vectorizer -S -mcpu=skylake-avx512 | FileCheck %s			; RUN: opt < %s -mtriple=x86_64 -passes=slp-vectorizer -S -mcpu=skylake-avx512 | FileCheck %s

	; The test represents the case with multiple vectorization possibilities			; The test represents the case with multiple vectorization possibilities
	; but the most effective way to vectorize it is to match both 8-way reductions			; but the most effective way to vectorize it is to match both 8-way reductions
	; feeding the insertelement vector build sequence.			; feeding the insertelement vector build sequence.

	declare void @llvm.masked.scatter.v2f64.v2p0f64(<2 x double>, <2 x double*>, i32 immarg, <2 x i1>)			declare void @llvm.masked.scatter.v2f64.v2p0f64(<2 x double>, <2 x double*>, i32 immarg, <2 x i1>)

	define void @test(double* nocapture readonly %arg, double* nocapture readonly %arg1, double* nocapture %arg2) {			define void @test(double* nocapture readonly %arg, double* nocapture readonly %arg1, double* nocapture %arg2) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double*> poison, double* [[ARG:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double*> poison, double* [[ARG:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double*> [[TMP0]], <8 x double*> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double*> [[TMP0]], <8 x double*> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr double, <8 x double*> [[SHUFFLE]], <8 x i64> <i64 1, i64 3, i64 5, i64 7, i64 9, i64 11, i64 13, i64 15>			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, <8 x double*> [[TMP1]], <8 x i64> <i64 1, i64 3, i64 5, i64 7, i64 9, i64 11, i64 13, i64 15>
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr inbounds double, double* [[ARG1:%.*]], i64 16			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr inbounds double, double* [[ARG1:%.*]], i64 16
	; CHECK-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> [[TMP1]], i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x double> poison)			; CHECK-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> [[TMP2]], i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x double> poison)
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[GEP2_0]] to <8 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[GEP2_0]] to <8 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x double>, <8 x double>* [[TMP3]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <8 x double>, <8 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <8 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <8 x double> [[TMP5]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARG1]] to <8 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[ARG1]] to <8 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <8 x double>, <8 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP8:%.*]] = load <8 x double>, <8 x double>* [[TMP7]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <8 x double> [[TMP7]], [[TMP2]]			; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <8 x double> [[TMP8]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP8]])			; CHECK-NEXT: [[TMP10:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP9]])
	; CHECK-NEXT: [[TMP10:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP5]])			; CHECK-NEXT: [[TMP11:%.*]] = call fast double @llvm.vector.reduce.fadd.v8f64(double -0.000000e+00, <8 x double> [[TMP6]])
	; CHECK-NEXT: [[I142:%.*]] = insertelement <2 x double> poison, double [[TMP9]], i64 0			; CHECK-NEXT: [[I142:%.*]] = insertelement <2 x double> poison, double [[TMP10]], i64 0
	; CHECK-NEXT: [[I143:%.*]] = insertelement <2 x double> [[I142]], double [[TMP10]], i64 1			; CHECK-NEXT: [[I143:%.*]] = insertelement <2 x double> [[I142]], double [[TMP11]], i64 1
	; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds double, double* [[ARG2:%.*]], <2 x i64> <i64 0, i64 16>			; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds double, double* [[ARG2:%.*]], <2 x i64> <i64 0, i64 16>
	; CHECK-NEXT: call void @llvm.masked.scatter.v2f64.v2p0f64(<2 x double> [[I143]], <2 x double*> [[P]], i32 8, <2 x i1> <i1 true, i1 true>)			; CHECK-NEXT: call void @llvm.masked.scatter.v2f64.v2p0f64(<2 x double> [[I143]], <2 x double*> [[P]], i32 8, <2 x i1> <i1 true, i1 true>)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr inbounds double, double* %arg, i64 1			%gep1.0 = getelementptr inbounds double, double* %arg, i64 1
	%ld1.0 = load double, double* %gep1.0, align 8			%ld1.0 = load double, double* %gep1.0, align 8
	%ld0.0 = load double, double* %arg1, align 8			%ld0.0 = load double, double* %arg1, align 8

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s			; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	define void @fextr(i16* %ptr) {			define void @fextr(i16* %ptr) {
	; CHECK-LABEL: @fextr(			; CHECK-LABEL: @fextr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16			; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16
	; CHECK-NEXT: br label [[T:%.*]]			; CHECK-NEXT: br label [[T:%.*]]
	; CHECK: t:			; CHECK: t:
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP1:%.*]] = add <8 x i16> [[LD]], [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP1]], <8 x i16>* [[TMP2]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-20'			; YAML-NEXT: - Cost: '-20'

llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -mattr=+avx512f -mtriple=x86_64 -S < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -mattr=+avx512f -mtriple=x86_64 -S < %s | FileCheck %s

	define void @test(float* noalias %0, float* %p) {			define void @test(float* noalias %0, float* %p) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x float*> poison, float* [[P:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x float*> poison, float* [[P:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP2]], <8 x float*> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float*> [[TMP2]], <8 x float*> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 15, i64 4, i64 5, i64 0, i64 2, i64 6, i64 7, i64 8>			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x float*> [[TMP3]], <8 x i64> <i64 15, i64 4, i64 5, i64 0, i64 2, i64 6, i64 7, i64 8>
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 2			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 2
	; CHECK-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP3]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)			; CHECK-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> poison, <16 x i32> <i32 4, i32 3, i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 2, i32 5, i32 6, i32 7, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> poison, <16 x i32> <i32 4, i32 3, i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 2, i32 5, i32 6, i32 7, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <16 x float> <float poison, float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <16 x float> [[SHUFFLE1]], <16 x i32> <i32 18, i32 19, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <16 x float> [[SHUFFLE]], <16 x float> <float poison, float poison, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <16 x i32> <i32 2, i32 3, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd reassoc nsz arcp contract afn <16 x float> [[SHUFFLE1]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = fadd reassoc nsz arcp contract afn <16 x float> [[SHUFFLE]], [[TMP7]]
	; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> poison, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 1, i32 9, i32 0, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x float> [[TMP8]], <16 x float> poison, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 1, i32 9, i32 0, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP4]] to <16 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP5]] to <16 x float>*
	; CHECK-NEXT: store <16 x float> [[SHUFFLE2]], <16 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <16 x float> [[SHUFFLE1]], <16 x float>* [[TMP9]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%2 = getelementptr inbounds float, float* %p, i64 2			%2 = getelementptr inbounds float, float* %p, i64 2
	%3 = getelementptr inbounds float, float* %p, i64 4			%3 = getelementptr inbounds float, float* %p, i64 4
	%4 = load float, float* %3, align 4			%4 = load float, float* %3, align 4
	%5 = getelementptr inbounds float, float* %p, i64 5			%5 = getelementptr inbounds float, float* %p, i64 5
	%6 = load float, float* %5, align 16			%6 = load float, float* %5, align 16
	%7 = getelementptr inbounds float, float* %p, i64 15			%7 = getelementptr inbounds float, float* %p, i64 15

llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s

	define i32 @main(i32 %0) {			define i32 @main(i32 %0) {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: for.cond.preheader:			; CHECK-NEXT: for.cond.preheader:
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]
	; CHECK: for.inc.preheader:			; CHECK: for.inc.preheader:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 poison>, i32 [[TMP0:%.*]], i32 6			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 undef>, i32 [[TMP0:%.*]], i32 6
	; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]			; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[DOTPR]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> <i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[DOTPR]], i32 3
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: br label [[L1_PREHEADER]]			; CHECK-NEXT: br label [[L1_PREHEADER]]
	; CHECK: L1.preheader:			; CHECK: L1.preheader:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <8 x i32> [ [[SHUFFLE]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <8 x i32> [ [[TMP3]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	for.cond.preheader:			for.cond.preheader:
	br i1 false, label %for.end, label %for.inc.preheader			br i1 false, label %for.end, label %for.inc.preheader

	for.inc.preheader:			for.inc.preheader:
	br i1 false, label %for.end, label %L1.preheader			br i1 false, label %for.end, label %L1.preheader


llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll

	; CHECK-NEXT: [[TMP1:%.*]] = trunc <2 x i32> [[TMP0]] to <2 x i8>			; CHECK-NEXT: [[TMP1:%.*]] = trunc <2 x i32> [[TMP0]] to <2 x i8>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i8> [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = zext i8 [[TMP2]] to i32			; CHECK-NEXT: [[TMP3:%.*]] = zext i8 [[TMP2]] to i32
	; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP3]], 0			; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP3]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP0]], <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP0]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP4]], zeroinitializer
	; CHECK-NEXT: br label [[ELSE1:%.*]]			; CHECK-NEXT: br label [[ELSE1:%.*]]
	; CHECK: else1:			; CHECK: else1:
	; CHECK-NEXT: [[T20:%.*]] = extractelement <2 x i32> [[T13]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[T13]], <2 x i32> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[BF_CAST162]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[BF_CAST162]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[T20]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt <2 x i32> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt <2 x i32> [[TMP6]], zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
	; CHECK-NEXT: ret i1 [[TMP8]]			; CHECK-NEXT: ret i1 [[TMP8]]
	;			;
	entry:			entry:
	br i1 false, label %then, label %else			br i1 false, label %then, label %else

	then:			then:

llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -mcpu=cascadelake < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -mcpu=cascadelake < %s | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX10_I_I86:%.*]] = getelementptr inbounds float, ptr undef, i64 2			; CHECK-NEXT: [[ARRAYIDX10_I_I86:%.*]] = getelementptr inbounds float, ptr undef, i64 2
	; CHECK-NEXT: [[ARRAYIDX21_I:%.*]] = getelementptr inbounds [4 x float], ptr undef, i64 2			; CHECK-NEXT: [[ARRAYIDX21_I:%.*]] = getelementptr inbounds [4 x float], ptr undef, i64 2
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr undef, align 4
	; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x float> zeroinitializer, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x float> zeroinitializer, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = load float, ptr [[ARRAYIDX10_I_I86]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load float, ptr [[ARRAYIDX10_I_I86]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> <float 0.000000e+00, float poison>, <2 x float> [[TMP0]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP0]], <2 x float> <float 0.000000e+00, float poison>, <2 x i32> <i32 2, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[TMP3]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[TMP2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> <float poison, float 0.000000e+00>, float [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> <float poison, float 0.000000e+00>, <2 x i32> <i32 1, i32 3>
	; CHECK-NEXT: [[TMP8:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP4]], <2 x float> [[TMP6]], <2 x float> [[TMP7]])			; CHECK-NEXT: [[TMP8:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP4]], <2 x float> [[TMP6]], <2 x float> [[TMP7]])
	; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x float> [ [[TMP9]], [[BB2]] ], [ zeroinitializer, [[BB1]] ]			; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x float> [ [[TMP9]], [[BB2]] ], [ zeroinitializer, [[BB1]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll

	Show First 20 Lines • Show All 310 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[VAR4:%.*]] = alloca i8, align 1			; CHECK-NEXT: [[VAR4:%.*]] = alloca i8, align 1
	; CHECK-NEXT: [[VAR5:%.*]] = alloca i8, align 1			; CHECK-NEXT: [[VAR5:%.*]] = alloca i8, align 1
	; CHECK-NEXT: [[VAR17:%.*]] = call i8* @wibble(i8* [[VAR4]])			; CHECK-NEXT: [[VAR17:%.*]] = call i8* @wibble(i8* [[VAR4]])
	; CHECK-NEXT: [[VAR23:%.*]] = call i8* @llvm.stacksave()			; CHECK-NEXT: [[VAR23:%.*]] = call i8* @llvm.stacksave()
	; CHECK-NEXT: [[VAR24:%.*]] = alloca inalloca i32, align 4			; CHECK-NEXT: [[VAR24:%.*]] = alloca inalloca i32, align 4
	; CHECK-NEXT: call void @quux(i32* inalloca(i32) [[VAR24]])			; CHECK-NEXT: call void @quux(i32* inalloca(i32) [[VAR24]])
	; CHECK-NEXT: call void @llvm.stackrestore(i8* [[VAR23]])			; CHECK-NEXT: call void @llvm.stackrestore(i8* [[VAR23]])
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8*> poison, i8* [[VAR4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8*> poison, i8* [[VAR4]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i8*> [[TMP2]], <4 x i8*> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i8*> [[TMP2]], <4 x i8*> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: store <4 x i8*> [[SHUFFLE]], <4 x i8*>* [[TMP1]], align 8			; CHECK-NEXT: store <4 x i8*> [[TMP3]], <4 x i8*>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8*> [[TMP2]], i8* [[VAR5]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i8*> poison, i8* [[VAR4]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i8*> [[TMP3]], <4 x i8*> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i8*> [[TMP4]], i8* [[VAR5]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8** [[VAR36]] to <4 x i8*>*			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i8*> [[TMP5]], <2 x i8*> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: store <4 x i8*> [[SHUFFLE1]], <4 x i8*>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8** [[VAR36]] to <4 x i8*>*
				; CHECK-NEXT: store <4 x i8*> [[TMP6]], <4 x i8*>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var2 = alloca i8			%var2 = alloca i8
	%var3 = alloca i8			%var3 = alloca i8
	%var4 = alloca i8			%var4 = alloca i8
	%var5 = alloca i8			%var5 = alloca i8
	%var12 = alloca [12 x i8*]			%var12 = alloca [12 x i8*]
	%var15 = call i8* @wibble(i8* %var2)			%var15 = call i8* @wibble(i8* %var2)
	Show All 23 Lines
	}			}

	define void @spam() #1 {			define void @spam() #1 {
	; CHECK-LABEL: @spam(			; CHECK-LABEL: @spam(
	; CHECK-NEXT: [[VAR12:%.*]] = alloca [12 x i8*], align 8			; CHECK-NEXT: [[VAR12:%.*]] = alloca [12 x i8*], align 8
	; CHECK-NEXT: [[VAR36:%.*]] = getelementptr inbounds [12 x i8*], [12 x i8*]* [[VAR12]], i32 0, i32 4			; CHECK-NEXT: [[VAR36:%.*]] = getelementptr inbounds [12 x i8*], [12 x i8*]* [[VAR12]], i32 0, i32 4
	; CHECK-NEXT: [[VAR4:%.*]] = alloca i8, align 1			; CHECK-NEXT: [[VAR4:%.*]] = alloca i8, align 1
	; CHECK-NEXT: [[VAR5:%.*]] = alloca i8, align 1			; CHECK-NEXT: [[VAR5:%.*]] = alloca i8, align 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8*> poison, i8* [[VAR4]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8*> poison, i8* [[VAR4]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8*> [[TMP1]], i8* [[VAR5]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i8*> [[TMP1]], i8* [[VAR5]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i8*> [[TMP2]], <4 x i8*> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i8*> [[TMP2]], <2 x i8*> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8** [[VAR36]] to <4 x i8*>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8** [[VAR36]] to <4 x i8*>*
	; CHECK-NEXT: store <4 x i8*> [[SHUFFLE]], <4 x i8*>* [[TMP3]], align 8			; CHECK-NEXT: store <4 x i8*> [[TMP3]], <4 x i8*>* [[TMP4]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var4 = alloca i8			%var4 = alloca i8
	%var5 = alloca i8			%var5 = alloca i8
	%var12 = alloca [12 x i8*]			%var12 = alloca [12 x i8*]
	%var36 = getelementptr inbounds [12 x i8*], [12 x i8*]* %var12, i32 0, i32 4			%var36 = getelementptr inbounds [12 x i8*], [12 x i8*]* %var12, i32 0, i32 4
	store i8* %var4, i8** %var36			store i8* %var4, i8** %var36
	%var37 = getelementptr inbounds [12 x i8*], [12 x i8*]* %var12, i32 0, i32 5			%var37 = getelementptr inbounds [12 x i8*], [12 x i8*]* %var12, i32 0, i32 5
	Show All 10 Lines

llvm/test/Transforms/SLPVectorizer/X86/tiny-tree.ll

Show First 20 Lines • Show All 207 Lines • ▼ Show 20 Lines
for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret void		ret void
}		}

define void @store_splat(float*, float) {		define void @store_splat(float*, float) {
; CHECK-LABEL: @store_splat(		; CHECK-LABEL: @store_splat(
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 0		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP1:%.*]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP1:%.*]], i32 0
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> zeroinitializer		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[TMP3]] to <4 x float>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP3]] to <4 x float>*
; CHECK-NEXT: store <4 x float> [[SHUFFLE]], <4 x float>* [[TMP5]], align 4		; CHECK-NEXT: store <4 x float> [[TMP5]], <4 x float>* [[TMP6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%3 = getelementptr inbounds float, float* %0, i64 0		%3 = getelementptr inbounds float, float* %0, i64 0
store float %1, float* %3, align 4		store float %1, float* %3, align 4
%4 = getelementptr inbounds float, float* %0, i64 1		%4 = getelementptr inbounds float, float* %0, i64 1
store float %1, float* %4, align 4		store float %1, float* %4, align 4
%5 = getelementptr inbounds float, float* %0, i64 2		%5 = getelementptr inbounds float, float* %0, i64 2
store float %1, float* %5, align 4		store float %1, float* %5, align 4
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	;
ret void		ret void
}		}

define void @tiny_vector_with_diff_opcode(i16* %a, i16* %v1) {		define void @tiny_vector_with_diff_opcode(i16* %a, i16* %v1) {
; CHECK-LABEL: @tiny_vector_with_diff_opcode(		; CHECK-LABEL: @tiny_vector_with_diff_opcode(
; CHECK-NEXT: [[TMP1:%.*]] = load i16, i16* [[V1:%.*]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load i16, i16* [[V1:%.*]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 undef to i16		; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 undef to i16
; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 0		; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i16> poison, i16 [[TMP1]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i16> poison, i16 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i16> [[TMP3]], i16 [[TMP2]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i16> [[TMP3]], i16 [[TMP2]], i32 1
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP4]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i16> [[TMP4]], <2 x i16> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[PTR0]] to <8 x i16>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[PTR0]] to <8 x i16>*
; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* [[TMP5]], align 16		; CHECK-NEXT: store <8 x i16> [[TMP5]], <8 x i16>* [[TMP6]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%1 = load i16, i16* %v1, align 4		%1 = load i16, i16* %v1, align 4
%2 = trunc i64 undef to i16		%2 = trunc i64 undef to i16
%ptr0 = getelementptr inbounds i16, i16* %a, i64 0		%ptr0 = getelementptr inbounds i16, i16* %a, i64 0
store i16 %1, i16* %ptr0, align 16		store i16 %1, i16* %ptr0, align 16
%ptr1 = getelementptr inbounds i16, i16* %a, i64 1		%ptr1 = getelementptr inbounds i16, i16* %a, i64 1
store i16 %2, i16* %ptr1, align 4		store i16 %2, i16* %ptr1, align 4
Show All 14 Lines

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

	Show All 37 Lines
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T47]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, <4 x i32> <i32 undef, i32 undef, i32 6, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T9]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T40]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 3>
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

	Show All 37 Lines
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T47]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, <4 x i32> <i32 undef, i32 undef, i32 6, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T9]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T40]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 3>
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s \| FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s \| FileCheck %s

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float			; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef			; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP14:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP10:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double			; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[TMP3]], [[CONV2]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[SUB1:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> <double poison, double poison, double undef, double undef>, double [[SUB1]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[ADD1]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP7:%.*]] = fcmp ogt <4 x double> [[TMP6]], [[TMP4]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = fptrunc <4 x double> [[TMP6]] to <4 x float>
	; CHECK-NEXT: [[TMP11:%.*]] = fcmp ogt <4 x double> [[TMP10]], [[TMP4]]			; CHECK-NEXT: [[TMP9:%.*]] = select <4 x i1> [[TMP7]], <4 x float> [[TMP2]], <4 x float> [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = fptrunc <4 x double> [[TMP10]] to <4 x float>
	; CHECK-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP11]], <4 x float> [[TMP2]], <4 x float> [[TMP12]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP14]] = phi <4 x float> [ [[TMP13]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]			; CHECK-NEXT: [[TMP10]] = phi <4 x float> [ [[TMP9]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	entry:			entry:
	%conv = uitofp i16 undef to float			%conv = uitofp i16 undef to float
	%sub = fsub float 6.553500e+04, undef			%sub = fsub float 6.553500e+04, undef
	br label %bb1			br label %bb1

	bb1:			bb1:
	Show All 39 Lines

Revision Contents

Diff 468668

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/gather-extractelements-different-bbs.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-const-undef.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/insertelement-postpone.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/malformed_phis.ll

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/X86/odd_store.ll

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-same-vals.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/redux-feed-buildvector.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll

llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll
