This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of the scalars used in the graph.
Closed · Public

Authored by ABataev on Oct 1 2021, 4:10 PM.

Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev
vporpo

Commits

rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

Summary

Currently we emit gathers for scalars being vectorized in the tree as
pairs of extractelement/insertelement instructions. Instead, we can try
to find all required vectors and emit shufflevector instructions
directly, improving the generated code and reducing compile time.
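
To make the summary concrete, here is a minimal standalone sketch in plain C++ (it does not use the LLVM API). It models fixed-width vectors as `std::array` and shows that a gather written as per-lane extract/insert pairs can be folded into one two-source shuffle mask. The names `ExtractSketch`, `gatherLaneByLane`, `buildShuffleMask` and `shuffleTwoSources` are hypothetical; the mask convention (indices 0..N-1 select from the first source, N..2N-1 from the second) follows `shufflevector` semantics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumLanes = 4;
using Vec = std::array<std::uint8_t, NumLanes>;
using Mask = std::array<int, NumLanes>;

// One gathered scalar: the source vector it is extracted from (0 or 1)
// and the element index inside that source. Hypothetical helper type.
struct ExtractSketch {
  int Source;
  int Index;
};
using Gather = std::array<ExtractSketch, NumLanes>;

// Lane-by-lane form: one "extractelement" and one "insertelement" per lane.
Vec gatherLaneByLane(const Vec &X, const Vec &Y, const Gather &G) {
  Vec Result{};
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    const Vec &Src = (G[Lane].Source == 0) ? X : Y;
    Result[Lane] = Src[static_cast<std::size_t>(G[Lane].Index)];
  }
  return Result;
}

// Fold the whole gather into a single two-source shuffle mask.
Mask buildShuffleMask(const Gather &G) {
  Mask M{};
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    assert(G[Lane].Source == 0 || G[Lane].Source == 1);
    assert(G[Lane].Index >= 0 && G[Lane].Index < static_cast<int>(NumLanes));
    M[Lane] = G[Lane].Source * static_cast<int>(NumLanes) + G[Lane].Index;
  }
  return M;
}

// A single "shufflevector" over both sources.
Vec shuffleTwoSources(const Vec &X, const Vec &Y, const Mask &M) {
  Vec Result{};
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    const std::size_t Idx = static_cast<std::size_t>(M[Lane]);
    Result[Lane] = Idx < NumLanes ? X[Idx] : Y[Idx - NumLanes];
  }
  return Result;
}
```

For the extracts x0, x3, y1, y2 used in the `isFixedVectorShuffle` comment in the diff, this sketch produces the mask 0, 3, 5, 6, and both forms yield the same vector.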

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Oct 1 2021, 4:10 PM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Oct 1 2021, 4:10 PM

ABataev requested review of this revision. Oct 1 2021, 4:10 PM

Herald added a project: Restricted Project. Oct 1 2021, 4:10 PM

Harbormaster completed remote builds in B126755: Diff 376651. Oct 1 2021, 4:10 PM

Rebase

Harbormaster completed remote builds in B126915: Diff 377013. Oct 4 2021, 2:29 PM

RKSimon retitled this revision from "[SLP]Improve gathering of the scals used in the graph." to "[SLP]Improve gathering of the scalars used in the graph." Oct 5 2021, 6:35 AM

Rebase + bug fixes

Harbormaster completed remote builds in B133811: Diff 386648. Nov 11 2021, 2:47 PM

vporpo added a subscriber: vporpo. Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B135503: Diff 389033. Nov 22 2021, 7:36 PM

RKSimon added inline comments. Nov 29 2021, 9:13 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
296	Is it worth merging the isa<> and cast<> into a dyn_cast<>?
489	return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
4600	What targets are we still missing support for?
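
As an illustration of the first inline comment, the sketch below contrasts a test-then-cast pair with a single checked cast. It is written in standard C++ with `dynamic_cast` standing in for LLVM's `isa<>`, `cast<>` and `dyn_cast<>` templates (which rely on LLVM's own RTTI scheme), so it only shows the shape of the refactoring. `Shape`, `Circle`, `Square`, `radiusTwoStep` and `radiusMerged` are hypothetical names.

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() = default;
};
struct Circle : Shape {
  explicit Circle(int R) : Radius(R) {}
  int Radius;
};
struct Square : Shape {};

// Two-step form: test the dynamic type first, then cast unconditionally.
int radiusTwoStep(const Shape *S) {
  assert(S && "expected a non-null shape");
  if (dynamic_cast<const Circle *>(S) != nullptr)  // plays the role of isa<>
    return static_cast<const Circle *>(S)->Radius; // plays the role of cast<>
  return -1;
}

// Merged form: one checked cast whose result is both the test and the value.
int radiusMerged(const Shape *S) {
  assert(S && "expected a non-null shape");
  if (const auto *C = dynamic_cast<const Circle *>(S)) // plays the role of dyn_cast<>
    return C->Radius;
  return -1;
}
```

Both functions return the same result; the merged form names the type once and avoids a second cast.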

ABataev added inline comments. Nov 29 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4600	AArch64; in many cases it falls back to the default cost (a bunch of extracts plus a bunch of inserts).

Rebase + address comments.

Harbormaster completed remote builds in B136480: Diff 390398. Nov 29 2021, 11:39 AM

Rebase

Harbormaster completed remote builds in B136694: Diff 390702. Nov 30 2021, 8:08 AM

Rebase

Harbormaster completed remote builds in B136747: Diff 390783. Nov 30 2021, 1:09 PM

Rebase

Harbormaster completed remote builds in B138215: Diff 392842. Dec 8 2021, 12:09 PM

Rebase

RKSimon added inline comments. Dec 14 2021, 8:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4562	Wshadow warning vs Idx @ Line 4688?
4597	Wshadow warning vs Idx @ Line 4688?

Address comments

Harbormaster completed remote builds in B139236: Diff 394269. Dec 14 2021, 9:48 AM

Rebase

Harbormaster completed remote builds in B141051: Diff 396715. Dec 30 2021, 2:15 PM

ABataev mentioned this in D123587: [SLP] Generate shuffles if we can reorder an existing node. Apr 12 2022, 12:05 PM

Rebase

Herald added a project: Restricted Project. Aug 26 2022, 7:51 AM

Herald added subscribers: pcwang-thead, nlopes, kosarev.

nlopes added inline comments. Aug 26 2022, 7:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6627	Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!

ABataev added inline comments. Aug 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6627	Sure, thanks!

Address comments

Harbormaster completed remote builds in B183623: Diff 455933. Aug 26 2022, 10:50 AM

Rebase

Harbormaster completed remote builds in B186399: Diff 459790. Sep 13 2022, 11:19 AM

ABataev mentioned this in rG796af0c02728: [SLP] Move getInsertIndex function, NFC. Sep 14 2022, 6:24 AM

ABataev mentioned this in rGd647312e3f57: [SLP][NFC]Extract getLastInstructionInBundle function for better. Sep 14 2022, 8:44 AM

Rebase

Harbormaster completed remote builds in B192832: Diff 468668. Oct 18 2022, 1:42 PM

nhaehnle removed a subscriber: nhaehnle. Oct 19 2022, 2:00 AM

Large update.
Includes:

Unifies all shuffle builders and shuffle emission operands.
Generalizes emission and cost model estimation of the buildvectors/gathers.

Will be split into several smaller patches eventually.

Harbormaster completed remote builds in B201460: Diff 480583. Dec 6 2022, 9:34 PM

ABataev mentioned this in D139718: [SLP][NFC]Inital redesign of ShuffleInstructionBuilder, NFC. Dec 9 2022, 7:50 AM

ABataev mentioned this in rGecac8192dbf6: [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC. Dec 13 2022, 9:54 AM

Rebase

Harbormaster completed remote builds in B202927: Diff 482594. Dec 13 2022, 1:17 PM

Restore accidentally removed code.

Harbormaster completed remote builds in B202945: Diff 482619. Dec 13 2022, 2:43 PM

Rebase

Harbormaster completed remote builds in B204383: Diff 484571. Dec 21 2022, 7:50 AM

ABataev mentioned this in D140499: [SLP]Use ShuffleInstructionBuilder for vector shrinking. Dec 21 2022, 1:54 PM

khchen added a subscriber: khchen. Dec 22 2022, 8:35 AM

ABataev mentioned this in rGac01ae71f0c4: [SLP]Use ShuffleInstructionBuilder for vector shrinking. Dec 28 2022, 6:11 AM

Rebase

Harbormaster completed remote builds in B206131: Diff 486895. Jan 6 2023, 10:07 AM

Rebase

Herald added a subscriber: StephenFan. Jan 9 2023, 9:43 AM

Harbormaster completed remote builds in B206577: Diff 487485. Jan 9 2023, 10:30 AM

ABataev mentioned this in D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. Jan 11 2023, 8:33 AM

ABataev mentioned this in D141940: [SLP]Add shuffling of extractelements to avoid extra costs/data movement. Jan 17 2023, 8:01 AM

ABataev mentioned this in rG9bdcf8778a5c: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars. Jan 19 2023, 1:50 PM

ABataev mentioned this in rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement. Feb 20 2023, 6:16 AM

ABataev mentioned this in D144958: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Feb 28 2023, 5:21 AM

ABataev mentioned this in rGa611b3f3059e: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Mar 7 2023, 12:47 PM

Rebase

Restore deleted code/update test

Harbormaster completed remote builds in B218206: Diff 503510. Mar 8 2023, 2:48 PM

ABataev mentioned this in D145732: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function. Mar 9 2023, 2:20 PM

hans mentioned this in rG3b3a4c270bcb: Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather…. Mar 10 2023, 5:40 AM

ABataev mentioned this in rG93a9be0cea0a: [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes. Mar 10 2023, 1:22 PM

ABataev mentioned this in rGf3a68ac10c84: [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector…. Mar 13 2023, 6:27 AM

Rebase

RKSimon added inline comments. Mar 13 2023, 2:27 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4539	Any chance that we can use ShuffleVectorInst::isIdentityMask ?
4878	auto *
4880	auto *
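
For readers unfamiliar with the `ShuffleVectorInst::isIdentityMask` suggestion: an identity mask leaves every defined lane where it already is. The following is a minimal sketch in plain C++, not the LLVM implementation. It assumes a single source vector and uses -1 for undefined lanes; `isIdentityMaskSketch` and `UndefLane` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for a lane whose value does not matter.
constexpr int UndefLane = -1;

// Returns true if every defined lane selects the element at its own
// position, so the shuffle would reproduce its (single) source.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane) {
    assert(Mask[Lane] >= UndefLane && "indices below -1 are not modelled");
    if (Mask[Lane] != UndefLane && Mask[Lane] != static_cast<int>(Lane))
      return false;
  }
  return true;
}
```

Under these assumptions, the mask 0, 1, 2, 3 and the mask 0, -1, 2, 3 are both identities, while 0, 3, 5, 6 is not.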

ABataev added inline comments. Mar 13 2023, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4539	Sure, will do it later
4880	Both these cases are the existing code, just the diff is not quite correct because of the big differences.

Restore accidentally removed lines, address comments

Harbormaster completed remote builds in B219182: Diff 504861. Mar 13 2023, 5:18 PM

Rebase

Restore some deleted code

Harbormaster completed remote builds in B219617: Diff 505467. Mar 15 2023, 7:08 AM

ABataev mentioned this in D146167: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars. Mar 15 2023, 2:14 PM

ABataev mentioned this in rG0ad87ffdcc23: [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining…. Mar 17 2023, 11:21 AM

Rebase

Harbormaster completed remote builds in B220124: Diff 506162. Mar 17 2023, 12:55 PM

ABataev mentioned this in D146564: [SLP]Find reused scalars in buildvector sequences, if any. Mar 21 2023, 2:11 PM

ABataev mentioned this in rG40105a993399: [SLP]Find reused scalars in buildvector sequences, if any. Apr 5 2023, 9:39 AM

Rebase

Harbormaster completed remote builds in B224057: Diff 511474. Apr 6 2023, 11:37 AM

Rebase

Harbormaster completed remote builds in B224133: Diff 511560. Apr 6 2023, 5:26 PM

Rebase

Harbormaster completed remote builds in B224875: Diff 512589. Apr 11 2023, 3:26 PM

ABataev mentioned this in D148174: [SLP]Introduce gather cost estimation function. Apr 12 2023, 2:36 PM

ABataev mentioned this in rGf82eb7e066f3: [SLP]Introduce gather cost estimation function. Apr 13 2023, 10:19 AM

Rebase

Harbormaster completed remote builds in B225410: Diff 513316. Apr 13 2023, 12:33 PM

ABataev mentioned this in D148279: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions. Apr 13 2023, 4:42 PM

ABataev mentioned this in rGcd341f3f4878: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and…. Apr 18 2023, 5:55 AM

ABataev mentioned this in rG1ce4b26a21a0: [SLP]Add final resize to ShuffleCostEstimator::finalize member function and…. Apr 18 2023, 11:54 AM

Rebase

Harbormaster completed remote builds in B227770: Diff 516462. Apr 24 2023, 11:19 AM

dtemirbulatov added a reviewer: vporpo. Apr 27 2023, 5:39 PM

Temp rebase, requires some extra work.

Harbormaster completed remote builds in B230224: Diff 519833. May 5 2023, 7:04 AM

Rebase

Herald added a subscriber: wangpc. Nov 9 2023, 2:20 PM

Harbormaster completed remote builds in B258052: Diff 558067. Nov 9 2023, 6:17 PM

Rebase

Harbormaster completed remote builds in B258083: Diff 558113. Nov 16 2023, 10:49 AM

LGTM.

This revision is now accepted and ready to land. Thu, Nov 30, 7:34 AM

LGTM.

Rebase

Harbormaster completed remote builds in B258147: Diff 558197. Thu, Nov 30, 11:35 AM

Closed by commit rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph. (authored by ABataev). Fri, Dec 1, 11:26 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG279b1ea65f84: [SLP]Improve gathering of the scalars used in the graph.

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"classA" = type { %"vector", %"vector", %"complex" }
%"vector" = type { ptr, ptr, %"pair" }
%"pair" = type { %"pair_elem" }
%"pair_elem" = type { ptr }
%"complex" = type { double, double }

define void @foo() #0 {
  %1 = getelementptr %"classA", ptr null, i64 0, i32 2
  %2 = getelementptr %"classA", ptr null, i64 0, i32 2, i32 1
  br i1 false, label %10, label %3

3:                                                ; preds = %10, %0                                                                                                                                                
  %4 = phi double [ 0.000000e+00, %0 ], [ %25, %10 ]
  %5 = phi double [ 0.000000e+00, %0 ], [ %24, %10 ]
  %6 = fmul double %5, %5
  %7 = fmul double %4, %4
  %8 = fadd double %7, %6
  %9 = fcmp ult double %8, 0.000000e+00
  ret void

10:                                               ; preds = %10, %0                                                                                                                                                
  %11 = phi double [ %24, %10 ], [ 0.000000e+00, %0 ]
  %12 = phi double [ %25, %10 ], [ 0.000000e+00, %0 ]
  %13 = load double, ptr null, align 8
  %14 = load double, ptr null, align 8
  %15 = load double, ptr null, align 8
  %16 = getelementptr %"complex", ptr null, i64 0, i32 1
  %17 = load double, ptr %16, align 8
  %18 = fmul double %13, %15
  %19 = fmul double %14, %17
  %20 = fadd double %18, %19
  %21 = fmul double %14, %15
  %22 = fmul double %13, %17
  %23 = fsub double %21, %22
  %24 = fadd double %11, %20
  store double %11, ptr %1, align 8
  %25 = fadd double %12, %23
  store double %12, ptr %2, align 8
  br i1 false, label %3, label %10

; uselistorder directives                                                                                                                                                                                          
  uselistorder double %24, { 1, 0 }
  uselistorder double %25, { 1, 0 }
}

attributes #0 = { "target-features"="+aes,+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }

Thanks!

In D110978#4657889, @vporpo wrote:

This is causing a performance regression.

@ABataev could you please take a look? Here is a reduced reproducer. It is getting vectorized without this patch, but is not getting vectorized with it.

Ping @ABataev! This is blocking our internal release at Google!

dtemirbulatov added a subscriber: dtemirbulatov. Tue, Dec 12, 1:54 AM

Revision Contents

Path	Size
llvm/
lib/
Transforms/
Vectorize/
SLPVectorizer.cpp	1302 lines
test/
Transforms/
PhaseOrdering/
AArch64/
matrix-extract-insert.ll	8 lines
X86/
vector-reductions-logical.ll	60 lines
vector-reductions.ll	2 lines
SLPVectorizer/
AArch64/
accelerate-vector-functions-inseltpoison.ll	90 lines
accelerate-vector-functions.ll	90 lines
horizontal.ll	2 lines
reorder-fmuladd-crash.ll	4 lines
transpose-inseltpoison.ll	2 lines
transpose.ll	2 lines
vectorize-free-extracts-inserts.ll	219 lines
AMDGPU/
add_sub_sat-inseltpoison.ll	20 lines
add_sub_sat.ll	20 lines
crash_extract_subvector_cost.ll	13 lines
SystemZ/
pr34619.ll	2 lines
X86/
PR35865-inseltpoison.ll	6 lines
PR35865.ll	6 lines
PR39774.ll	19 lines
alternate-calls-inseltpoison.ll	18 lines
alternate-calls.ll	18 lines
alternate-cast-inseltpoison.ll	22 lines
alternate-cast.ll	22 lines
alternate-fp-inseltpoison.ll	6 lines
alternate-fp.ll	6 lines
alternate-int-inseltpoison.ll	48 lines
alternate-int.ll	48 lines
arith-fp-inseltpoison.ll	62 lines
arith-fp.ll	62 lines
blending-shuffle-inseltpoison.ll	42 lines
blending-shuffle.ll	42 lines
cmp_commute-inseltpoison.ll	2 lines
cmp_commute.ll	2 lines
commutativity.ll	62 lines
crash_exceed_scheduling.ll	6 lines
crash_lencod.ll	9 lines
crash_smallpt.ll	12 lines
crash_vectorizeTree.ll	4 lines
cse.ll	14 lines
diamond_broadcast_extra_shuffle.ll	34 lines
extract-shuffle-inseltpoison.ll	9 lines
	9 lines
	11 lines
	24 lines
	6 lines
	26 lines
	51 lines
insert-element-build-vector-inseltpoison.ll	150 lines
insert-element-build-vector.ll	153 lines
insert-shuffle.ll	29 lines
jumbled-load-multiuse.ll	7 lines
jumbled-load-used-in-phi.ll	2 lines
jumbled-load.ll	17 lines
jumbled_store_crash.ll	17 lines
lookahead.ll	41 lines
matched-shuffled-entries.ll	41 lines
memory-runtime-checks.ll	6 lines
	8 lines
	12 lines
	84 lines
	42 lines
	6 lines
	2 lines
	87 lines
	4 lines
remark_extract_broadcast.ll	12 lines
shrink_after_reorder.ll	6 lines
shrink_after_reorder2.ll	2 lines
split-load8_2-unord.ll	9 lines
tiny-tree.ll	6 lines
vectorize-reorder-alt-shuffle.ll	4 lines
vectorize-widest-phis.ll	26 lines
slp-umax-rdx-matcher-crash.ll	4 lines

Diff 390783

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[277 lines not shown]	static bool isCommutative(Instruction *I) {
if (auto *BO = dyn_cast<BinaryOperator>(I))		if (auto *BO = dyn_cast<BinaryOperator>(I))
return BO->isCommutative();		return BO->isCommutative();
// TODO: This should check for generic Instruction::isCommutative(), but		// TODO: This should check for generic Instruction::isCommutative(), but
// we need to confirm that the caller code correctly handles Intrinsics		// we need to confirm that the caller code correctly handles Intrinsics
// for example (does not have 2 operands).		// for example (does not have 2 operands).
return false;		return false;
}		}

		/// Checks if the given value is actually an undefined constant vector.
		static bool isUndefVector(const Value *V) {
		auto *C = dyn_cast<Constant>(V);
		if (!C)
		return false;
		if (!C->containsUndefOrPoisonElement() || !isa<FixedVectorType>(C->getType()))
		return false;
		auto *VecTy = dyn_cast<FixedVectorType>(C->getType());
		if (!VecTy)
		return false;
		if (isa<UndefValue>(C))
		[Inline comment] RKSimon: Is it worth merging the isa<> and cast<> into a dyn_cast<>?
		return true;
		for (unsigned I = 0, E = VecTy->getNumElements(); I != E; ++I) {
		if (Constant *Elem = C->getAggregateElement(I))
		if (!isa<UndefValue>(Elem))
		return false;
		}
		return true;
		}

/// Checks if the vector of instructions can be represented as a shuffle, like:		/// Checks if the vector of instructions can be represented as a shuffle, like:
/// %x0 = extractelement <4 x i8> %x, i32 0		/// %x0 = extractelement <4 x i8> %x, i32 0
/// %x3 = extractelement <4 x i8> %x, i32 3		/// %x3 = extractelement <4 x i8> %x, i32 3
/// %y1 = extractelement <4 x i8> %y, i32 1		/// %y1 = extractelement <4 x i8> %y, i32 1
/// %y2 = extractelement <4 x i8> %y, i32 2		/// %y2 = extractelement <4 x i8> %y, i32 2
/// %x0x0 = mul i8 %x0, %x0		/// %x0x0 = mul i8 %x0, %x0
/// %x3x3 = mul i8 %x3, %x3		/// %x3x3 = mul i8 %x3, %x3
/// %y1y1 = mul i8 %y1, %y1		/// %y1y1 = mul i8 %y1, %y1
/// %y2y2 = mul i8 %y2, %y2		/// %y2y2 = mul i8 %y2, %y2
/// %ins1 = insertelement <4 x i8> poison, i8 %x0x0, i32 0		/// %ins1 = insertelement <4 x i8> poison, i8 %x0x0, i32 0
/// %ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1		/// %ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1
/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		/// %ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		/// %ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
/// ret <4 x i8> %ins4		/// ret <4 x i8> %ins4
/// can be transformed into:		/// can be transformed into:
/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,		/// %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <4 x i32> <i32 0, i32 3, i32 5,
/// i32 6>		/// i32 6>
/// %2 = mul <4 x i8> %1, %1		/// %2 = mul <4 x i8> %1, %1
/// ret <4 x i8> %2		/// ret <4 x i8> %2
/// We convert this initially to something like:
/// %x0 = extractelement <4 x i8> %x, i32 0
/// %x3 = extractelement <4 x i8> %x, i32 3
/// %y1 = extractelement <4 x i8> %y, i32 1
/// %y2 = extractelement <4 x i8> %y, i32 2
/// %1 = insertelement <4 x i8> poison, i8 %x0, i32 0
/// %2 = insertelement <4 x i8> %1, i8 %x3, i32 1
/// %3 = insertelement <4 x i8> %2, i8 %y1, i32 2
/// %4 = insertelement <4 x i8> %3, i8 %y2, i32 3
/// %5 = mul <4 x i8> %4, %4
/// %6 = extractelement <4 x i8> %5, i32 0
/// %ins1 = insertelement <4 x i8> poison, i8 %6, i32 0
/// %7 = extractelement <4 x i8> %5, i32 1
/// %ins2 = insertelement <4 x i8> %ins1, i8 %7, i32 1
/// %8 = extractelement <4 x i8> %5, i32 2
/// %ins3 = insertelement <4 x i8> %ins2, i8 %8, i32 2
/// %9 = extractelement <4 x i8> %5, i32 3
/// %ins4 = insertelement <4 x i8> %ins3, i8 %9, i32 3
/// ret <4 x i8> %ins4
/// InstCombiner transforms this into a shuffle and vector mul
/// Mask will return the Shuffle Mask equivalent to the extracted elements.		/// Mask will return the Shuffle Mask equivalent to the extracted elements.
/// TODO: Can we split off and reuse the shuffle mask detection from		/// TODO: Can we split off and reuse the shuffle mask detection from
/// TargetTransformInfo::getInstructionThroughput?		/// TargetTransformInfo::getInstructionThroughput?
static Optional<TargetTransformInfo::ShuffleKind>		static Optional<TargetTransformInfo::ShuffleKind>
isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {		isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });		find_if(VL, [](Value *V) { return isa<ExtractElementInst>(V); });
if (It == VL.end())		if (It == VL.end())
[12 lines not shown]	for (unsigned I = 0, E = VL.size(); I < E; ++I) {
// Undef can be represented as an undef element in a vector.		// Undef can be represented as an undef element in a vector.
if (isa<UndefValue>(VL[I]))		if (isa<UndefValue>(VL[I]))
continue;		continue;
auto *EI = cast<ExtractElementInst>(VL[I]);		auto *EI = cast<ExtractElementInst>(VL[I]);
if (isa<ScalableVectorType>(EI->getVectorOperandType()))		if (isa<ScalableVectorType>(EI->getVectorOperandType()))
return None;		return None;
auto *Vec = EI->getVectorOperand();		auto *Vec = EI->getVectorOperand();
// We can extractelement from undef or poison vector.		// We can extractelement from undef or poison vector.
if (isa<UndefValue>(Vec))		if (isUndefVector(Vec))
continue;		continue;
// All vector operands must have the same number of vector elements.		// All vector operands must have the same number of vector elements.
if (cast<FixedVectorType>(Vec->getType())->getNumElements() != Size)		if (cast<FixedVectorType>(Vec->getType())->getNumElements() != Size)
return None;		return None;
if (isa<UndefValue>(EI->getIndexOperand()))		if (isa<UndefValue>(EI->getIndexOperand()))
continue;		continue;
auto *Idx = dyn_cast<ConstantInt>(EI->getIndexOperand());		auto *Idx = dyn_cast<ConstantInt>(EI->getIndexOperand());
if (!Idx)		if (!Idx)
[27 lines not shown]	isFixedVectorShuffle(ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask) {
if (CommonShuffleMode == Select && Vec2)		if (CommonShuffleMode == Select && Vec2)
return TargetTransformInfo::SK_Select;		return TargetTransformInfo::SK_Select;
// If Vec2 was never used, we have a permutation of a single vector, otherwise		// If Vec2 was never used, we have a permutation of a single vector, otherwise
// we have permutation of 2 vectors.		// we have permutation of 2 vectors.
return Vec2 ? TargetTransformInfo::SK_PermuteTwoSrc		return Vec2 ? TargetTransformInfo::SK_PermuteTwoSrc
: TargetTransformInfo::SK_PermuteSingleSrc;		: TargetTransformInfo::SK_PermuteSingleSrc;
}		}
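
The return logic above distinguishes three shuffle kinds. A minimal sketch of that classification over a finished mask, in plain C++ rather than the LLVM API, could look as follows. It assumes that -1 marks an undefined lane, that indices N..2N-1 refer to the second source, and that "select" means every defined lane keeps its own position; the last point is an assumption about the hidden part of the function. `ShuffleKindSketch` and `classifyMask` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three TargetTransformInfo shuffle kinds
// returned by the code above.
enum class ShuffleKindSketch { PermuteSingleSrc, PermuteTwoSrc, Select };

ShuffleKindSketch classifyMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  bool UsesSecondSource = false;
  bool LanesStayInPlace = true;
  for (int Lane = 0; Lane < N; ++Lane) {
    const int Idx = Mask[static_cast<std::size_t>(Lane)];
    assert(Idx >= -1 && Idx < 2 * N && "index out of range for two sources");
    if (Idx < 0)
      continue; // undefined lane: compatible with any kind
    if (Idx >= N)
      UsesSecondSource = true;
    if (Idx % N != Lane)
      LanesStayInPlace = false;
  }
  // Mirrors the order of checks above: select only applies with two sources.
  if (LanesStayInPlace && UsesSecondSource)
    return ShuffleKindSketch::Select;
  return UsesSecondSource ? ShuffleKindSketch::PermuteTwoSrc
                          : ShuffleKindSketch::PermuteSingleSrc;
}
```

With four lanes, the mask 0, 5, 2, 7 is classified as a select, 0, 3, 5, 6 as a two-source permute, and 3, 2, 1, 0 as a single-source permute.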

		/// Tries to find extractelement instructions with constant indices from fixed
		/// vector type and gather such instructions into a bunch, which highly likely
		/// might be detected as a shuffle of 1 or 2 input vectors. If this attempt was
		/// successful, the matched scalars are replaced by poison values in \p VL for
		/// future analysis.
		static Optional<TTI::ShuffleKind>
		tryToGatherExtractElements(SmallVectorImpl<Value *> &VL,
		SmallVectorImpl<int> &Mask) {
		// Scan list of gathered scalars for extractelements that can be represented
		// as shuffles.
		MapVector<Value *, SmallVector<int>> VectorOpToIdx;
		SmallVector<int> UndefVectorExtracts;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
		if (!EI || !isa<FixedVectorType>(EI->getVectorOperandType()) ||
		!isa<ConstantInt, UndefValue>(EI->getIndexOperand()))
		continue;
		if (isUndefVector(EI->getVectorOperand())) {
		UndefVectorExtracts.push_back(I);
		continue;
		}
		VectorOpToIdx[EI->getVectorOperand()].push_back(I);
		}
		// Sort the vector operands by the maximum number of uses in extractelements.
		MapVector<unsigned, SmallVector<Value *>> VFToVector;
		for (const auto &Data : VectorOpToIdx)
		VFToVector[cast<FixedVectorType>(Data.first->getType())->getNumElements()]
		.push_back(Data.first);
		for (auto &Data : VFToVector) {
		stable_sort(Data.second, [&VectorOpToIdx](Value *V1, Value *V2) {
		return VectorOpToIdx.find(V1)->second.size() >
		VectorOpToIdx.find(V2)->second.size();
		});
		}
		// Find the best pair of the vectors with the same number of elements or a
		// single vector.
		const int UndefSz = UndefVectorExtracts.size();
		unsigned SingleMax = 0;
		Value *SingleVec = nullptr;
		unsigned PairMax = 0;
		std::pair<Value *, Value *> PairVec(nullptr, nullptr);
		for (auto &Data : VFToVector) {
		Value *V1 = Data.second.front();
		if (SingleMax < VectorOpToIdx[V1].size() + UndefSz) {
		SingleMax = VectorOpToIdx[V1].size() + UndefSz;
		SingleVec = V1;
		}
		Value *V2 = nullptr;
		if (Data.second.size() > 1)
		V2 = *std::next(Data.second.begin());
		if (V2 && PairMax < VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() +
		UndefSz) {
		PairMax = VectorOpToIdx[V1].size() + VectorOpToIdx[V2].size() + UndefSz;
		PairVec = std::make_pair(V1, V2);
		}
		}
		if (SingleMax == 0 && PairMax == 0 && UndefSz == 0)
		return None;
		// Check if better to perform a shuffle of 2 vectors or just of a single
		// vector.
		SmallVector<Value *> SavedVL(VL.begin(), VL.end());
		SmallVector<Value *> GatheredExtracts(
		VL.size(), PoisonValue::get(VL.front()->getType()));
		if (SingleMax >= PairMax && SingleMax) {
		for (int Idx : VectorOpToIdx[SingleVec])
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		} else {
		for (Value *V : {PairVec.first, PairVec.second})
		for (int Idx : VectorOpToIdx[V])
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		}
		// Add extracts from undefs too.
		for (int Idx : UndefVectorExtracts)
		std::swap(GatheredExtracts[Idx], VL[Idx]);
		// Check that gather of extractelements can be represented as just a
		// shuffle of a single/two vectors the scalars are extracted from.
		Optional<TTI::ShuffleKind> Res = isFixedVectorShuffle(GatheredExtracts, Mask);
		if (!Res.hasValue()) {
		// Restore the original VL if attempt was not successful.
		VL.swap(SavedVL);
		return None;
		}
		// Restore unused scalars from mask.
		for (int I = 0, E = GatheredExtracts.size(); I < E; ++I) {
		auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
		if (!EI || !isa<FixedVectorType>(EI->getVectorOperandType()) ||
		!isa<ConstantInt, UndefValue>(EI->getIndexOperand()) ||
		is_contained(UndefVectorExtracts, I))
		continue;
		if (Mask[I] == UndefMaskElem)
		std::swap(VL[I], GatheredExtracts[I]);
		}
		return Res;
		[Inline comment] RKSimon: return None instead to make it obvious it failed? Maybe do this as an early out instead of the much bigger if (Res.hasValue()) indented block?
		}
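
The source-selection step of `tryToGatherExtractElements` can be sketched outside LLVM as follows. This is a simplified model, not the patch's code: it ignores extracts from undef vectors, identifies vectors by name, and uses `std::map` (sorted by key) where the patch uses `MapVector` (insertion order), so tie-breaking may differ. `ExtractUse`, `SourceChoice` and `pickShuffleSources` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one gathered extractelement.
struct ExtractUse {
  std::string Source;   // name of the vector being extracted from
  unsigned SourceLanes; // number of elements in that vector
};

struct SourceChoice {
  std::vector<std::string> Sources; // one or two chosen source vectors
  std::size_t Covered = 0;          // how many scalars they supply
};

SourceChoice pickShuffleSources(const std::vector<ExtractUse> &Scalars) {
  typedef std::pair<std::string, std::size_t> Count;
  // Group the sources by lane count and count the uses of each source.
  std::map<unsigned, std::map<std::string, std::size_t> > UsesByWidth;
  for (const ExtractUse &E : Scalars)
    ++UsesByWidth[E.SourceLanes][E.Source];

  SourceChoice Single, Pair;
  for (const auto &Group : UsesByWidth) {
    assert(!Group.second.empty() && "every group has at least one source");
    std::vector<Count> Sorted(Group.second.begin(), Group.second.end());
    // Most-used source first; ties keep their existing order.
    std::stable_sort(Sorted.begin(), Sorted.end(),
                     [](const Count &A, const Count &B) {
                       return A.second > B.second;
                     });
    if (Sorted[0].second > Single.Covered) {
      Single.Sources = {Sorted[0].first};
      Single.Covered = Sorted[0].second;
    }
    // A pair must share one lane count, so it comes from a single group.
    if (Sorted.size() > 1 &&
        Sorted[0].second + Sorted[1].second > Pair.Covered) {
      Pair.Sources = {Sorted[0].first, Sorted[1].first};
      Pair.Covered = Sorted[0].second + Sorted[1].second;
    }
  }
  // Prefer a single source when it covers at least as many scalars.
  return Single.Covered >= Pair.Covered ? Single : Pair;
}
```

In this model a pair only wins when two vectors of the same width together cover more scalars than the best single vector of any width.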

namespace {		namespace {

/// Main data required for vectorization of instructions.		/// Main data required for vectorization of instructions.
struct InstructionsState {		struct InstructionsState {
/// The very first instruction in the list with the main opcode.		/// The very first instruction in the list with the main opcode.
Value *OpValue = nullptr;		Value *OpValue = nullptr;

/// The main/alternate instruction.		/// The main/alternate instruction.
[369 lines not shown]	void deleteTree() {
MustGather.clear();		MustGather.clear();
ExternalUses.clear();		ExternalUses.clear();
for (auto &Iter : BlocksSchedules) {		for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();		BlockScheduling *BS = Iter.second.get();
BS->clear();		BS->clear();
}		}
MinBWs.clear();		MinBWs.clear();
InstrElementSize.clear();		InstrElementSize.clear();
		PostponedGathers.clear();
}		}

unsigned getTreeSize() const { return VectorizableTree.size(); }		unsigned getTreeSize() const { return VectorizableTree.size(); }

/// Perform LICM and CSE on the newly generated gather sequences.		/// Perform LICM and CSE on the newly generated gather sequences.
void optimizeGatherSequence();		void optimizeGatherSequence();

/// Checks if the specified gather tree entry \p TE can be represented as a		/// Checks if the specified gather tree entry \p TE can be represented as a
[821 lines not shown]	private:
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const;		SmallVectorImpl<unsigned> &CurrentOrder) const;

/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, starting in \p VL.
Value *vectorizeTree(ArrayRef<Value *> VL);		Value *vectorizeTree(ArrayRef<Value *> VL, const EdgeInfo &EI);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars. If \p		/// context means the creation of vectors from a group of scalars. If \p
/// NeedToShuffle is true, need to add a cost of reshuffling some of the		/// NeedToShuffle is true, need to add a cost of reshuffling some of the
/// vector elements.		/// vector elements.
InstructionCost getGatherCost(FixedVectorType *Ty,		InstructionCost getGatherCost(FixedVectorType *Ty,
const DenseSet<unsigned> &ShuffledIndices,		const DenseSet<unsigned> &ShuffledIndices,
bool NeedToShuffle) const;		bool NeedToShuffle) const;

/// Checks if the gathered \p VL can be represented as shuffle(s) of previous		/// Checks if the gathered \p VL can be represented as shuffle(s) of previous
/// tree entries.		/// tree entries.
/// \returns ShuffleKind, if gathered values can be represented as shuffles of		/// \returns ShuffleKind, if gathered values can be represented as shuffles of
/// previous tree entries. \p Mask is filled with the shuffle mask.		/// previous tree entries. \p Mask is filled with the shuffle mask.
Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		isGatherShuffledEntry(const TreeEntry *TE, ArrayRef<Value *> VL,
		SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries);		SmallVectorImpl<const TreeEntry *> &Entries);

/// \returns the scalarization cost for this list of values. Assuming that		/// \returns the scalarization cost for this list of values. Assuming that
/// this subtree gets vectorized, we may need to extract the values from the		/// this subtree gets vectorized, we may need to extract the values from the
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
InstructionCost getGatherCost(ArrayRef<Value *> VL) const;		InstructionCost getGatherCost(ArrayRef<Value *> VL) const;

		/// Returns the last instruction in the bundle.
		Instruction &getLastInstructionInBundle(const TreeEntry *E);

/// Set the Builder insert point to one after the last instruction in		/// Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(const TreeEntry *E);		void setInsertPointAfterBundle(const TreeEntry *E);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *gather(ArrayRef<Value *> VL);		Value *gather(ArrayRef<Value *> VL, Value *Root = nullptr);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even the tree height is tiny.		/// be beneficial even the tree height is tiny.
bool isFullyVectorizableTinyTree(bool ForReduction) const;		bool isFullyVectorizableTinyTree(bool ForReduction) const;

/// Reorder commutative or alt operands to get better probability of		/// Reorder commutative or alt operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
[263 lines hidden; enclosing context: LLVM_DUMP_METHOD void dump() const {]
if (VectorizedValue)		if (VectorizedValue)
dbgs() << *VectorizedValue << "\n";		dbgs() << *VectorizedValue << "\n";
else		else
dbgs() << "NULL\n";		dbgs() << "NULL\n";
dbgs() << "ReuseShuffleIndices: ";		dbgs() << "ReuseShuffleIndices: ";
if (ReuseShuffleIndices.empty())		if (ReuseShuffleIndices.empty())
dbgs() << "Empty";		dbgs() << "Empty";
else		else
for (unsigned ReuseIdx : ReuseShuffleIndices)		for (int ReuseIdx : ReuseShuffleIndices)
dbgs() << ReuseIdx << ", ";		dbgs() << ReuseIdx << ", ";
dbgs() << "\n";		dbgs() << "\n";
dbgs() << "ReorderIndices: ";		dbgs() << "ReorderIndices: ";
for (unsigned ReorderIdx : ReorderIndices)		for (unsigned ReorderIdx : ReorderIndices)
dbgs() << ReorderIdx << ", ";		dbgs() << ReorderIdx << ", ";
dbgs() << "\n";		dbgs() << "\n";
dbgs() << "UserTreeIndices: ";		dbgs() << "UserTreeIndices: ";
for (const auto &EInfo : UserTreeIndices)		for (const auto &EInfo : UserTreeIndices)
[104 lines hidden; enclosing context: #endif]

const TreeEntry *getTreeEntry(Value *V) const {		const TreeEntry *getTreeEntry(Value *V) const {
return ScalarToTreeEntry.lookup(V);		return ScalarToTreeEntry.lookup(V);
}		}

/// Maps a specific scalar to its tree entry.		/// Maps a specific scalar to its tree entry.
SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;		SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;

/// Maps a value to the proposed vectorizable size.		/// Maps a value to the proposed vectorizable size.
SmallDenseMap<Value *, unsigned> InstrElementSize;		SmallDenseMap<Value *, unsigned> InstrElementSize;

/// A list of scalars that we found that we need to keep as scalars.		/// A list of scalars that we found that we need to keep as scalars.
ValueSet MustGather;		ValueSet MustGather;

		/// List of gather nodes, depending on other gather/vector nodes, which should
		/// be emitted after the vector instruction emission process to correctly
		/// handle order of the vector instructions and shuffles.
		SetVector<const TreeEntry *> PostponedGathers;

/// This POD struct describes one external user in the vectorized tree.		/// This POD struct describes one external user in the vectorized tree.
struct ExternalUser {		struct ExternalUser {
ExternalUser(Value *S, llvm::User *U, int L)		ExternalUser(Value *S, llvm::User *U, int L)
: Scalar(S), User(U), Lane(L) {}		: Scalar(S), User(U), Lane(L) {}

// Which scalar in our function.		// Which scalar in our function.
Value *Scalar;		Value *Scalar;

[1,305 lines hidden; enclosing context: void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,]
SmallVector<int> ReuseShuffleIndicies;		SmallVector<int> ReuseShuffleIndicies;
SmallVector<Value *> UniqueValues;		SmallVector<Value *> UniqueValues;
auto &&TryToFindDuplicates = [&VL, &ReuseShuffleIndicies, &UniqueValues,		auto &&TryToFindDuplicates = [&VL, &ReuseShuffleIndicies, &UniqueValues,
&UserTreeIdx,		&UserTreeIdx,
this](const InstructionsState &S) {		this](const InstructionsState &S) {
// Check that every instruction appears once in this bundle.		// Check that every instruction appears once in this bundle.
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
for (Value *V : VL) {		for (Value *V : VL) {
		if (isConstant(V)) {
		ReuseShuffleIndicies.emplace_back(
		isa<UndefValue>(V) ? UndefMaskElem : UniqueValues.size());
		UniqueValues.emplace_back(V);
		continue;
		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
ReuseShuffleIndicies.emplace_back(isa<UndefValue>(V) ? -1		ReuseShuffleIndicies.emplace_back(Res.first->second);
: Res.first->second);
if (Res.second)		if (Res.second)
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
}		}
size_t NumUniqueScalarValues = UniqueValues.size();		size_t NumUniqueScalarValues = UniqueValues.size();
if (NumUniqueScalarValues == VL.size()) {		if (NumUniqueScalarValues == VL.size()) {
ReuseShuffleIndicies.clear();		ReuseShuffleIndicies.clear();
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");		LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");
[943 lines hidden; enclosing context: for (auto *V : VL) {]
++Idx;		++Idx;

// Reached the start of a new vector registers.		// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {		if (Idx % EltsPerVector == 0) {
AllConsecutive = true;		AllConsecutive = true;
continue;		continue;
}		}

		// Check if the scalar is used in the gather.
		if (Mask[Idx] == UndefMaskElem)
		continue;

// Check all extracts for a vector register on the target directly		// Check all extracts for a vector register on the target directly
// extract values in order.		// extract values in order.
unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));		unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
		if (Mask[Idx - 1] != UndefMaskElem) {
unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));		unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));
AllConsecutive &= PrevIdx + 1 == CurrentIdx &&		AllConsecutive &= PrevIdx + 1 == CurrentIdx &&
CurrentIdx % EltsPerVector == Idx % EltsPerVector;		CurrentIdx % EltsPerVector == Idx % EltsPerVector;
		}

if (AllConsecutive)		if (AllConsecutive)
continue;		continue;

// Skip all indices, except for the last index per vector block.		// Skip all indices, except for the last index per vector block.
if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())		if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
continue;		continue;

[64 lines hidden; enclosing context: InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,]
if (MinBWs.count(VL[0]))		if (MinBWs.count(VL[0]))
VecTy = FixedVectorType::get(		VecTy = FixedVectorType::get(
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());
unsigned EntryVF = E->getVectorFactor();		unsigned EntryVF = E->getVectorFactor();
auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);		auto *FinalVecTy = FixedVectorType::get(VecTy->getElementType(), EntryVF);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
// FIXME: it tries to fix a problem with MSVC buildbots.		// FIXME: it tries to fix a problem with MSVC buildbots.
TargetTransformInfo &TTIRef = *TTI;		TargetTransformInfo &TTIRef = *TTI;
		RKSimon: Any chance that we can use ShuffleVectorInst::isIdentityMask?
		ABataev: Sure, will do it later.
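For readers unfamiliar with the check suggested above, the following is a minimal standalone sketch of what an identity-mask test amounts to for a single source vector: every defined lane I must select element I, and undefined lanes are ignored. It is an illustration only, not LLVM's implementation; `isIdentityMaskSketch` and `kUndefElem` are hypothetical names introduced here, with `kUndefElem` standing in for the undef mask marker (-1).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the undef mask element marker (-1).
constexpr int kUndefElem = -1;

// Returns true if every defined lane I of the mask selects element I.
// Undefined lanes are skipped, so a partially undefined mask can still
// count as an identity permutation of a single source vector.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    if (Mask[I] != kUndefElem && Mask[I] != I)
      return false;
  }
  return true;
}
```

Under this simplified definition, a mask such as {0, -1, 2, -1} counts as identity, while {1, 0, 2, 3} does not.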
auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,		auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,
VectorizedVals](InstructionCost &Cost,		VectorizedVals](InstructionCost &Cost,
bool IsGather) {		bool IsGather,
		ArrayRef<int> Mask = None) {
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
		int Idx = -1;
for (auto *V : VL) {		for (auto *V : VL) {
if (isa<UndefValue>(V))		++Idx;
		// Ignore non-extractelement scalars.
		if (isa<UndefValue>(V) || (!Mask.empty() && Mask[Idx] == UndefMaskElem))
continue;		continue;
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals))		if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
		(IsGather && ScalarToTreeEntry.count(V)))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
Optional<unsigned> EEIdx = getExtractIndex(EE);		Optional<unsigned> EEIdx = getExtractIndex(EE);
if (!EEIdx)		if (!EEIdx)
continue;		continue;
unsigned Idx = *EEIdx;		unsigned Idx = *EEIdx;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
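The shadowing concern can be reproduced in isolation. The sketch below is a hypothetical reduction, not code from the patch: an outer position counter named `Idx` is kept in the enclosing scope, and the loop body needs a second index. Giving the inner variable a different name (here `ExtIdx`, an assumed name) is one way to avoid the warning.

```cpp
#include <cassert>
#include <vector>

// Counts entries whose extract index equals their position in the list.
// Illustrative only; the function and variable names are assumptions.
int countInPlace(const std::vector<unsigned> &ExtractIdxs) {
  int Idx = -1; // Position within the list, advanced at the top of the loop.
  int Count = 0;
  for (unsigned EEIdx : ExtractIdxs) {
    ++Idx;
    // Declaring `unsigned Idx = EEIdx;` here would hide the outer Idx and
    // trigger -Wshadow; a distinct name keeps both values visible.
    unsigned ExtIdx = EEIdx;
    if (ExtIdx == static_cast<unsigned>(Idx))
      ++Count;
  }
  return Count;
}
```

Renaming is only one option; hoisting the inner value into a differently scoped helper would work as well.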
if (TTIRef.getNumberOfParts(VecTy) !=		if (TTIRef.getNumberOfParts(VecTy) !=
TTIRef.getNumberOfParts(EE->getVectorOperandType())) {		TTIRef.getNumberOfParts(EE->getVectorOperandType())) {
auto It =		auto It =
ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;		ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;
It->getSecond() = std::min<int>(It->second, Idx);		It->getSecond() = std::min<int>(It->second, Idx);
}		}
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EE->hasOneUse()) {		if (EE->hasOneUse()) {
[18 lines hidden; enclosing context: auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,]
}		}
// Add a cost for subvector extracts/inserts if required.		// Add a cost for subvector extracts/inserts if required.
for (const auto &Data : ExtractVectorsTys) {		for (const auto &Data : ExtractVectorsTys) {
auto *EEVTy = cast<FixedVectorType>(Data.first->getType());		auto *EEVTy = cast<FixedVectorType>(Data.first->getType());
unsigned NumElts = VecTy->getNumElements();		unsigned NumElts = VecTy->getNumElements();
if (Data.second % NumElts == 0)		if (Data.second % NumElts == 0)
continue;		continue;
if (TTIRef.getNumberOfParts(EEVTy) > TTIRef.getNumberOfParts(VecTy)) {		if (TTIRef.getNumberOfParts(EEVTy) > TTIRef.getNumberOfParts(VecTy)) {
unsigned Idx = (Data.second / NumElts) * NumElts;		unsigned Idx = (Data.second / NumElts) * NumElts;
		RKSimon: Wshadow warning vs Idx @ Line 4688?
unsigned EENumElts = EEVTy->getNumElements();		unsigned EENumElts = EEVTy->getNumElements();
		// FIXME: Remove this check after correct support for
		// SK_ExtractSubvector is landed for all targets.
		RKSimon: What targets are we still missing support for?
		ABataev: AArch64; in many cases it switches to the default cost, a bunch of extracts plus a bunch of inserts.
		if (Idx % NumElts == 0)
		continue;
if (Idx + NumElts <= EENumElts) {		if (Idx + NumElts <= EENumElts) {
Cost +=		Cost +=
TTIRef.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,		TTIRef.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
EEVTy, None, Idx, VecTy);		EEVTy, None, Idx, VecTy);
} else {		} else {
// Need to round up the subvector type vectorization factor to avoid a		// Need to round up the subvector type vectorization factor to avoid a
// crash in cost model functions. Make SubVT so that Idx + VF of SubVT		// crash in cost model functions. Make SubVT so that Idx + VF of SubVT
// <= EENumElts.		// <= EENumElts.
[9 lines hidden; enclosing context: for (const auto &Data : ExtractVectorsTys) {]
}		}
}		}
};		};
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isa<InsertElementInst>(VL[0]))		if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle.hasValue()) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		SmallVector<Value *> Gathers(VL.begin(), VL.end());
// Perfect match in the graph, will reuse the previously vectorized		BoUpSLP::ValueSet VectorizedLoads;
// node. Cost is 0.
LLVM_DEBUG(
dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");
if (NeedToShuffleReuses)
GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);
} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "
<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather
// than gather.
::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);
}
return GatherCost;
}
if (isSplat(VL)) {
// Found the broadcasting of the single scalar, calculate the cost as the
// broadcast.
return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
}
if ((E->getOpcode() == Instruction::ExtractElement ||
all_of(E->Scalars,
[](Value *V) {
return isa<ExtractElementInst, UndefValue>(V);
})) &&
allSameType(VL)) {
// Check that gather of extractelements can be represented as just a
// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isFixedVectorShuffle(VL, Mask);
if (ShuffleKind.hasValue()) {
// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.
InstructionCost Cost =
computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);
AdjustExtractsCost(Cost, /*IsGather=*/true);
if (NeedToShuffleReuses)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);
return Cost;
}
}
InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses)
ReuseShuffleCost = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
// Improve gather cost for gather of loads, if we can group some of the		// Improve gather cost for gather of loads, if we can group some of the
// loads into vector loads.		// loads into vector loads.
if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&		if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&
!E->isAltShuffle()) {		!E->isAltShuffle() &&
BoUpSLP::ValueSet VectorizedLoads;		!all_of(Gathers, [this](Value *V) { return getTreeEntry(V); }) &&
		!isSplat(Gathers)) {
unsigned StartIdx = 0;		unsigned StartIdx = 0;
unsigned VF = VL.size() / 2;		unsigned VF = VL.size() / 2;
unsigned VectorizedCnt = 0;		unsigned VectorizedCnt = 0;
unsigned ScatterVectorizeCnt = 0;		unsigned ScatterVectorizeCnt = 0;
const unsigned Sz = DL->getTypeSizeInBits(E->getMainOp()->getType());		const unsigned Sz = DL->getTypeSizeInBits(E->getMainOp()->getType());
for (unsigned MinVF = getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {		for (unsigned MinVF = getMinVF(2 * Sz); VF >= MinVF; VF /= 2) {
for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;		for (unsigned Cnt = StartIdx, End = VL.size(); Cnt + VF <= End;
Cnt += VF) {		Cnt += VF) {
[27 lines hidden; enclosing context: if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&]
// Check if the whole array was vectorized already - exit.		// Check if the whole array was vectorized already - exit.
if (StartIdx >= VL.size())		if (StartIdx >= VL.size())
break;		break;
// Found vectorizable parts - exit.		// Found vectorizable parts - exit.
if (!VectorizedLoads.empty())		if (!VectorizedLoads.empty())
break;		break;
}		}
if (!VectorizedLoads.empty()) {		if (!VectorizedLoads.empty()) {
InstructionCost GatherCost = 0;
unsigned NumParts = TTI->getNumberOfParts(VecTy);
bool NeedInsertSubvectorAnalysis =
!NumParts || (VL.size() / VF) > NumParts;
// Get the cost for gathered loads.		// Get the cost for gathered loads.
for (unsigned I = 0, End = VL.size(); I < End; I += VF) {		for (unsigned I = 0, End = VL.size(); I < End; I += VF) {
if (VectorizedLoads.contains(VL[I]))		if (!VectorizedLoads.contains(VL[I]))
continue;		continue;
GatherCost += getGatherCost(VL.slice(I, VF));		// Exclude potentially vectorized loads from list of gathered scalars.
		for (unsigned K = I, End = I + VF; K < End; ++K)
		Gathers[K] = PoisonValue::get(Gathers[K]->getType());
}		}
// The cost for vectorized loads.		// The cost for vectorized loads.
InstructionCost ScalarsCost = 0;		InstructionCost ScalarsCost = 0;
for (Value *V : VectorizedLoads) {		for (Value *V : VectorizedLoads) {
auto *LI = cast<LoadInst>(V);		auto *LI = cast<LoadInst>(V);
ScalarsCost += TTI->getMemoryOpCost(		ScalarsCost += TTI->getMemoryOpCost(
Instruction::Load, LI->getType(), LI->getAlign(),		Instruction::Load, LI->getType(), LI->getAlign(),
LI->getPointerAddressSpace(), CostKind, LI);		LI->getPointerAddressSpace(), CostKind, LI);
}		}
auto *LI = cast<LoadInst>(E->getMainOp());		auto *LI = cast<LoadInst>(E->getMainOp());
auto *LoadTy = FixedVectorType::get(LI->getType(), VF);		auto *LoadTy = FixedVectorType::get(LI->getType(), VF);
Align Alignment = LI->getAlign();		Align Alignment = LI->getAlign();
GatherCost +=		GatherCost +=
VectorizedCnt *		VectorizedCnt *
TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,		TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,
LI->getPointerAddressSpace(), CostKind, LI);		LI->getPointerAddressSpace(), CostKind, LI);
GatherCost += ScatterVectorizeCnt *		GatherCost += ScatterVectorizeCnt *
TTI->getGatherScatterOpCost(		TTI->getGatherScatterOpCost(
Instruction::Load, LoadTy, LI->getPointerOperand(),		Instruction::Load, LoadTy, LI->getPointerOperand(),
/*VariableMask=*/false, Alignment, CostKind, LI);		/*VariableMask=*/false, Alignment, CostKind, LI);
if (NeedInsertSubvectorAnalysis) {		// Add the cost for the subvectors shuffling.
// Add the cost for the subvectors insert.		GatherCost += (VectorizedCnt + ScatterVectorizeCnt - 1) *
for (int I = VF, E = VL.size(); I < E; I += VF)		TTI->getShuffleCost(TTI::SK_Select, VecTy);
GatherCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VecTy,		GatherCost -= ScalarsCost;
None, I, LoadTy);
}		}
return ReuseShuffleCost + GatherCost - ScalarsCost;
}		}
		int VF = VL.size();
		SmallVector<int> ExtractMask;
		// Try to gather extractelements, which can be represented as shuffles.
		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
		tryToGatherExtractElements(Gathers, ExtractMask);
		if (Shuffle.hasValue()) {
		// Found the bunch of extractelement instructions that must be gathered
		// into a vector and can be represented as a permutation elements in a
		// single input vector or of 2 input vectors.
		GatherCost = computeExtractCost(VL, VecTy, Shuffle, ExtractMask, TTI);
		AdjustExtractsCost(GatherCost, /*IsGather=*/true, ExtractMask);
		}
		SmallVector<int> Mask;
		// Adds extract mask to the gather mask and checks if need to use extract
		// mask at all. Maybe, extracts create a perfect diamond match with other
		// vector/gather nodes.
		auto &&AddExtractMask = [&ExtractMask, &Mask, &GatherCost, VF]() {
		if (ExtractMask.empty())
		return;
		bool NoNeedToGatherExtracts = true;
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem) {
		Mask[I] = I;
		} else if (ExtractMask[I] != UndefMaskElem) {
		Mask[I] = I + VF;
		NoNeedToGatherExtracts = false;
		}
		}
		// The extract gathers are not used - no need to count them.
		if (NoNeedToGatherExtracts) {
		ExtractMask.clear();
		GatherCost = 0;
		}
		};
		SmallVector<const TreeEntry *> Entries;
		// Check for reused gathered scalars.
		Shuffle = isGatherShuffledEntry(E, Gathers, Mask, Entries);
		if (Shuffle.hasValue()) {
		// Adjust remaining gathered scalars.
		for (int I = 0; I < VF; ++I)
		if (Mask[I] != UndefMaskElem)
		Gathers[I] = PoisonValue::get(Gathers[I]->getType());
		if (any_of(Gathers, [](Value *V) {
		return isConstant(V) && !isa<UndefValue>(V);
		})) {
		if (*Shuffle == TargetTransformInfo::SK_PermuteSingleSrc) {
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem)
		Mask[I] += VF;
		else if (!isa<UndefValue>(Gathers[I]) && isConstant(Gathers[I]))
		Mask[I] = I;
		}
		} else {
		GatherCost += TTI->getShuffleCost(*Shuffle, VecTy, Mask);
		for (int I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem)
		Mask[I] = I;
		else if (!isa<UndefValue>(Gathers[I]) && isConstant(Gathers[I]))
		Mask[I] = I + VF;
		}
		}
		// Add a mask for shuffle of extractelement instruction shuffling.
		AddExtractMask();
		if (ExtractMask.empty()) {
		::addMask(Mask, E->ReuseShuffleIndices);
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
		} else {
		// Cost of the first shuffle with input constant vector.
		GatherCost += TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, Mask);
		if (NeedToShuffleReuses)
		GatherCost += TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
		}
		} else {
		AddExtractMask();
		if (Entries.size() == 1 && ShuffleVectorInst::isIdentityMask(Mask)) {
		// Perfect match in the graph, will reuse the previously vectorized
		// node. Cost is 0.
		LLVM_DEBUG(dbgs() << "SLP: perfect diamond match for gather bundle "
		"that starts with "
		<< *VL.front() << ".\n");
		if (NeedToShuffleReuses)
		GatherCost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
		FinalVecTy, E->ReuseShuffleIndices);
		} else {
		LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
		<< " entries for bundle that starts with "
		<< *VL.front() << ".\n");
		// Detected that instead of gather we can emit a shuffle of single/two
		// previously vectorized nodes. Add the cost of the permutation rather
		// than gather.
		if (ExtractMask.empty() && *Shuffle == TTI::SK_PermuteSingleSrc) {
		::addMask(Mask, E->ReuseShuffleIndices);
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
		} else {
		// Cost of the first shuffle with input constant vector.
		GatherCost +=
		TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, Mask);
		if (NeedToShuffleReuses)
		GatherCost += TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
}		}
		}
		}
		// Add the cost for final shuffle with vectorized loads.
		if (!VectorizedLoads.empty())
		GatherCost += TTI->getShuffleCost(TTI::SK_Select, VecTy);
		}
		InstructionCost ReuseShuffleCost = 0;
		if (!Shuffle.hasValue() && NeedToShuffleReuses) {
		ReuseShuffleCost = TTI->getShuffleCost(
		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
		if (!VectorizedLoads.empty())
		GatherCost += ReuseShuffleCost;
		}
		if (Gathers != VL) {
		// Final permute with the vector of scalars.
		if (any_of(Gathers,
		[](Value *V) { return !isa<UndefValue>(V) && isConstant(V); }))
		GatherCost += TTI->getShuffleCost(TTI::SK_Select, VecTy);
		if (all_of(Gathers, isConstant))
		return GatherCost;
		}
		if (isSplat(Gathers) && (Gathers == VL || VL.size() > 2)) {
		// Found the broadcasting of the single scalar, calculate the cost as the
		// broadcast.
		return GatherCost +
		(Gathers == VL ? 0
		: TTI->getShuffleCost(
		TargetTransformInfo::SK_Select, VecTy)) +
		TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
		}
		if (Gathers != VL)
		return GatherCost + getGatherCost(Gathers);
return ReuseShuffleCost + getGatherCost(VL);		return ReuseShuffleCost + getGatherCost(VL);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
[10 lines hidden; enclosing context: if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))]
CommonCost =		CommonCost =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL");		assert(E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL");
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
		RKSimon: auto *
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI:		case Instruction::PHI:
		RKSimon: auto *
		ABataev: Both these cases are the existing code, just the diff is not quite correct because of the big differences.
return 0;		return 0;

case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
// The common cost of removal ExtractElement/ExtractValue instructions +		// The common cost of removal ExtractElement/ExtractValue instructions +
// the cost of shuffles, if required to resuffle the original vector.		// the cost of shuffles, if required to resuffle the original vector.
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
unsigned Idx = 0;		unsigned Idx = 0;
[93 lines hidden; enclosing context: case Instruction::InsertElement: {]
TargetTransformInfo::SK_PermuteSingleSrc,		TargetTransformInfo::SK_PermuteSingleSrc,
FixedVectorType::get(SrcVecTy->getElementType(), Sz));		FixedVectorType::get(SrcVecTy->getElementType(), Sz));
} else if (!IsIdentity) {		} else if (!IsIdentity) {
auto *FirstInsert =		auto *FirstInsert =
cast<Instruction>(find_if(E->Scalars, [E](Value *V) {		cast<Instruction>(find_if(E->Scalars, [E](Value *V) {
return !is_contained(E->Scalars,		return !is_contained(E->Scalars,
cast<Instruction>(V)->getOperand(0));		cast<Instruction>(V)->getOperand(0));
}));		}));
if (isa<UndefValue>(FirstInsert->getOperand(0))) {		if (isUndefVector(FirstInsert->getOperand(0))) {
Cost += TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, SrcVecTy, Mask);		Cost += TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, SrcVecTy, Mask);
} else {		} else {
SmallVector<int> InsertMask(NumElts);		SmallVector<int> InsertMask(NumElts);
std::iota(InsertMask.begin(), InsertMask.end(), 0);		std::iota(InsertMask.begin(), InsertMask.end(), 0);
for (unsigned I = 0; I < NumElts; I++) {		for (unsigned I = 0; I < NumElts; I++) {
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
InsertMask[Offset + I] = NumElts + I;		InsertMask[Offset + I] = NumElts + I;
}		}
[718 lines hidden; enclosing context: #ifndef NDEBUG]
if (ViewSLPTree)		if (ViewSLPTree)
ViewGraph(this, "SLP" + F->getName(), false, Str);		ViewGraph(this, "SLP" + F->getName(), false, Str);
#endif		#endif

return Cost;		return Cost;
}		}

Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, ArrayRef<Value *> VL,
		SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries) {		SmallVectorImpl<const TreeEntry *> &Entries) {
		Entries.clear();
		// No need to check for the topmost gather node.
		if (TE == VectorizableTree.front().get())
		return None;
		Mask.assign(VL.size(), UndefMaskElem);
		assert(TE->UserTreeIndices.size() == 1 &&
		"Expected only single user of the gather node.");
// TODO: currently checking only for Scalars in the tree entry, need to count		// TODO: currently checking only for Scalars in the tree entry, need to count
// reused elements too for better cost estimation.		// reused elements too for better cost estimation.
Mask.assign(TE->Scalars.size(), UndefMaskElem);		Instruction &UserInst =
Entries.clear();		getLastInstructionInBundle(TE->UserTreeIndices.front().UserTE);
		auto *PHI = dyn_cast<PHINode>(&UserInst);
		auto *NodeUI = DT->getNode(
		PHI ? PHI->getIncomingBlock(TE->UserTreeIndices.front().EdgeIdx)
		: UserInst.getParent());
		assert(NodeUI && "Should only process reachable instructions");
		SmallPtrSet<Value *, 4> GatheredScalars(VL.begin(), VL.end());
// Build a lists of values to tree entries.		// Build a lists of values to tree entries.
DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;		DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;
for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {		for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {
if (EntryPtr.get() == TE)		if (EntryPtr.get() == TE)
break;		continue;
if (EntryPtr->State != TreeEntry::NeedToGather)		if (EntryPtr->State != TreeEntry::NeedToGather)
continue;		continue;
		if (!any_of(EntryPtr->Scalars, [&GatheredScalars](Value *V) {
		return GatheredScalars.contains(V);
		}))
		continue;
		assert(EntryPtr->UserTreeIndices.size() == 1 &&
		"Expected only single user of the gather node.");
		Instruction &EntryUserInst = getLastInstructionInBundle(
		EntryPtr.get()->UserTreeIndices.front().UserTE);
		if (&UserInst == &EntryUserInst) {
		// If 2 gathers are operands of the same entry, compare operands indices,
		// use the earlier one as the base.
		if (TE->UserTreeIndices.front().UserTE ==
		EntryPtr.get()->UserTreeIndices.front().UserTE &&
		TE->UserTreeIndices.front().EdgeIdx <
		EntryPtr.get()->UserTreeIndices.front().EdgeIdx)
		continue;
		}
		// Check if the user node of the TE comes after user node of EntryPtr,
		// otherwise EntryPtr depends on TE.
		auto *EntryPHI = dyn_cast<PHINode>(&EntryUserInst);
		auto *EntryI =
		EntryPHI ? EntryPHI
		->getIncomingBlock(
		EntryPtr.get()->UserTreeIndices.front().EdgeIdx)
		->getTerminator()
		: &EntryUserInst;
		auto *EntryParent = EntryI->getParent();
		auto *NodeEUI = DT->getNode(EntryParent);
		if (!NodeEUI)
		continue;
		assert((NodeUI == NodeEUI) ==
		(NodeUI->getDFSNumIn() == NodeEUI->getDFSNumIn()) &&
		"Different nodes should have different DFS numbers");
		// Check the order of the gather nodes users.
		if (UserInst.getParent() != EntryParent &&
		(DT->dominates(NodeUI, NodeEUI) || !DT->dominates(NodeEUI, NodeUI)))
		continue;
		if (UserInst.getParent() == EntryParent && UserInst.comesBefore(EntryI))
		continue;
for (Value *V : EntryPtr->Scalars)		for (Value *V : EntryPtr->Scalars)
		if (!isConstant(V))
ValueToTEs.try_emplace(V).first->getSecond().insert(EntryPtr.get());		ValueToTEs.try_emplace(V).first->getSecond().insert(EntryPtr.get());
}		}
// Find all tree entries used by the gathered values. If no common entries		// Find all tree entries used by the gathered values. If no common entries
// found - not a shuffle.		// found - not a shuffle.
// Here we build a set of tree nodes for each gathered value and trying to		// Here we build a set of tree nodes for each gathered value and trying to
// find the intersection between these sets. If we have at least one common		// find the intersection between these sets. If we have at least one common
// tree node for each gathered value - we have just a permutation of the		// tree node for each gathered value - we have just a permutation of the
// single vector. If we have 2 different sets, we're in situation where we		// single vector. If we have 2 different sets, we're in situation where we
// have a permutation of 2 input vectors.		// have a permutation of 2 input vectors.
SmallVector<SmallPtrSet<const TreeEntry *, 4>> UsedTEs;		SmallVector<SmallPtrSet<const TreeEntry *, 4>> UsedTEs;
DenseMap<Value *, int> UsedValuesEntry;		DenseMap<Value *, int> UsedValuesEntry;
for (Value *V : TE->Scalars) {		for (Value *V : VL) {
if (isa<UndefValue>(V))		if (isConstant(V))
continue;		continue;
// Build a list of tree entries where V is used.		// Build a list of tree entries where V is used.
SmallPtrSet<const TreeEntry *, 4> VToTEs;		SmallPtrSet<const TreeEntry *, 4> VToTEs;
auto It = ValueToTEs.find(V);		auto It = ValueToTEs.find(V);
if (It != ValueToTEs.end())		if (It != ValueToTEs.end())
VToTEs = It->second;		VToTEs = It->second;
if (const TreeEntry *VTE = getTreeEntry(V))		if (const TreeEntry *VTE = getTreeEntry(V))
VToTEs.insert(VTE);		VToTEs.insert(VTE);
if (VToTEs.empty())		if (VToTEs.empty())
return None;		continue;
if (UsedTEs.empty()) {		if (UsedTEs.empty()) {
// The first iteration, just insert the list of nodes to vector.		// The first iteration, just insert the list of nodes to vector.
UsedTEs.push_back(VToTEs);		UsedTEs.push_back(VToTEs);
		UsedValuesEntry.try_emplace(V, 0);
} else {		} else {
// Need to check if there are any previously used tree nodes which use V.		// Need to check if there are any previously used tree nodes which use V.
// If there are no such nodes, consider that we have another one input		// If there are no such nodes, consider that we have another one input
// vector.		// vector.
SmallPtrSet<const TreeEntry *, 4> SavedVToTEs(VToTEs);		SmallPtrSet<const TreeEntry *, 4> SavedVToTEs(VToTEs);
unsigned Idx = 0;		unsigned Idx = 0;
for (SmallPtrSet<const TreeEntry *, 4> &Set : UsedTEs) {		for (SmallPtrSet<const TreeEntry *, 4> &Set : UsedTEs) {
// Do we have a non-empty intersection of previously listed tree entries		// Do we have a non-empty intersection of previously listed tree entries
// and tree entries using current V?		// and tree entries using current V?
set_intersect(VToTEs, Set);		set_intersect(VToTEs, Set);
if (!VToTEs.empty()) {		if (!VToTEs.empty()) {
// Yes, write the new subset and continue analysis for the next		// Yes, write the new subset and continue analysis for the next
// scalar.		// scalar.
Set.swap(VToTEs);		Set.swap(VToTEs);
break;		break;
}		}
VToTEs = SavedVToTEs;		VToTEs = SavedVToTEs;
++Idx;		++Idx;
}		}
// No non-empty intersection found - need to add a second set of possible		// No non-empty intersection found - need to add a second set of possible
// source vectors.		// source vectors.
if (Idx == UsedTEs.size()) {		if (Idx == UsedTEs.size()) {
// If the number of input vectors is greater than 2 - not a permutation,		// If the number of input vectors is greater than 2 - not a permutation,
// fallback to the regular gather.		// fallback to the regular gather.
		// TODO: support multiple reshuffled nodes.
if (UsedTEs.size() == 2)		if (UsedTEs.size() == 2)
return None;		continue;
UsedTEs.push_back(SavedVToTEs);		UsedTEs.push_back(SavedVToTEs);
Idx = UsedTEs.size() - 1;		Idx = UsedTEs.size() - 1;
}		}
UsedValuesEntry.try_emplace(V, Idx);		UsedValuesEntry.try_emplace(V, Idx);
}		}
}		}

		// Return if no reused scalars found.
		if (UsedTEs.empty())
		return None;

unsigned VF = 0;		unsigned VF = 0;
if (UsedTEs.size() == 1) {		if (UsedTEs.size() == 1) {
// Try to find the perfect match in another gather node at first.		// Try to find the perfect match in another gather node at first.
auto It = find_if(UsedTEs.front(), [TE](const TreeEntry *EntryPtr) {		auto It = find_if(UsedTEs.front(), [VL, TE](const TreeEntry *EntryPtr) {
return EntryPtr->isSame(TE->Scalars);		return EntryPtr->isSame(VL) || EntryPtr->isSame(TE->Scalars);
});		});
if (It != UsedTEs.front().end()) {		if (It != UsedTEs.front().end()) {
Entries.push_back(*It);		Entries.push_back(*It);
std::iota(Mask.begin(), Mask.end(), 0);		std::iota(Mask.begin(), Mask.end(), 0);
		// Clear undef scalars.
		for (int I = 0, Sz = VL.size(); I < Sz; ++I)
		if (isa<UndefValue>(TE->Scalars[I]))
		Mask[I] = UndefMaskElem;
return TargetTransformInfo::SK_PermuteSingleSrc;		return TargetTransformInfo::SK_PermuteSingleSrc;
}		}
// No perfect match, just shuffle, so choose the first tree node.		// No perfect match, just shuffle, so choose the first tree node from the
Entries.push_back(*UsedTEs.front().begin());		// tree.
		Entries.push_back(
		*std::min_element(UsedTEs.front().begin(), UsedTEs.front().end(),
		[](const TreeEntry *TE1, const TreeEntry *TE2) {
		return TE1->Idx < TE2->Idx;
		}));
} else {		} else {
// Try to find nodes with the same vector factor.		// Try to find nodes with the same vector factor.
assert(UsedTEs.size() == 2 && "Expected at max 2 permuted entries.");		assert(UsedTEs.size() == 2 && "Expected at max 2 permuted entries.");
		// Keep the order of tree nodes to avoid non-determinism.
DenseMap<int, const TreeEntry *> VFToTE;		DenseMap<int, const TreeEntry *> VFToTE;
for (const TreeEntry *TE : UsedTEs.front())		for (const TreeEntry *TE : UsedTEs.front()) {
VFToTE.try_emplace(TE->getVectorFactor(), TE);		unsigned VF = TE->getVectorFactor();
for (const TreeEntry *TE : UsedTEs.back()) {		auto It = VFToTE.find(VF);
		if (It != VFToTE.end()) {
		if (It->second->Idx > TE->Idx)
		It->getSecond() = TE;
		continue;
		}
		VFToTE.try_emplace(VF, TE);
		}
		// Same, keep the order to avoid non-determinism.
		SmallVector<const TreeEntry *> SecondEntries(UsedTEs.back().begin(),
		UsedTEs.back().end());
		sort(SecondEntries, [](const TreeEntry *TE1, const TreeEntry *TE2) {
		return TE1->Idx < TE2->Idx;
		});
		for (const TreeEntry *TE : SecondEntries) {
auto It = VFToTE.find(TE->getVectorFactor());		auto It = VFToTE.find(TE->getVectorFactor());
if (It != VFToTE.end()) {		if (It != VFToTE.end()) {
VF = It->first;		VF = It->first;
Entries.push_back(It->second);		Entries.push_back(It->second);
Entries.push_back(TE);		Entries.push_back(TE);
break;		break;
}		}
}		}
// No 2 source vectors with the same vector factor - give up and do regular		// No 2 source vectors with the same vector factor - give up and do regular
// gather.		// gather.
if (Entries.empty())		if (Entries.empty())
return None;		return None;
}		}

// Build a shuffle mask for better cost estimation and vector emission.		Value *SingleV = nullptr;
for (int I = 0, E = TE->Scalars.size(); I < E; ++I) {		bool IsSplat = all_of(VL, [&SingleV](Value *V) {
Value *V = TE->Scalars[I];		if (!isa<UndefValue>(V)) {
if (isa<UndefValue>(V))		if (!SingleV)
		SingleV = V;
		return SingleV == V;
		};
		return true;
		});
		// Checks if the 2 PHIs are compatible in terms of high possibility to be
		// vectorized.
		auto &&AreCompatiblePHIs = [](Value *V, Value *V1) {
		auto *PHI = cast<PHINode>(V);
		auto *PHI1 = cast<PHINode>(V1);
		// Check that all incoming values are compatible/from same parent (if they
		// are instructions).
		for (int I = 0, E = PHI->getNumIncomingValues(); I < E; ++I) {
		Value *In = PHI->getIncomingValue(I);
		Value *In1 = PHI1->getIncomingValue(I);
		if (isConstant(In) && isConstant(In1))
continue;		continue;
unsigned Idx = UsedValuesEntry.lookup(V);		if (!getSameOpcode({In, In1}).getOpcode())
const TreeEntry *VTE = Entries[Idx];		return false;
int FoundLane = VTE->findLaneForValue(V);		if (cast<Instruction>(In)->getParent() !=
Mask[I] = Idx * VF + FoundLane;		cast<Instruction>(In1)->getParent())
// Extra check required by isSingleSourceMaskImpl function (called by		return false;
// ShuffleVectorInst::isSingleSourceMask).
if (Mask[I] >= 2 * E)
return None;
}		}
		return true;
		};
		auto &&MightBeIgnored = [this, IsSplat](Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		return I && !IsSplat && !ScalarToTreeEntry.count(I) &&
		!isVectorLikeInstWithConstOps(I) &&
		!areAllUsersVectorized(I, UserIgnoreList) && isSimple(I);
		};
		auto &&NeighborMightBeIgnored = [&MightBeIgnored, &UsedValuesEntry, VL,
		&AreCompatiblePHIs](Value *V, int Idx) {
		Value *V1 = VL[Idx];
		bool UsedInSameVTE = false;
		auto It = UsedValuesEntry.find(V1);
		if (It != UsedValuesEntry.end())
		UsedInSameVTE = It->second == UsedValuesEntry.find(V)->second;
		return V != V1 && MightBeIgnored(V1) && !UsedInSameVTE &&
		getSameOpcode({V, V1}).getOpcode() &&
		cast<Instruction>(V)->getParent() ==
		cast<Instruction>(V1)->getParent() &&
		(!isa<PHINode>(V1) || AreCompatiblePHIs(V, V1));
		};
		// Build a shuffle mask for better cost estimation and vector emission.
		SmallBitVector UsedIdxs(Entries.size());
		SmallVector<std::pair<unsigned, int>> EntryLanes;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		Value *V = VL[I];
		auto It = UsedValuesEntry.find(V);
		if (It == UsedValuesEntry.end())
		continue;
		// Do not try to shuffle scalars, if they are constants, or instructions
		// that can be vectorized as a result of the following vector build
		// vectorization.
		if (isConstant(V) || (MightBeIgnored(V) &&
		((I > 0 && NeighborMightBeIgnored(V, I - 1)) ||
		(I != E - 1 && NeighborMightBeIgnored(V, I + 1)))))
		continue;
		unsigned Idx = It->second;
		EntryLanes.emplace_back(Idx, I);
		UsedIdxs.set(Idx);
		}
		SmallVector<const TreeEntry *> TempEntries;
		for (unsigned I = 0, Sz = Entries.size(); I < Sz; ++I) {
		if (!UsedIdxs.test(I))
		continue;
		for (std::pair<unsigned, int> &Pair : EntryLanes)
		if (Pair.first == I)
		Pair.first = TempEntries.size();
		TempEntries.push_back(Entries[I]);
		}
		Entries.swap(TempEntries);
		for (const std::pair<unsigned, int> &Pair : EntryLanes)
		Mask[Pair.second] = Pair.first * VF +
		Entries[Pair.first]->findLaneForValue(VL[Pair.second]);
switch (Entries.size()) {		switch (Entries.size()) {
case 1:		case 1:
return TargetTransformInfo::SK_PermuteSingleSrc;		return TargetTransformInfo::SK_PermuteSingleSrc;
case 2:		case 2:
return TargetTransformInfo::SK_PermuteTwoSrc;		return TargetTransformInfo::SK_PermuteTwoSrc;
default:		default:
		Entries.clear();
break;		break;
}		}
return None;		return None;
}		}

InstructionCost		InstructionCost
BoUpSLP::getGatherCost(FixedVectorType *Ty,		BoUpSLP::getGatherCost(FixedVectorType *Ty,
const DenseSet<unsigned> &ShuffledIndices,		const DenseSet<unsigned> &ShuffledIndices,
[51 unchanged lines collapsed]
if (VL.empty())		if (VL.empty())
return;		return;
VLOperands Ops(VL, DL, SE, R);		VLOperands Ops(VL, DL, SE, R);
// Reorder the operands in place.		// Reorder the operands in place.
Ops.reorder();		Ops.reorder();
Left = Ops.getVL(0);		Left = Ops.getVL(0);
Right = Ops.getVL(1);		Right = Ops.getVL(1);
}		}

void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {		Instruction &BoUpSLP::getLastInstructionInBundle(const TreeEntry *E) {
// Get the basic block this bundle is in. All instructions in the bundle		// Get the basic block this bundle is in. All instructions in the bundle
// should be in this block.		// should be in this block.
auto *Front = E->getMainOp();		auto *Front = E->getMainOp();
auto *BB = Front->getParent();		auto *BB = Front->getParent();
assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {		assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {
auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
return !E->isOpcodeOrAlt(I) || I->getParent() == BB;		return !E->isOpcodeOrAlt(I) || I->getParent() == BB;
}));		}));
[37 unchanged lines collapsed]
if (!LastInst) {		if (!LastInst) {
for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {		for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {
if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))		if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))
LastInst = &I;		LastInst = &I;
if (Bundle.empty())		if (Bundle.empty())
break;		break;
}		}
}		}
assert(LastInst && "Failed to find last instruction in bundle");		assert(LastInst && "Failed to find last instruction in bundle");
		return *LastInst;
		}

		void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {
		auto *Front = E->getMainOp();
		auto *BB = Front->getParent();
		Instruction *LastInst = &getLastInstructionInBundle(E);
		assert(LastInst && "Failed to find last instruction in bundle");

// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
Builder.SetInsertPoint(BB, ++LastInst->getIterator());		Builder.SetInsertPoint(BB, ++LastInst->getIterator());
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}

Value *BoUpSLP::gather(ArrayRef<Value *> VL) {		Value *BoUpSLP::gather(ArrayRef<Value *> VL, Value *Root) {
// List of instructions/lanes from current block and/or the blocks which are		// List of instructions/lanes from current block and/or the blocks which are
// part of the current loop. These instructions will be inserted at the end to		// part of the current loop. These instructions will be inserted at the end to
// make it possible to optimize loops and hoist invariant instructions out of		// make it possible to optimize loops and hoist invariant instructions out of
// the loops body with better chances for success.		// the loops body with better chances for success.
SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;		SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;
SmallSet<int, 4> PostponedIndices;		SmallSet<int, 4> PostponedIndices;
Loop *L = LI->getLoopFor(Builder.GetInsertBlock());		Loop *L = LI->getLoopFor(Builder.GetInsertBlock());
auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {		auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {
[23 unchanged lines collapsed]
if (TreeEntry *Entry = getTreeEntry(V)) {		if (TreeEntry *Entry = getTreeEntry(V)) {
unsigned FoundLane = Entry->findLaneForValue(V);		unsigned FoundLane = Entry->findLaneForValue(V);
ExternalUses.emplace_back(V, InsElt, FoundLane);		ExternalUses.emplace_back(V, InsElt, FoundLane);
}		}
return Vec;		return Vec;
};		};
Value *Val0 =		Value *Val0 =
isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];		isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];
FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());		FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());
Value *Vec = PoisonValue::get(VecTy);		Value *Vec = Root ? Root : PoisonValue::get(VecTy);
SmallVector<int> NonConsts;		SmallVector<int> NonConsts;
// Insert constant values at first.		// Insert constant values at first.
for (int I = 0, E = VL.size(); I < E; ++I) {		for (int I = 0, E = VL.size(); I < E; ++I) {
if (PostponedIndices.contains(I))		if (PostponedIndices.contains(I))
continue;		continue;
if (!isConstant(VL[I])) {		if (!isConstant(VL[I])) {
NonConsts.push_back(I);		NonConsts.push_back(I);
continue;		continue;
}		}
		if (Root && isa<UndefValue>(VL[I])) {
		if (isa<PoisonValue>(VL[I]))
		continue;
		if (auto *SV = dyn_cast<ShuffleVectorInst>(Root)) {
		if (SV->getMaskValue(I) == UndefMaskElem)
		continue;
		}
		}
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
}		}
// Insert non-constant values.		// Insert non-constant values.
for (int I : NonConsts)		for (int I : NonConsts)
Vec = CreateInsertElement(Vec, VL[I], I);		Vec = CreateInsertElement(Vec, VL[I], I);
// Append instructions, which are/may be part of the loop, in the end to make		// Append instructions, which are/may be part of the loop, in the end to make
// it possible to hoist non-loop-based instructions.		// it possible to hoist non-loop-based instructions.
for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)		for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)
[47 unchanged lines collapsed]
public:		public:

~ShuffleInstructionBuilder() {		~ShuffleInstructionBuilder() {
assert((IsFinalized || Mask.empty()) &&		assert((IsFinalized || Mask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};
} // namespace		} // namespace

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {		Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, const EdgeInfo &EI) {
unsigned VF = VL.size();		unsigned VF = VL.size();
InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (S.getOpcode()) {		if (S.getOpcode()) {
if (TreeEntry *E = getTreeEntry(S.OpValue))		if (TreeEntry *E = getTreeEntry(S.OpValue))
if (E->isSame(VL)) {		if (E->isSame(VL)) {
Value *V = vectorizeTree(E);		Value *V = vectorizeTree(E);
if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {		if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {
if (!E->ReuseShuffleIndices.empty()) {		if (!E->ReuseShuffleIndices.empty()) {
[36 unchanged lines collapsed]
if (TreeEntry *E = getTreeEntry(S.OpValue))		if (TreeEntry *E = getTreeEntry(S.OpValue))
std::iota(UniformMask.begin(), UniformMask.end(), 0);		std::iota(UniformMask.begin(), UniformMask.end(), 0);
V = Builder.CreateShuffleVector(V, UniformMask, "shrink.shuffle");		V = Builder.CreateShuffleVector(V, UniformMask, "shrink.shuffle");
}		}
}		}
return V;		return V;
}		}
}		}

		// Find the corresponding gather entry and vectorize it.
		// Allows to be more accurate with tree/graph transformations, checks for the
		// correctness of the transformations in many cases.
		auto *I =
		find_if(VectorizableTree, [EI](const std::unique_ptr<TreeEntry> &TE) {
		return TE->State == TreeEntry::NeedToGather &&
		TE->UserTreeIndices.front().EdgeIdx == EI.EdgeIdx &&
		TE->UserTreeIndices.front().UserTE == EI.UserTE;
		});
		assert(I != VectorizableTree.end() && "Gather node is not in the graph.");
		assert(I->get()->UserTreeIndices.size() == 1 &&
		"Expected only single user for the gather node.");
		assert(I->get()->isSame(VL) && "Expected same list of scalars.");
		return vectorizeTree(I->get());
		}

		namespace {
		/// Merges shuffle masks and emits final shuffle instruction, if required, for
		/// gathered nodes. This is similar to ShuffleInstructionBuilder but supports
		/// shuffling of 2 input vectors. It implements lazy shuffles emission, when the
		/// actual shuffle instruction is generated only if this is actually required.
		/// Otherwise, the shuffle instruction emission is delayed till the end of the
		/// process, to reduce the number of emitted instructions and further
		/// analysis/transformations.
		/// TODO: Investigate if these 2 classes might be merged.
		class ShuffleGatherBuilder {
		bool IsFinalized = false;
		SmallVector<int> CommonMask;
		SmallVector<Value *, 2> InVectors;
		function_ref<Value *(Value *, Value *, ArrayRef<int>)> CreateShuffle;

		public:
		ShuffleGatherBuilder(
		function_ref<Value *(Value *, Value *, ArrayRef<int>)> CreateShuffle)
		: CreateShuffle(CreateShuffle) {}
		/// Adds 2 input vectors and the mask for their shuffling.
		void add(Value *V1, Value *V2, ArrayRef<int> Mask) {
		assert(V1 && V2 && !Mask.empty() && "Expected non-empty input vectors.");
		if (InVectors.empty()) {
		InVectors.push_back(V1);
		InVectors.push_back(V2);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Vec = CreateShuffle(Vec, InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		} else if (cast<FixedVectorType>(Vec->getType())->getNumElements() !=
		Mask.size()) {
		Vec = CreateShuffle(Vec, nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		V1 = CreateShuffle(V1, V2, Mask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx + Sz;
		InVectors.front() = Vec;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		}
		/// Adds another one input vector and the mask for the shuffling.
		void add(Value *V1, ArrayRef<int> Mask) {
		if (InVectors.empty()) {
		if (!isa<FixedVectorType>(V1->getType())) {
		V1 = CreateShuffle(V1, nullptr, CommonMask);
		CommonMask.assign(Mask.size(), UndefMaskElem);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		return;
		}
		InVectors.push_back(V1);
		CommonMask.assign(Mask.begin(), Mask.end());
		return;
		}
		const auto *It = find(InVectors, V1);
		if (It == InVectors.end()) {
		if (InVectors.size() == 2 ||
		InVectors.front()->getType() != V1->getType() ||
		!isa<FixedVectorType>(V1->getType())) {
		Value *V = InVectors.front();
		if (InVectors.size() == 2) {
		V = CreateShuffle(InVectors.front(), InVectors.back(), CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		} else if (cast<FixedVectorType>(V->getType())->getNumElements() !=
		CommonMask.size()) {
		V = CreateShuffle(InVectors.front(), nullptr, CommonMask);
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] == UndefMaskElem && Mask[Idx] != UndefMaskElem)
		CommonMask[Idx] =
		V->getType() != V1->getType()
		? Idx + Sz
		: Mask[Idx] + cast<FixedVectorType>(V1->getType())
		->getNumElements();
		if (V->getType() != V1->getType())
		V1 = CreateShuffle(V1, nullptr, Mask);
		InVectors.front() = V;
		if (InVectors.size() == 2)
		InVectors.back() = V1;
		else
		InVectors.push_back(V1);
		return;
		}
		// Check if second vector is required if the used elements are already
		// used from the first one.
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem) {
		InVectors.push_back(V1);
		break;
		}
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (Mask[Idx] != UndefMaskElem && CommonMask[Idx] == UndefMaskElem)
		CommonMask[Idx] = Mask[Idx] + (It == InVectors.begin() ? 0 : Sz);
		}
		/// Finalize emission of the shuffles.
		Value *
		finalize(ArrayRef<int> ExtMask,
		function_ref<void(Value *&, SmallVectorImpl<int> &)> Action = {}) {
		IsFinalized = true;
		if (Action) {
		Value *Vec = InVectors.front();
		if (InVectors.size() == 2) {
		Vec = CreateShuffle(Vec, InVectors.back(), CommonMask);
		InVectors.pop_back();
		} else {
		Vec = CreateShuffle(Vec, nullptr, CommonMask);
		}
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
		if (CommonMask[Idx] != UndefMaskElem)
		CommonMask[Idx] = Idx;
		Action(Vec, CommonMask);
		InVectors.front() = Vec;
		}
		if (!ExtMask.empty()) {
		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
		if (ExtMask[I] == UndefMaskElem)
		continue;
		NewMask[I] = CommonMask[ExtMask[I]];
		}
		CommonMask.swap(NewMask);
		}
		if (InVectors.size() == 2)
		return CreateShuffle(InVectors.front(), InVectors.back(), CommonMask);
		return CreateShuffle(InVectors.front(), nullptr, CommonMask);
		}

		~ShuffleGatherBuilder() {
		assert((IsFinalized || CommonMask.empty()) &&
		"Shuffle construction must be finalized.");
		}
		};
		} // namespace

		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
		IRBuilder<>::InsertPointGuard Guard(Builder);

		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
		return E->VectorizedValue;
		}

		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
		unsigned VF = E->getVectorFactor();
		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF);
		if (E->State == TreeEntry::NeedToGather) {
		// Can set insert point safely on for the initial gather node.
		if (E == VectorizableTree.front().get() && E->getMainOp())
		setInsertPointAfterBundle(E);
		else if (!E->UserTreeIndices.empty() &&
		E->UserTreeIndices.front().UserTE->getOpcode() !=
		Instruction::PHI) {
		// Need to adjust insert point to fix possible conflict because of PHI
		// nodes vectorization. They cause cyclic dependencies and the gather node
		// may be generated earlier than the user vector node instruction.
		Builder.SetInsertPoint(
		&getLastInstructionInBundle(E->UserTreeIndices.front().UserTE));
		}
		SmallVector<int> ReuseShuffleIndicies(E->ReuseShuffleIndices.begin(),
		E->ReuseShuffleIndices.end());
		SmallVector<Value *> GatheredScalars(E->Scalars.begin(), E->Scalars.end());
		// Checks if the mask is an identity mask.
		auto &&IsIdentityMask = [](ArrayRef<int> Mask, FixedVectorType *VecTy) {
		int Limit = Mask.size();
		return VecTy->getNumElements() == Mask.size() &&
		all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) &&
		ShuffleVectorInst::isIdentityMask(Mask);
		};
		// Check if the mask is a splat/broadcast mask.
		auto &&IsBroadcastMask = [](ArrayRef<int> Mask) {
		int Limit = Mask.size();
		int Elt = UndefMaskElem;
		return all_of(Mask, [Limit, &Elt](int Idx) {
		if (Idx != UndefMaskElem && Elt == UndefMaskElem)
		Elt = Idx;
		return Idx < Limit && Elt == Idx;
		});
		};
		// Tries to combine 2 different masks into single one.
		auto &&CombineMasks = [](SmallVectorImpl<int> &Mask, ArrayRef<int> ExtMask) {
		SmallVector<int> NewMask(ExtMask.size(), UndefMaskElem);
		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
		if (ExtMask[I] == UndefMaskElem)
		continue;
		NewMask[I] = Mask[ExtMask[I]];
		}
		Mask.swap(NewMask);
		};
		// Smart shuffle instruction emission, walks through shuffles trees and
		// tries to find the best matching vector for the actual shuffle
		// instruction.
		auto &&CreateShuffle = [this, &IsIdentityMask, &IsBroadcastMask,
		CombineMasks](Value *V1, Value *V2,
		ArrayRef<int> Mask) -> Value * {
		assert(V1 && "Expected at least one vector value.");
		if (V2 && !isUndefVector(V2)) {
		Value *Vec = Builder.CreateShuffleVector(V1, V2, Mask);
		if (auto *I = dyn_cast<Instruction>(Vec)) {
		GatherShuffleSeq.insert(I);
		CSEBlocks.insert(I->getParent());
		}
		return Vec;
		}
		if (isa<PoisonValue>(V1))
		return PoisonValue::get(FixedVectorType::get(
		cast<VectorType>(V1->getType())->getElementType(), Mask.size()));
		Value *Op = V1;
		SmallVector<int> CombinedMask(Mask.begin(), Mask.end());
		while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
		// Exit if not a fixed vector type or changing size shuffle.
		if (!isa<FixedVectorType>(SV->getType()))
		break;
		// Exit if the identity or broadcast mask is found.
		if (IsIdentityMask(CombinedMask,
		cast<FixedVectorType>(SV->getType())) ||
		IsBroadcastMask(CombinedMask))
		break;
		bool IsOp1Undef = isUndefVector(SV->getOperand(0));
		bool IsOp2Undef = isUndefVector(SV->getOperand(1));
		if (!IsOp1Undef && !IsOp2Undef)
		break;
		SmallVector<int> ShuffleMask(SV->getShuffleMask().begin(),
		SV->getShuffleMask().end());
		CombineMasks(ShuffleMask, CombinedMask);
		CombinedMask.swap(ShuffleMask);
		if (IsOp2Undef)
		Op = SV->getOperand(0);
		else
		Op = SV->getOperand(1);
		}
		if (!isa<FixedVectorType>(Op->getType()) ||
		!IsIdentityMask(CombinedMask, cast<FixedVectorType>(Op->getType()))) {
		Value *Vec = Builder.CreateShuffleVector(Op, CombinedMask);
		if (auto *I = dyn_cast<Instruction>(Vec)) {
		GatherShuffleSeq.insert(I);
		CSEBlocks.insert(I->getParent());
		}
		return Vec;
		}
		return Op;
		};
		ShuffleGatherBuilder GatherBuilder(CreateShuffle);
		Value *Vec = nullptr;
		SmallVector<int> Mask;
		SmallVector<int> ExtractMask;
		Optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
		Optional<TargetTransformInfo::ShuffleKind> GatherShuffle;
		SmallVector<const TreeEntry *> Entries;
		Type *ScalarTy = GatheredScalars.front()->getType();
		if (!all_of(GatheredScalars, UndefValue::classof)) {
		// Check for gathered extracts.
		ExtractShuffle = tryToGatherExtractElements(GatheredScalars, ExtractMask);
		// Need to remove vectorized extractelement instructions.
		for (int I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		int Idx = ExtractMask[I];
		if (Idx == UndefMaskElem)
		continue;
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		// If all users are vectorized - can delete the extractelement itself.
		if (!areAllUsersVectorized(EI, UserIgnoreList))
		continue;
		eraseInstruction(EI);
		}
		// Gather extracts after we check for full matched gathers only.
		GatherShuffle =
		(E->getOpcode() == Instruction::Load && !E->isAltShuffle() &&
		!all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) &&
		!isSplat(E->Scalars) &&
		(E->Scalars == GatheredScalars || GatheredScalars.size() > 2))
		? None
		: isGatherShuffledEntry(E, GatheredScalars, Mask, Entries);
		if (GatherShuffle.hasValue()) {
		if (any_of(Entries,
		[](const TreeEntry *TE) { return !TE->VectorizedValue; })) {
		PostponedGathers.insert(E);
		// Postpone gather emission, will be emitted after the end of the
		// process to keep correct order.
		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
		Value *Vec = Builder.CreateAlignedLoad(
		VecTy, UndefValue::get(VecTy->getPointerTo()), MaybeAlign());
		E->VectorizedValue = Vec;
		nlopes: Please use PoisonValue whenever possible. It seems this is just a placeholder, so it can be switched. Thank you!
		ABataev: Sure, thanks!
		return Vec;
		}
		assert((Entries.size() == 1 || Entries.size() == 2) &&
		"Expected shuffle of 1 or 2 entries.");
		// Remove shuffled elements from list of gathers.
		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
		if (Mask[I] != UndefMaskElem)
		GatheredScalars[I] = PoisonValue::get(ScalarTy);
		}
		if (Entries.size() == 1)
		GatherBuilder.add(Entries.front()->VectorizedValue, Mask);
		else
		GatherBuilder.add(Entries.front()->VectorizedValue,
		Entries.back()->VectorizedValue, Mask);
		} else {
// Check that every instruction appears once in this bundle.		// Check that every instruction appears once in this bundle.
SmallVector<int> ReuseShuffleIndicies;
SmallVector<Value *> UniqueValues;		SmallVector<Value *> UniqueValues;
if (VL.size() > 2) {		if (GatheredScalars.size() > 2) {
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
unsigned NumValues =
std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {
return !isa<UndefValue>(V);
}).base());
VF = std::max<unsigned>(VF, PowerOf2Ceil(NumValues));
int UniqueVals = 0;		int UniqueVals = 0;
for (Value *V : VL.drop_back(VL.size() - VF)) {		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		Value *V = GatheredScalars[I];
if (isa<UndefValue>(V)) {		if (isa<UndefValue>(V)) {
		if (!NeedToShuffleReuses)
ReuseShuffleIndicies.emplace_back(UndefMaskElem);		ReuseShuffleIndicies.emplace_back(UndefMaskElem);
		UniqueValues.emplace_back(V);
continue;		continue;
}		}
if (isConstant(V)) {		if (isConstant(V)) {
		if (ReuseShuffleIndicies.empty())
ReuseShuffleIndicies.emplace_back(UniqueValues.size());		ReuseShuffleIndicies.emplace_back(UniqueValues.size());
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
continue;		continue;
}		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
		if (!NeedToShuffleReuses) {
ReuseShuffleIndicies.emplace_back(Res.first->second);		ReuseShuffleIndicies.emplace_back(Res.first->second);
		} else {
		for (unsigned Idx = 0; Idx < VF; ++Idx)
		if (ReuseShuffleIndicies[Idx] == I)
		ReuseShuffleIndicies[Idx] = Res.first->second;
		}
if (Res.second) {		if (Res.second) {
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
++UniqueVals;		++UniqueVals;
}		}
}		}
		if (!NeedToShuffleReuses) {
if (UniqueVals == 1 && UniqueValues.size() == 1) {		if (UniqueVals == 1 && UniqueValues.size() == 1) {
// Emit pure splat vector.		// Emit pure splat vector.
ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),		ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),
UndefMaskElem);		UndefMaskElem);
} else if (UniqueValues.size() >= VF - 1 || UniqueValues.size() <= 1) {		} else if (UniqueValues.size() >= VF - 1 ||
		UniqueValues.size() <= 1) {
ReuseShuffleIndicies.clear();		ReuseShuffleIndicies.clear();
UniqueValues.clear();		UniqueValues.swap(GatheredScalars);
UniqueValues.append(VL.begin(), std::next(VL.begin(), NumValues));
}		}
UniqueValues.append(VF - UniqueValues.size(),
PoisonValue::get(VL[0]->getType()));
VL = UniqueValues;
}		}
		UniqueValues.append(VF - UniqueValues.size(),
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF);		PoisonValue::get(ScalarTy));
Value *Vec = gather(VL);		GatheredScalars.swap(UniqueValues);
if (!ReuseShuffleIndicies.empty()) {
ShuffleBuilder.addMask(ReuseShuffleIndicies);
Vec = ShuffleBuilder.finalize(Vec);
if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherShuffleSeq.insert(I);
CSEBlocks.insert(I->getParent());
}		}
}		}
return Vec;
}		}
		// Combine generated extracts mask and reused scalars masks and
Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		// corresponding input vectors.
IRBuilder<>::InsertPointGuard Guard(Builder);		if (ExtractShuffle.hasValue()) {
		// Gather of extractelements can be represented as just a shuffle of
if (E->VectorizedValue) {		// a single/two vectors the scalars are extracted from.
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		// Find input vectors.
return E->VectorizedValue;		Value *Vec1 = nullptr;
		Value *Vec2 = nullptr;
		for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		if (ExtractMask[I] == UndefMaskElem ||
		(!Mask.empty() && Mask[I] != UndefMaskElem)) {
		ExtractMask[I] = UndefMaskElem;
		continue;
		}
		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
		if (!Vec1) {
		Vec1 = EI->getVectorOperand();
		} else if (Vec1 != EI->getVectorOperand()) {
		assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&
		"Expected only 1 or 2 vectors shuffle.");
		Vec2 = EI->getVectorOperand();
		}
		}
		if (Vec2) {
		GatherBuilder.add(Vec1, Vec2, ExtractMask);
		} else if (Vec1) {
		GatherBuilder.add(Vec1, ExtractMask);
		} else {
		GatherBuilder.add(PoisonValue::get(FixedVectorType::get(
		ScalarTy, GatheredScalars.size())),
		ExtractMask);
		}
		}
		if (ExtractShuffle.hasValue() || GatherShuffle.hasValue()) {
		// Insert non-constant scalars.
		SmallVector<Value *> NonConstants(GatheredScalars);
		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
		if (!isa<Constant>(GatheredScalars[I]))
		GatheredScalars[I] = PoisonValue::get(ScalarTy);
		else
		NonConstants[I] = PoisonValue::get(ScalarTy);
}		}
		// Generate constants for final shuffle.
bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		if (!all_of(GatheredScalars, UndefValue::classof)) {
unsigned VF = E->getVectorFactor();		Mask.assign(GatheredScalars.size(), UndefMaskElem);
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF);		Value *VecVal = gather(GatheredScalars);
if (E->State == TreeEntry::NeedToGather) {		for (int I = 0, Sz = GatheredScalars.size(); I < Sz; ++I) {
if (E->getMainOp())		if (!isa<UndefValue>(GatheredScalars[I]))
setInsertPointAfterBundle(E);		Mask[I] = I;
Value *Vec;		}
SmallVector<int> Mask;		GatherBuilder.add(VecVal, Mask);
SmallVector<const TreeEntry *> Entries;		}
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		// Emit final insertelement instructions for defined values.
isGatherShuffledEntry(E, Mask, Entries);		if (!all_of(NonConstants, Constant::classof))
if (Shuffle.hasValue()) {		Vec = GatherBuilder.finalize(
assert((Entries.size() == 1 || Entries.size() == 2) &&		ReuseShuffleIndicies,
"Expected shuffle of 1 or 2 entries.");		[this, &NonConstants](Value *&Vec, SmallVectorImpl<int> &Mask) {
Vec = Builder.CreateShuffleVector(Entries.front()->VectorizedValue,		Vec = gather(NonConstants, Vec);
Entries.back()->VectorizedValue, Mask);		for (unsigned I = 0, Sz = Mask.size(); I < Sz; ++I)
		if (!isa<Constant>(NonConstants[I]))
		Mask[I] = I;
		});
		else
		Vec = GatherBuilder.finalize(ReuseShuffleIndicies);
} else {		} else {
Vec = gather(E->Scalars);		// Just generate simple gather, no reused scalars/extracts.
}		Vec = gather(GatheredScalars);
if (NeedToShuffleReuses) {		Mask.assign(GatheredScalars.size(), UndefMaskElem);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		for (unsigned Idx = 0, Sz = GatheredScalars.size(); Idx < Sz; ++Idx)
Vec = ShuffleBuilder.finalize(Vec);		if (!isa<UndefValue>(GatheredScalars[Idx]))
if (auto *I = dyn_cast<Instruction>(Vec)) {		Mask[Idx] = Idx;
GatherShuffleSeq.insert(I);		GatherBuilder.add(Vec, Mask);
CSEBlocks.insert(I->getParent());		Vec = GatherBuilder.finalize(ReuseShuffleIndicies);
}
}		}
E->VectorizedValue = Vec;		E->VectorizedValue = Vec;
return Vec;		return Vec;
}		}

assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
Show All 25 Lines	case Instruction::PHI: {
// PHINodes may have multiple entries from the same block. We want to		// PHINodes may have multiple entries from the same block. We want to
// visit every block once.		// visit every block once.
SmallPtrSet<BasicBlock*, 4> VisitedBBs;		SmallPtrSet<BasicBlock*, 4> VisitedBBs;

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
BasicBlock *IBB = PH->getIncomingBlock(i);		BasicBlock *IBB = PH->getIncomingBlock(i);

		// Stop emission if all incoming values are generated.
		if (NewPhi->getNumIncomingValues() == PH->getNumIncomingValues()) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return V;
		}

if (!VisitedBBs.insert(IBB).second) {		if (!VisitedBBs.insert(IBB).second) {
NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);		NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);
continue;		continue;
}		}

Builder.SetInsertPoint(IBB->getTerminator());		Builder.SetInsertPoint(IBB->getTerminator());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
Value *Vec = vectorizeTree(E->getOperand(i));		Value *Vec = vectorizeTree(E->getOperand(i), EdgeInfo(E, i));
NewPhi->addIncoming(Vec, IBB);		NewPhi->addIncoming(Vec, IBB);
}		}

assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&		assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&
"Invalid number of incoming values");		"Invalid number of incoming values");
return V;		return V;
}		}

Show All 17 Lines	case Instruction::ExtractValue: {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
NewV = ShuffleBuilder.finalize(NewV);		NewV = ShuffleBuilder.finalize(NewV);
E->VectorizedValue = NewV;		E->VectorizedValue = NewV;
return NewV;		return NewV;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");		assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");
Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));		Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));
Value *V = vectorizeTree(E->getOperand(1));		Value *V = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

// Create InsertVector shuffle if necessary		// Create InsertVector shuffle if necessary
auto *FirstInsert = cast<Instruction>(*find_if(E->Scalars, [E](Value *V) {		auto *FirstInsert = cast<Instruction>(*find_if(E->Scalars, [E](Value *V) {
return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));		return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));
}));		}));
const unsigned NumElts =		const unsigned NumElts =
cast<FixedVectorType>(FirstInsert->getType())->getNumElements();		cast<FixedVectorType>(FirstInsert->getType())->getNumElements();
const unsigned NumScalars = E->Scalars.size();		const unsigned NumScalars = E->Scalars.size();
Show All 20 Lines	case Instruction::InsertElement: {
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
IsIdentity &= *InsertIdx - Offset == I;		IsIdentity &= *InsertIdx - Offset == I;
Mask[*InsertIdx - Offset] = I;		Mask[*InsertIdx - Offset] = I;
}		}
if (!IsIdentity || NumElts != NumScalars)		if (!IsIdentity || NumElts != NumScalars)
V = Builder.CreateShuffleVector(V, Mask);		V = Builder.CreateShuffleVector(V, Mask);

if ((!IsIdentity || Offset != 0 ||		if (Offset != 0 || !isUndefVector(FirstInsert->getOperand(0))) {
!isa<UndefValue>(FirstInsert->getOperand(0))) &&
NumElts != NumScalars) {
SmallVector<int> InsertMask(NumElts);		SmallVector<int> InsertMask(NumElts);
std::iota(InsertMask.begin(), InsertMask.end(), 0);		std::iota(InsertMask.begin(), InsertMask.end(), 0);
for (unsigned I = 0; I < NumElts; I++) {		for (unsigned I = 0; I < NumElts; I++) {
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
InsertMask[Offset + I] = NumElts + I;		InsertMask[Offset + I] = NumElts + I;
}		}

V = Builder.CreateShuffleVector(		V = Builder.CreateShuffleVector(
Show All 14 Lines	switch (ShuffleOrOp) {
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *InVec = vectorizeTree(E->getOperand(0));		Value *InVec = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

auto *CI = cast<CastInst>(VL0);		auto *CI = cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *L = vectorizeTree(E->getOperand(0));		Value *L = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *R = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *R = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V = Builder.CreateCmp(P0, L, R);		Value *V = Builder.CreateCmp(P0, L, R);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Cond = vectorizeTree(E->getOperand(0));		Value *Cond = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *True = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
Value *False = vectorizeTree(E->getOperand(2));		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *True = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));
		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *False = vectorizeTree(E->getOperand(2), EdgeInfo(E, 2));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateSelect(Cond, True, False);		Value *V = Builder.CreateSelect(Cond, True, False);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op = vectorizeTree(E->getOperand(0));		Value *Op = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateUnOp(		Value *V = Builder.CreateUnOp(
static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);		static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);
Show All 25 Lines	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *LHS = vectorizeTree(E->getOperand(0));		Value *LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
Value *RHS = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		Value *RHS = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateBinOp(		Value *V = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,
Show All 31 Lines	case Instruction::Load: {
// Find which lane we need to extract.		// Find which lane we need to extract.
unsigned FoundLane = Entry->findLaneForValue(PO);		unsigned FoundLane = Entry->findLaneForValue(PO);
ExternalUses.emplace_back(PO, cast<User>(VecPtr), FoundLane);		ExternalUses.emplace_back(PO, cast<User>(VecPtr), FoundLane);
}		}

NewLI = Builder.CreateAlignedLoad(VecTy, VecPtr, LI->getAlign());		NewLI = Builder.CreateAlignedLoad(VecTy, VecPtr, LI->getAlign());
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");		assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");
Value *VecPtr = vectorizeTree(E->getOperand(0));		Value *VecPtr = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
// Use the minimum alignment of the gathered loads.		// Use the minimum alignment of the gathered loads.
Align CommonAlignment = LI->getAlign();		Align CommonAlignment = LI->getAlign();
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);		NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);
}		}
Value *V = propagateMetadata(NewLI, E->Scalars);		Value *V = propagateMetadata(NewLI, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Store: {		case Instruction::Store: {
auto *SI = cast<StoreInst>(VL0);		auto *SI = cast<StoreInst>(VL0);
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *VecValue = vectorizeTree(E->getOperand(0));		Value *VecValue = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
ShuffleBuilder.addMask(E->ReorderIndices);		ShuffleBuilder.addMask(E->ReorderIndices);
VecValue = ShuffleBuilder.finalize(VecValue);		VecValue = ShuffleBuilder.finalize(VecValue);

Value *ScalarPtr = SI->getPointerOperand();		Value *ScalarPtr = SI->getPointerOperand();
Value *VecPtr = Builder.CreateBitCast(		Value *VecPtr = Builder.CreateBitCast(
ScalarPtr, VecValue->getType()->getPointerTo(AS));		ScalarPtr, VecValue->getType()->getPointerTo(AS));
StoreInst *ST = Builder.CreateAlignedStore(VecValue, VecPtr,		StoreInst *ST = Builder.CreateAlignedStore(VecValue, VecPtr,
SI->getAlign());		SI->getAlign());
Show All 13 Lines	case Instruction::Store: {
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
auto *GEP0 = cast<GetElementPtrInst>(VL0);		auto *GEP0 = cast<GetElementPtrInst>(VL0);
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op0 = vectorizeTree(E->getOperand(0));		Value *Op0 = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));

SmallVector<Value *> OpVecs;		SmallVector<Value *> OpVecs;
for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {		for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {
Value *OpVec = vectorizeTree(E->getOperand(J));		Value *OpVec = vectorizeTree(E->getOperand(J), EdgeInfo(E, J));
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);		Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
Show All 31 Lines	case Instruction::Call: {
CallInst *CEI = cast<CallInst>(VL0);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
if (hasVectorInstrinsicOverloadedScalarOpd(IID, j))		if (hasVectorInstrinsicOverloadedScalarOpd(IID, j))
TysForDecl.push_back(ScalarArg->getType());		TysForDecl.push_back(ScalarArg->getType());
continue;		continue;
}		}

Value *OpVec = vectorizeTree(E->getOperand(j));		Value *OpVec = vectorizeTree(E->getOperand(j), EdgeInfo(E, j));
LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");		LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Function *CF;		Function *CF;
if (!UseIntrinsic) {		if (!UseIntrinsic) {
VFShape Shape =		VFShape Shape =
VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(		VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(
Show All 35 Lines	case Instruction::ShuffleVector: {
Instruction::isBinaryOp(E->getAltOpcode())) ||		Instruction::isBinaryOp(E->getAltOpcode())) ||
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode()))) &&		Instruction::isCast(E->getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");

Value *LHS = nullptr, *RHS = nullptr;		Value *LHS = nullptr, *RHS = nullptr;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
RHS = vectorizeTree(E->getOperand(1));		if (E->VectorizedValue) {
		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
		return E->VectorizedValue;
		}

		RHS = vectorizeTree(E->getOperand(1), EdgeInfo(E, 1));
} else {		} else {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), EdgeInfo(E, 0));
}		}

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V0, *V1;		Value *V0, *V1;
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {		BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
// All blocks must be scheduled before any instructions are inserted.		// All blocks must be scheduled before any instructions are inserted.
for (auto &BSIter : BlocksSchedules) {		for (auto &BSIter : BlocksSchedules) {
scheduleBlock(BSIter.second.get());		scheduleBlock(BSIter.second.get());
}		}

Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
auto *VectorRoot = vectorizeTree(VectorizableTree[0].get());		auto *VectorRoot = vectorizeTree(VectorizableTree[0].get());
		// Run through the list of postponed gathers and emit them, replacing the temp
		// emitted allocas with actual vector instructions.
		auto PostponedNodes = PostponedGathers.takeVector();
		for (const TreeEntry *E : PostponedNodes) {
		auto *TE = const_cast<TreeEntry *>(E);
		if (auto *VecTE = getTreeEntry(TE->Scalars.front()))
		if (VecTE->isSame(TE->UserTreeIndices.front().UserTE->getOperand(
		TE->UserTreeIndices.front().EdgeIdx)))
		// Found gather node which is absolutely the same as one of the
		// vectorized nodes. It may happen after reordering.
		continue;
		auto *PrevVec = cast<Instruction>(TE->VectorizedValue);
		TE->VectorizedValue = nullptr;
		auto *UserI =
		cast<Instruction>(TE->UserTreeIndices.front().UserTE->VectorizedValue);
		Builder.SetInsertPoint(PrevVec);
		Builder.SetCurrentDebugLocation(UserI->getDebugLoc());
		Value *Vec = vectorizeTree(TE);
		PrevVec->replaceAllUsesWith(Vec);
		eraseInstruction(PrevVec, /*ReplaceOpsWithUndef=*/true);
		}

// If the vectorized tree can be rewritten in a smaller type, we truncate the		// If the vectorized tree can be rewritten in a smaller type, we truncate the
// vectorized root. InstCombine will then rewrite the entire expression. We		// vectorized root. InstCombine will then rewrite the entire expression. We
// sign extend the extracted values below.		// sign extend the extracted values below.
auto *ScalarRoot = VectorizableTree[0]->Scalars[0];		auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
if (auto *I = dyn_cast<Instruction>(VectorRoot)) {		if (auto *I = dyn_cast<Instruction>(VectorRoot)) {
// If current instr is a phi and not the last phi, insert it after the		// If current instr is a phi and not the last phi, insert it after the
// last phi node.		// last phi node.
if (isa<PHINode>(I))		if (isa<PHINode>(I))
Builder.SetInsertPoint(&*I->getParent()->getFirstInsertionPt());		Builder.SetInsertPoint(&*I->getParent()->getFirstInsertionPt());
else		else
Builder.SetInsertPoint(&*++BasicBlock::iterator(I));		Builder.SetInsertPoint(&*++BasicBlock::iterator(I));
}		}
auto BundleWidth = VectorizableTree[0]->Scalars.size();		auto BundleWidth = VectorizableTree[0]->Scalars.size();
auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);		auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
auto *VecTy = FixedVectorType::get(MinTy, BundleWidth);		auto *VecTy = FixedVectorType::get(MinTy, BundleWidth);
auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);		auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);
VectorizableTree[0]->VectorizedValue = Trunc;		VectorizableTree[0]->VectorizedValue = Trunc;
}		}

LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()		LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()
<< " values .\n");		<< " values .\n");

		// Maps extract Scalar to the corresponding extractelement instruction in the
		// basic block. Only one extractelement per block should be emitted.
		DenseMap<Value *, DenseMap<BasicBlock *, Value *>> ScalarToEEs;
// Extract all of the elements with the external uses.		// Extract all of the elements with the external uses.
for (const auto &ExternalUse : ExternalUses) {		for (const auto &ExternalUse : ExternalUses) {
Value *Scalar = ExternalUse.Scalar;		Value *Scalar = ExternalUse.Scalar;
llvm::User *User = ExternalUse.User;		llvm::User *User = ExternalUse.User;

// Skip users that we already RAUW. This happens when one instruction		// Skip users that we already RAUW. This happens when one instruction
// has multiple uses of the same value.		// has multiple uses of the same value.
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
continue;		continue;
TreeEntry *E = getTreeEntry(Scalar);		TreeEntry *E = getTreeEntry(Scalar);
assert(E && "Invalid scalar");		assert(E && "Invalid scalar");
assert(E->State != TreeEntry::NeedToGather &&		assert(E->State != TreeEntry::NeedToGather &&
"Extracting from a gather list");		"Extracting from a gather list");

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
auto ExtractAndExtendIfNeeded = [&](Value *Vec) {		auto ExtractAndExtendIfNeeded = [&](Value *Vec) {
if (Scalar->getType() != Vec->getType()) {		if (Scalar->getType() != Vec->getType()) {
Value *Ex;		Value *Ex = nullptr;
		auto It = ScalarToEEs.find(Scalar);
		if (It != ScalarToEEs.end()) {
		// No need to emit many extracts, just move the only one in the
		// current block.
		auto EEIt = It->second.find(Builder.GetInsertBlock());
		if (EEIt != It->second.end()) {
		auto *I = cast<Instruction>(EEIt->second);
		if (Builder.GetInsertPoint() != Builder.GetInsertBlock()->end() &&
		Builder.GetInsertPoint()->comesBefore(I))
		I->moveBefore(&*Builder.GetInsertPoint());
		Ex = I;
		}
		}
		if (!Ex) {
// "Reuse" the existing extract to improve final codegen.		// "Reuse" the existing extract to improve final codegen.
if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {		if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {
Ex = Builder.CreateExtractElement(ES->getOperand(0),		Ex = Builder.CreateExtractElement(ES->getOperand(0),
ES->getOperand(1));		ES->getOperand(1));
} else {		} else {
Ex = Builder.CreateExtractElement(Vec, Lane);		Ex = Builder.CreateExtractElement(Vec, Lane);
}		}
		ScalarToEEs[Scalar].try_emplace(Builder.GetInsertBlock(), Ex);
		}
// If necessary, sign-extend or zero-extend ScalarRoot		// If necessary, sign-extend or zero-extend ScalarRoot
// to the larger type.		// to the larger type.
if (!MinBWs.count(ScalarRoot))		if (!MinBWs.count(ScalarRoot))
return Ex;		return Ex;
if (MinBWs[ScalarRoot].second)		if (MinBWs[ScalarRoot].second)
return Builder.CreateSExt(Ex, Scalar->getType());		return Builder.CreateSExt(Ex, Scalar->getType());
return Builder.CreateZExt(Ex, Scalar->getType());		return Builder.CreateZExt(Ex, Scalar->getType());
}		}
assert(isa<FixedVectorType>(Scalar->getType()) &&		assert(isa<FixedVectorType>(Scalar->getType()) &&
isa<InsertElementInst>(Scalar) &&		isa<InsertElementInst>(Scalar) &&
"In-tree scalar of vector type is not insertelement?");		"In-tree scalar of vector type is not insertelement?");
return Vec;		return Vec;
};		};
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
// ExternallyUsedValues.		// ExternallyUsedValues.
if (!User) {		if (!User) {
assert(ExternallyUsedValues.count(Scalar) &&		assert(ExternallyUsedValues.count(Scalar) &&
"Scalar with nullptr as an external user must be registered in "		"Scalar with nullptr as an external user must be registered in "
"ExternallyUsedValues map");		"ExternallyUsedValues map");
if (auto *VecI = dyn_cast<Instruction>(Vec)) {		if (auto *VecI = dyn_cast<Instruction>(Vec)) {
		if (auto *PHI = dyn_cast<PHINode>(VecI))
		Builder.SetInsertPoint(PHI->getParent()->getFirstNonPHI());
		else
Builder.SetInsertPoint(VecI->getParent(),		Builder.SetInsertPoint(VecI->getParent(),
std::next(VecI->getIterator()));		std::next(VecI->getIterator()));
} else {		} else {
Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
}		}
Value *NewInst = ExtractAndExtendIfNeeded(Vec);		Value *NewInst = ExtractAndExtendIfNeeded(Vec);
CSEBlocks.insert(cast<Instruction>(Scalar)->getParent());		CSEBlocks.insert(cast<Instruction>(Scalar)->getParent());
auto &NewInstLocs = ExternallyUsedValues[NewInst];		auto &NewInstLocs = ExternallyUsedValues[NewInst];
auto It = ExternallyUsedValues.find(Scalar);		auto It = ExternallyUsedValues.find(Scalar);
assert(It != ExternallyUsedValues.end() &&		assert(It != ExternallyUsedValues.end() &&
class HorizontalReduction {
/// Total number of operands in the reduction operation.		/// Total number of operands in the reduction operation.
static unsigned getNumberOfOperands(Instruction *I) {		static unsigned getNumberOfOperands(Instruction *I) {
return isCmpSelMinMax(I) ? 3 : 2;		return isCmpSelMinMax(I) ? 3 : 2;
}		}

/// Checks if the instruction is in basic block \p BB.		/// Checks if the instruction is in basic block \p BB.
/// For a cmp+sel min/max reduction check that both ops are in \p BB.		/// For a cmp+sel min/max reduction check that both ops are in \p BB.
static bool hasSameParent(Instruction *I, BasicBlock *BB) {		static bool hasSameParent(Instruction *I, BasicBlock *BB) {
		if (isVectorLikeInstWithConstOps(I))
		return true;
if (isCmpSelMinMax(I) || (isBoolLogicOp(I) && isa<SelectInst>(I))) {		if (isCmpSelMinMax(I) || (isBoolLogicOp(I) && isa<SelectInst>(I))) {
auto *Sel = cast<SelectInst>(I);		auto *Sel = cast<SelectInst>(I);
auto *Cmp = dyn_cast<Instruction>(Sel->getCondition());		auto *Cmp = dyn_cast<Instruction>(Sel->getCondition());
return Sel->getParent() == BB && Cmp && Cmp->getParent() == BB;		return Sel->getParent() == BB && Cmp && Cmp->getParent() == BB;
}		}
return I->getParent() == BB;		return I->getParent() == BB;
}		}

/// Expected number of uses for reduction operations/reduced values.		/// Expected number of uses for reduction operations/reduced values.
static bool hasRequiredNumberOfUses(bool IsCmpSelMinMax, Instruction *I) {		static bool hasRequiredNumberOfUses(bool IsCmpSelMinMax, Instruction *I) {
if (IsCmpSelMinMax) {		if (IsCmpSelMinMax) {
// SelectInst must be used twice while the condition op must have single		// SelectInst must be used twice while the condition op must have single
// use only.		// use only.
if (auto *Sel = dyn_cast<SelectInst>(I))		if (auto *Sel = dyn_cast<SelectInst>(I))
return Sel->hasNUses(2) && Sel->getCondition()->hasOneUse();		return Sel->hasNUses(2) && Sel->getCondition()->hasOneUse();
return I->hasNUses(2);		return isVectorLikeInstWithConstOps(I) || I->hasNUses(2);
}		}

// Arithmetic reduction operation must be used once only.		// Arithmetic reduction operation must be used once only.
return I->hasOneUse();		return isVectorLikeInstWithConstOps(I) || I->hasOneUse();
}		}

/// Initializes the list of reduction operations.		/// Initializes the list of reduction operations.
void initReductionOps(Instruction *I) {		void initReductionOps(Instruction *I) {
if (isCmpSelMinMax(I))		if (isCmpSelMinMax(I))
ReductionOps.assign(2, ReductionOpsType());		ReductionOps.assign(2, ReductionOpsType());
else		else
ReductionOps.assign(1, ReductionOpsType());		ReductionOps.assign(1, ReductionOpsType());
while (!Stack.empty()) {
}		}
Stack.push_back(		Stack.push_back(
std::make_pair(EdgeInst, getFirstOperandIndex(EdgeInst)));		std::make_pair(EdgeInst, getFirstOperandIndex(EdgeInst)));
continue;		continue;
}		}
// I is an extra argument for TreeN (its parent operation).		// I is an extra argument for TreeN (its parent operation).
markExtraArg(Stack.back(), EdgeInst);		markExtraArg(Stack.back(), EdgeInst);
}		}
		// For vector-like instruction with constant operands all users must be
		// reduction ops only.
		if (any_of(ReducedVals, [Inst](Value *V) {
		return !isVectorLikeInstWithConstOps(V) ||
		!hasRequiredNumberOfUses(isCmpSelMinMax(Inst),
		cast<Instruction>(V));
		}))
		for (Value *V : ReducedVals)
		for (Value *U : V->users()) {
		bool IsFound = false;
		for (ArrayRef<Value *> RedOps : ReductionOps)
		if (is_contained(RedOps, U)) {
		IsFound = true;
		break;
		}
		if (!IsFound)
		return false;
		}

return true;		return true;
}		}

/// Attempt to vectorize the tree found by matchAssociativeReduction.		/// Attempt to vectorize the tree found by matchAssociativeReduction.
Value *tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {		Value *tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {
// If there are a sufficient number of reduction values, reduce		// If there are a sufficient number of reduction values, reduce
// to a nearby power-of-2. We can safely generate oversized		// to a nearby power-of-2. We can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.		// vectors and rely on the backend to split them to legal sizes.
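The extract reuse in ExtractAndExtendIfNeeded above keeps at most one extractelement per scalar per basic block. A minimal standalone sketch of that lookup pattern follows, assuming plain strings in place of Value * and BasicBlock * and an integer id in place of the emitted instruction. ExtractCache, getOrEmit and NumEmitted are hypothetical names, not part of the patch, and the sketch does not model moving a cached extract before the current insertion point.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for llvm::Value * and llvm::BasicBlock *.
using ScalarId = std::string;
using BlockId = std::string;

struct ExtractCache {
  // Scalar -> (block -> id of the extract already emitted in that block).
  std::map<ScalarId, std::map<BlockId, int>> ScalarToEEs;
  // Number of extracts "emitted" so far; also used as the next id.
  int NumEmitted = 0;

  // Returns the extract id for Scalar in Block, emitting a new one only
  // when that block has no extract for the scalar yet.
  int getOrEmit(const ScalarId &Scalar, const BlockId &Block) {
    auto It = ScalarToEEs.find(Scalar);
    if (It != ScalarToEEs.end()) {
      auto EEIt = It->second.find(Block);
      if (EEIt != It->second.end())
        return EEIt->second; // Reuse the only extract in this block.
    }
    int Ex = NumEmitted++;
    bool Inserted = ScalarToEEs[Scalar].try_emplace(Block, Ex).second;
    assert(Inserted && "block already had an extract for this scalar");
    (void)Inserted;
    return Ex;
  }
};
```

Calling getOrEmit twice with the same scalar and block returns the same id, while a different block gets a fresh extract.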

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll


	; Function Attrs: argmemonly nofree nosync nounwind willreturn			; Function Attrs: argmemonly nofree nosync nounwind willreturn
	declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1			declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1

	; Function Attrs: nounwind ssp uwtable mustprogress			; Function Attrs: nounwind ssp uwtable mustprogress

	define <4 x float> @reverse_hadd_v4f32(<4 x float> %a, <4 x float> %b) {			define <4 x float> @reverse_hadd_v4f32(<4 x float> %a, <4 x float> %b) {
	; CHECK-LABEL: @reverse_hadd_v4f32(			; CHECK-LABEL: @reverse_hadd_v4f32(
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <2 x i32> <i32 2, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 2, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> undef, <2 x i32> <i32 2, i32 0>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 2, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[B]], <4 x float> undef, <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x float> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x float> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: ret <4 x float> [[TMP9]]			; CHECK-NEXT: ret <4 x float> [[TMP9]]
	;			;
	%vecext = extractelement <4 x float> %a, i32 0			%vecext = extractelement <4 x float> %a, i32 0
	%vecext1 = extractelement <4 x float> %a, i32 1			%vecext1 = extractelement <4 x float> %a, i32 1
	%add = fadd float %vecext, %vecext1			%add = fadd float %vecext, %vecext1

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4sf(<4 x float> %t) {			define float @test_merge_anyof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4sf(			; CHECK-LABEL: @test_merge_anyof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[T:%.*]], i32 3			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x float> [[T:%.*]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[T]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> <float poison, float poison, float poison, float poison, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[T]], <4 x float> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[T]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float poison, float poison, float poison, float poison>, <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T]]			; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <8 x float> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = freeze <8 x i1> [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4			; CHECK-NEXT: [[TMP6:%.*]] = bitcast <8 x i1> [[TMP5]] to i8
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP6]], 0
	; CHECK-NEXT: [[CMP19:%.*]] = fcmp ogt float [[TMP3]], 1.000000e+00			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP19]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[CMP24:%.*]] = fcmp ogt float [[TMP2]], 1.000000e+00			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP7]], i32 0
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP24]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[ADD]], float 0.000000e+00
	; CHECK-NEXT: [[CMP29:%.*]] = fcmp ogt float [[TMP1]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP29]]
	; CHECK-NEXT: [[CMP34:%.*]] = fcmp ogt float [[TMP0]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP34]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false

	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4si(<4 x i32> %t) {			define float @test_merge_anyof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4si(			; CHECK-LABEL: @test_merge_anyof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 3			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[T:%.*]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[T]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 255, i32 255, i32 255, i32 255>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[T]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 poison, i32 poison, i32 poison, i32 poison>, <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP5:%.*]] = freeze <8 x i1> [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4			; CHECK-NEXT: [[TMP6:%.*]] = bitcast <8 x i1> [[TMP5]] to i8
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i8 [[TMP6]], 0
	; CHECK-NEXT: [[CMP11:%.*]] = icmp sgt i32 [[TMP3]], 255			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP11]]			; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i32 [[TMP2]], 255			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP14]]
	; CHECK-NEXT: [[CMP17:%.*]] = icmp sgt i32 [[TMP1]], 255
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP17]]
	; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[TMP0]], 255
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP20]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[CONV]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[CONV]], float 0.000000e+00
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false

	lor.lhs.false:			lor.lhs.false:

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

%x10 = add i32 %x1, %x0		%x10 = add i32 %x1, %x0
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x210 = add i32 %x2, %x10		%x210 = add i32 %x2, %x10
ret i32 %x210		ret i32 %x210
}		}

define i32 @ext_ext_partial_add_reduction_and_extra_add_v4i32(<4 x i32> %x, <4 x i32> %y) {		define i32 @ext_ext_partial_add_reduction_and_extra_add_v4i32(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @ext_ext_partial_add_reduction_and_extra_add_v4i32(		; CHECK-LABEL: @ext_ext_partial_add_reduction_and_extra_add_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]], <4 x i32> <i32 4, i32 2, i32 5, i32 6>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[Y:%.*]], <4 x i32> [[X:%.*]], <4 x i32> <i32 0, i32 6, i32 1, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])
; CHECK-NEXT: ret i32 [[TMP2]]		; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%y0 = extractelement <4 x i32> %y, i32 0		%y0 = extractelement <4 x i32> %y, i32 0
%y1 = extractelement <4 x i32> %y, i32 1		%y1 = extractelement <4 x i32> %y, i32 1
%y10 = add i32 %y1, %y0		%y10 = add i32 %y1, %y0
%y2 = extractelement <4 x i32> %y, i32 2		%y2 = extractelement <4 x i32> %y, i32 2
%y210 = add i32 %y2, %y10		%y210 = add i32 %y2, %y10
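The CHECK line change for ext_ext_partial_add_reduction_and_extra_add_v4i32 swaps the shufflevector operands and renumbers the mask, which should select the same lanes as before. A small sketch to check that, assuming 4-lane integer vectors and a hypothetical shuffle helper that models only in-range mask indices (no undef or poison lanes):

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<int, 4>;

// Models shufflevector on two 4-lane inputs: mask indices 0..3 pick lanes
// of A, indices 4..7 pick lanes of B. Hypothetical helper, not an LLVM API.
Vec4 shuffle(const Vec4 &A, const Vec4 &B, const std::array<int, 4> &Mask) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I) {
    int M = Mask[I];
    assert(M >= 0 && M < 8 && "mask index out of range");
    R[I] = M < 4 ? A[M] : B[M - 4];
  }
  return R;
}
```

With distinct lane values for X and Y, shuffle(X, Y, {4, 2, 5, 6}) and shuffle(Y, X, {0, 6, 1, 2}) both yield the lanes Y[0], X[2], Y[1], Y[2], so the input to the reduction is unchanged.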

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
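As a rough illustration of why the updated NOACCELERATE checks above compute the same lanes as the old ones, the following sketch models shufflevector in plain C++. The helper names (shuffle, gatherScalar, gatherShuffle, blend) and the kUndef sentinel are hypothetical and not part of LLVM, and poison/undef lanes are modelled as 0.0f purely for simplicity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sentinel for an "undef" mask lane (the result lane is unspecified).
constexpr int kUndef = -1;

// Simplified model of shufflevector: mask values 0..N-1 select from `a`,
// N..2N-1 select from `b`; undef lanes are left as 0.0f in this model.
template <std::size_t M, std::size_t N>
std::array<float, M> shuffle(const std::array<float, N> &a,
                             const std::array<float, N> &b,
                             const std::array<int, M> &mask) {
  std::array<float, M> out{};
  for (std::size_t i = 0; i < M; ++i) {
    const int idx = mask[i];
    if (idx == kUndef)
      continue;
    assert(idx >= 0 && static_cast<std::size_t>(idx) < 2 * N);
    const std::size_t lane = static_cast<std::size_t>(idx);
    out[i] = lane < N ? a[lane] : b[lane - N];
  }
  return out;
}

// Old checks: two extractelement (lanes 1 and 2) feeding two insertelement.
inline std::array<float, 2> gatherScalar(const std::array<float, 4> &v) {
  std::array<float, 2> t{}; // poison start value, modelled as zeros
  t[0] = v[1];
  t[1] = v[2];
  return t;
}

// New checks: a single shufflevector with mask <1, 2>.
inline std::array<float, 2> gatherShuffle(const std::array<float, 4> &v) {
  const std::array<float, 4> poison{};
  return shuffle(v, poison, std::array<int, 2>{1, 2});
}

// Mirrors [[TMP4]] and [[VECINS_21]]: widen the 2-lane result to 4 lanes,
// then blend it into lanes 1 and 2 of `vecins` with mask <0, 4, 5, 3>.
inline std::array<float, 4> blend(const std::array<float, 4> &vecins,
                                  const std::array<float, 2> &r) {
  const std::array<float, 2> poison2{};
  const auto wide =
      shuffle(r, poison2, std::array<int, 4>{0, 1, kUndef, kUndef});
  return shuffle(vecins, wide, std::array<int, 4>{0, 4, 5, 3});
}
```

In this model gatherScalar and gatherShuffle return the same two lanes for any input, which is the equivalence the changed check lines rely on; the sketch says nothing about how SLPVectorizer.cpp actually decides to emit the shuffle.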

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	; CHECK-NEXT: ret <4 x float> [[TMP1]]			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP2]])
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0			; NOACCELERATE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1			; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP5:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP5]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-threshold=-5 -S -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -slp-vectorizer -slp-threshold=-2 -S -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s


	; FIXME: The threshold is changed to keep this test case a bit smaller.			; FIXME: The threshold is changed to keep this test case a bit smaller.
	; The AArch64 cost model should not give such high costs to select statements.			; The AArch64 cost model should not give such high costs to select statements.

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux"			target triple = "aarch64--linux"

llvm/test/Transforms/SLPVectorizer/AArch64/reorder-fmuladd-crash.ll

	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[ARRAYIDX43:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 1, i64 0			; CHECK-NEXT: [[ARRAYIDX43:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 1, i64 0
	; CHECK-NEXT: [[ARRAYIDX45:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 2, i64 0			; CHECK-NEXT: [[ARRAYIDX45:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 2, i64 0
	; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 2, i64 1			; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 2, i64 1
	; CHECK-NEXT: [[ARRAYIDX58:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 1, i64 1			; CHECK-NEXT: [[ARRAYIDX58:%.*]] = getelementptr inbounds [4 x [2 x double]], [4 x [2 x double]]* undef, i32 0, i64 1, i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX43]] to <4 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX43]] to <4 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 0x7FF8000000000000, double 0x7FF8000000000000, double 0x7FF8000000000000, double 0x7FF8000000000000>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[TMP1]], <double 0x7FF8000000000000, double 0x7FF8000000000000, double 0x7FF8000000000000, double 0x7FF8000000000000>
	; CHECK-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> poison, <4 x double> zeroinitializer, <4 x double> [[TMP2]])			; CHECK-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> undef, <4 x double> zeroinitializer, <4 x double> [[TMP2]])
	; CHECK-NEXT: br label [[SW_EPILOG:%.*]]			; CHECK-NEXT: br label [[SW_EPILOG:%.*]]
	; CHECK: sw.bb195:			; CHECK: sw.bb195:
	; CHECK-NEXT: br label [[SW_EPILOG]]			; CHECK-NEXT: br label [[SW_EPILOG]]
	; CHECK: do.body:			; CHECK: do.body:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: sw.epilog:			; CHECK: sw.epilog:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x double> [ poison, [[SW_BB195]] ], [ [[TMP3]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x double> [ undef, [[SW_BB195]] ], [ [[TMP3]], [[SW_BB]] ]
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	; CHECK: if.end.1:			; CHECK: if.end.1:
	; CHECK-NEXT: br label [[FOR_COND15_1:%.*]]			; CHECK-NEXT: br label [[FOR_COND15_1:%.*]]
	; CHECK: for.cond15.1:			; CHECK: for.cond15.1:
	; CHECK-NEXT: br i1 undef, label [[FOR_END39:%.*]], label [[FOR_COND15_PREHEADER]]			; CHECK-NEXT: br i1 undef, label [[FOR_END39:%.*]], label [[FOR_COND15_PREHEADER]]
	;			;
	entry:			entry:
	%conv = sitofp i32 undef to double			%conv = sitofp i32 undef to double

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S %s | FileCheck %s			; RUN: opt -slp-vectorizer -S %s | FileCheck %s

	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
	target triple = "arm64-apple-darwin"			target triple = "arm64-apple-darwin"

	declare void @use(double)			declare void @use(double)

	; The extracts %v1.lane.0 and %v1.lane.1 should be considered free during SLP,			; The extracts %v1.lane.0 and %v1.lane.1 should be considered free during SLP,
	; because they will be directly in a vector register on AArch64.			; because they will be directly in a vector register on AArch64.
	define void @noop_extracts_first_2_lanes(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extracts_first_2_lanes(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extracts_first_2_lanes(			; CHECK-LABEL: @noop_extracts_first_2_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[V2_LANE_3:%.*]] = extractelement <4 x double> [[V_2]], i32 3			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[V_1]], [[TMP0]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_3]], i32 1			; CHECK-NEXT: call void @use(double [[TMP2]])
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[V_1]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: call void @use(double [[TMP3]])			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	define void @extracts_first_2_lanes_different_vectors(<2 x double>* %ptr.1, <4 x double>* %ptr.2, <2 x double>* %ptr.3) {			define void @extracts_first_2_lanes_different_vectors(<2 x double>* %ptr.1, <4 x double>* %ptr.2, <2 x double>* %ptr.3) {
	; CHECK-LABEL: @extracts_first_2_lanes_different_vectors(			; CHECK-LABEL: @extracts_first_2_lanes_different_vectors(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8			; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8
	; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1			; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x double> [[V_1]], <2 x double> [[V_3]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V3_LANE_1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V3_LANE_1]])			; CHECK-NEXT: call void @use(double [[V3_LANE_1]])
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8			%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8
	%v3.lane.1 = extractelement <2 x double> %v.3, i32 1			%v3.lane.1 = extractelement <2 x double> %v.3, i32 1

	; because they will be directly in a vector register on AArch64.			; because they will be directly in a vector register on AArch64.
	define void @noop_extract_second_2_lanes(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extract_second_2_lanes(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extract_second_2_lanes(			; CHECK-LABEL: @noop_extract_second_2_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_1]], <4 x double> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <4 x double> %v.1, i32 3			%v1.lane.3 = extractelement <4 x double> %v.1, i32 3

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16

	; %v1.lane.0 and %v1.lane.1 are used in reverse-order, so they won't be			; %v1.lane.0 and %v1.lane.1 are used in reverse-order, so they won't be
	; directly in a vector register on AArch64.			; directly in a vector register on AArch64.
	define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extract_reverse_order(			; CHECK-LABEL: @extract_reverse_order(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[V_1]], [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[V_1]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: call void @use(double [[TMP4]])			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[V_1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[TMP5]])
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	; %v1.lane.1 and %v1.lane.2 are extracted from different vector registers on AArch64.			; %v1.lane.1 and %v1.lane.2 are extracted from different vector registers on AArch64.
	define void @extract_lanes_1_and_2(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extract_lanes_1_and_2(<4 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extract_lanes_1_and_2(			; CHECK-LABEL: @extract_lanes_1_and_2(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_1]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <2 x i32> <i32 2, i32 2>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_2]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.1 = extractelement <4 x double> %v.1, i32 1			%v1.lane.1 = extractelement <4 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(			; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = fmul <4 x double> [[TMP0]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_0]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[TMP1]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_1]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_0]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP5]], <4 x double> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x double> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x double> [[TMP6]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP7]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP2]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	; CHECK-LABEL: @extracts_jumbled_4_lanes(			; CHECK-LABEL: @extracts_jumbled_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 undef, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_0]], i32 0
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 undef, i32 2, i32 1, i32 3>
	; CHECK-NEXT: [[A_LANE_0:%.*]] = fmul double [[V1_LANE_0]], [[V2_LANE_2]]			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[TMP1]], <4 x double> [[TMP2]], <4 x i32> <i32 2, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[A_LANE_1:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <4 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[A_LANE_2:%.*]] = fmul double [[V1_LANE_1]], [[V2_LANE_2]]			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[TMP4]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[A_LANE_3:%.*]] = fmul double [[V1_LANE_3]], [[V2_LANE_0]]
	; CHECK-NEXT: [[A_INS_0:%.*]] = insertelement <9 x double> undef, double [[A_LANE_0]], i32 0
	; CHECK-NEXT: [[A_INS_1:%.*]] = insertelement <9 x double> [[A_INS_0]], double [[A_LANE_1]], i32 1
	; CHECK-NEXT: [[A_INS_2:%.*]] = insertelement <9 x double> [[A_INS_1]], double [[A_LANE_2]], i32 2
	; CHECK-NEXT: [[A_INS_3:%.*]] = insertelement <9 x double> [[A_INS_2]], double [[A_LANE_3]], i32 3
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[A_INS_3]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP5]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5			; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6			; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7			; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8			; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_3]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 0, i32 1>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_4]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 0, i32 2, i32 1, i32 0, i32 2, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_7]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_8]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_0]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_1]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 2, i32 1, i32 0, i32 2, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = fmul <8 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 1>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[TMP6]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP25]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP7]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	[69 lines not shown]
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5			; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6			; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7			; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8			; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 4, i32 3, i32 6, i32 5, i32 8, i32 7, i32 1, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_6]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_5]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_1]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_0]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_2]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = fmul <8 x double> [[TMP4]], [[TMP1]]
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE1]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x double> [[TMP5]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP22]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP6]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	[69 lines not shown]
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5			; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6			; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7			; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8			; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 4, i32 3, i32 5, i32 6, i32 8, i32 7, i32 1, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 0, i32 2, i32 1, i32 2, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = fmul <8 x double> [[TMP0]], [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x double> [[TMP2]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP12]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[TMP3]], double [[A_LANE_8]], i32 8
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_7]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <8 x i32> <i32 7, i32 6, i32 8, i32 1, i32 0, i32 3, i32 2, i32 5>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_6]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <8 x i32> <i32 2, i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = fmul <8 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_1]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_0]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_3]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_2]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_5]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_4]], [[V2_LANE_2]]			; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_4]], [[V2_LANE_2]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[TMP6]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP25]], double [[B_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[TMP7]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	[57 lines not shown]

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

	[238 lines not shown]
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[INS_11:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_11]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	%arg1.1 = extractelement <3 x i16> %arg1, i64 1			%arg1.1 = extractelement <3 x i16> %arg1, i64 1
	[25 lines not shown]
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])
	; GFX8-NEXT: [[INS_32:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_32]]			; GFX8-NEXT: ret <4 x i16> [[INS_31]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

	[238 lines not shown]
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[INS_11:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_11]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	%arg1.1 = extractelement <3 x i16> %arg1, i64 1			%arg1.1 = extractelement <3 x i16> %arg1, i64 1
	[25 lines not shown]
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])
	; GFX8-NEXT: [[INS_32:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_32]]			; GFX8-NEXT: ret <4 x i16> [[INS_31]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -slp-vectorizer %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -slp-vectorizer %s | FileCheck %s

	define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {			define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {
	; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(			; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[ARG0_1:%.*]] = extractelement <9 x i16> undef, i64 7			; CHECK-NEXT: [[ARG0_1:%.*]] = extractelement <9 x i16> undef, i64 7
	; CHECK-NEXT: [[ARG0_2:%.*]] = extractelement <9 x i16> [[ARG0:%.*]], i64 8			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x i16> [[ARG0:%.*]], <9 x i16> poison, <2 x i32> <i32 undef, i32 8>
	; CHECK-NEXT: [[ARG1_1:%.*]] = extractelement <9 x i16> [[ARG1:%.*]], i64 7			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <9 x i16> [[ARG1:%.*]], <9 x i16> poison, <2 x i32> <i32 7, i32 8>
	; CHECK-NEXT: [[ARG1_2:%.*]] = extractelement <9 x i16> [[ARG1]], i64 8			; CHECK-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> poison, i16 [[ARG0_1]], i32 0			; CHECK-NEXT: ret <2 x i16> [[TMP2]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 [[ARG0_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i16> poison, i16 [[ARG1_1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i16> [[TMP2]], i16 [[ARG1_2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP3]])
	; CHECK-NEXT: ret <2 x i16> [[TMP4]]
	;			;
	bb:			bb:
	%arg0.1 = extractelement <9 x i16> undef, i64 7			%arg0.1 = extractelement <9 x i16> undef, i64 7
	%arg0.2 = extractelement <9 x i16> %arg0, i64 8			%arg0.2 = extractelement <9 x i16> %arg0, i64 8
	%arg1.1 = extractelement <9 x i16> %arg1, i64 7			%arg1.1 = extractelement <9 x i16> %arg1, i64 7
	%arg1.2 = extractelement <9 x i16> %arg1, i64 8			%arg1.2 = extractelement <9 x i16> %arg1, i64 8
	%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)			%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)
	%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)			%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)
	%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0			%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0
	%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1			%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1
	ret <2 x i16> %ins.2			ret <2 x i16> %ins.2
	}			}

	declare i16 @llvm.uadd.sat.i16(i16, i16) #0			declare i16 @llvm.uadd.sat.i16(i16, i16) #0
	attributes #0 = { nounwind readnone speculatable willreturn }			attributes #0 = { nounwind readnone speculatable willreturn }

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

	[12 lines not shown]
	; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0			; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0
	; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1			; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1
	; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2			; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2) to <2 x i32>*), align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2) to <2 x i32>*), align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = ashr <4 x i32> [[TMP6]], <i32 6, i32 6, i32 6, i32 6>			; CHECK-NEXT: [[TMP7:%.*]] = ashr <4 x i32> [[TMP6]], <i32 6, i32 6, i32 6, i32 6>
	; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3			; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%add277 = add nsw i32 undef, undef			%add277 = add nsw i32 undef, undef

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> undef, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-5 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP9:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_EXTRA26]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_EXTRA26]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[SHUFFLE]], <8 x i32> poison, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = and <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> [[TMP8]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP10]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529			; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685			; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 undef, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>			; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R71]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R71]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

Show First 20 Lines • Show All 157 Lines • ▼ Show 20 Lines	;
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%b2 = extractelement <8 x i16> %b, i32 2		%b2 = extractelement <8 x i16> %b, i32 2
Show All 18 Lines
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R53:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R53]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%c0 = extractelement <16 x i8> %c, i32 0		%c0 = extractelement <16 x i8> %c, i32 0

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

Show First 20 Lines • Show All 157 Lines • ▼ Show 20 Lines	;
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%b2 = extractelement <8 x i16> %b, i32 2		%b2 = extractelement <8 x i16> %b, i32 2
Show All 18 Lines
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R53:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; CHECK-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[TMP12]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R53]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R71]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%c0 = extractelement <16 x i8> %c, i32 0		%c0 = extractelement <16 x i8> %c, i32 0

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

	Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>			; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; AVX512-LABEL: @fmul_fdiv_v4f32_const(			; AVX512-LABEL: @fmul_fdiv_v4f32_const(

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

	Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>			; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; AVX512-LABEL: @fmul_fdiv_v4f32_const(			; AVX512-LABEL: @fmul_fdiv_v4f32_const(

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

Show First 20 Lines • Show All 164 Lines • ▼ Show 20 Lines	;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R71]]		; AVX1-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1		; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1		; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R71]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R71]]		; SLM-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R71]]		; AVX1-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]		; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
[125 lines not shown]
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1		; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]		; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>		; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1		; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>		; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]		; AVX512-NEXT: ret <8 x i32> [[R71]]
[63 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

	[601 lines not shown]
	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1			; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]			; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	[324 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

	[601 lines not shown]
	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[TMP3:%.*]] = fdiv <2 x double> [[TMP1]], [[TMP2]]
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 2, i32 3>
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP5]]
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[TMP9:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP8]]
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1			; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x double> [[A]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x double> [[B]], <8 x double> poison, <2 x i32> <i32 6, i32 7>
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[TMP12:%.*]] = fdiv <2 x double> [[TMP10]], [[TMP11]]
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP13]], <8 x double> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP15]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP16]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R31:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x double> [[R31]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R73:%.*]] = shufflevector <8 x double> [[R52]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R73]]			; SLM-NEXT: ret <8 x double> [[R73]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	[324 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

[104 lines not shown]
;
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

[104 lines not shown]
;
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]			; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]			; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
	; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0			; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3			; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3
	; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]			; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]			; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
	; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0			; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3			; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3
	; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll



; Check that we correctly detect a splat/broadcast by leveraging the		; Check that we correctly detect a splat/broadcast by leveraging the
; commutativity property of `xor`.		; commutativity property of `xor`.

define void @splat(i8 %a, i8 %b, i8 %c) {		define void @splat(i8 %a, i8 %b, i8 %c) {
; SSE-LABEL: @splat(		; SSE-LABEL: @splat(
; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0		; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer
; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0		; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x i8> poison, i8 [[A:%.*]], i32 0
; SSE-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[B:%.*]], i32 1		; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i8> [[TMP3]], i8 [[B:%.*]], i32 1
; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x i8> [[TMP4]], <2 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
; SSE-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]		; SSE-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP2]], [[TMP5]]
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @splat(		; AVX-LABEL: @splat(
; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0		; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer
; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0		; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x i8> poison, i8 [[A:%.*]], i32 0
; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[B:%.*]], i32 1		; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x i8> [[TMP3]], i8 [[B:%.*]], i32 1
; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>		; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x i8> [[TMP4]], <2 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
; AVX-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]		; AVX-NEXT: [[TMP6:%.*]] = xor <16 x i8> [[TMP2]], [[TMP5]]
; AVX-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16		; AVX-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%1 = xor i8 %c, %a		%1 = xor i8 %c, %a
store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16		store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16
%2 = xor i8 %a, %c		%2 = xor i8 %a, %c
store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)		store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)
%3 = xor i8 %a, %c		%3 = xor i8 %a, %c
store i8 %3, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 2)		store i8 %3, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 2)
Show All 26 Lines	;
ret void		ret void
}		}

; Check that we correctly detect that we can have the same opcode on one side by		; Check that we correctly detect that we can have the same opcode on one side by
; leveraging the commutativity property of `xor`.		; leveraging the commutativity property of `xor`.

define void @same_opcode_on_one_side(i32 %a, i32 %b, i32 %c) {		define void @same_opcode_on_one_side(i32 %a, i32 %b, i32 %c) {
; SSE-LABEL: @same_opcode_on_one_side(		; SSE-LABEL: @same_opcode_on_one_side(
; SSE-NEXT: [[ADD1:%.*]] = add i32 [[C:%.*]], [[A:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
; SSE-NEXT: [[ADD2:%.*]] = add i32 [[C]], [[A]]		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; SSE-NEXT: [[ADD3:%.*]] = add i32 [[A]], [[C]]		; SSE-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
; SSE-NEXT: [[ADD4:%.*]] = add i32 [[C]], [[A]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
; SSE-NEXT: [[TMP1:%.*]] = xor i32 [[ADD1]], [[A]]		; SSE-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 undef, i32 4, i32 0>
; SSE-NEXT: [[TMP2:%.*]] = xor i32 [[B:%.*]], [[ADD2]]		; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[B:%.*]], i32 1
; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1), align 4		; SSE-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP5]], [[TMP7]]
; SSE-NEXT: [[TMP3:%.*]] = xor i32 [[C]], [[ADD3]]		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; SSE-NEXT: store i32 [[TMP3]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2), align 4
; SSE-NEXT: [[TMP4:%.*]] = xor i32 [[A]], [[ADD4]]
; SSE-NEXT: store i32 [[TMP4]], i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @same_opcode_on_one_side(		; AVX-LABEL: @same_opcode_on_one_side(
; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0		; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0		; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer		; AVX-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1		; AVX-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 undef, i32 4, i32 0>
; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2		; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[B:%.*]], i32 1
; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 3		; AVX-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP5]], [[TMP7]]
; AVX-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[TMP3]], [[TMP6]]		; AVX-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; AVX-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%add1 = add i32 %c, %a		%add1 = add i32 %c, %a
%add2 = add i32 %c, %a		%add2 = add i32 %c, %a
%add3 = add i32 %a, %c		%add3 = add i32 %a, %c
%add4 = add i32 %c, %a		%add4 = add i32 %c, %a
%1 = xor i32 %add1, %a		%1 = xor i32 %add1, %a
store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16		store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16
%2 = xor i32 %b, %add2		%2 = xor i32 %b, %add2
store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)		store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)
%3 = xor i32 %c, %add3		%3 = xor i32 %c, %add3
store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)		store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)
%4 = xor i32 %a, %add4		%4 = xor i32 %a, %add4
store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)		store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP5]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

	}			}

	define fastcc void @dct36(double* %inbuf) {			define fastcc void @dct36(double* %inbuf) {
	; CHECK-LABEL: @dct36(			; CHECK-LABEL: @dct36(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2			%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2
	%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1			%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1
	%0 = load double, double* %arrayidx44, align 8			%0 = load double, double* %arrayidx44, align 8
	%add46 = fadd double %0, undef			%add46 = fadd double %0, undef
	store double %add46, double* %arrayidx41, align 8			store double %add46, double* %arrayidx41, align 8
	%1 = load double, double* %inbuf, align 8			%1 = load double, double* %inbuf, align 8
	%add49 = fadd double %1, %0			%add49 = fadd double %1, %0
	store double %add49, double* %arrayidx44, align 8			store double %add49, double* %arrayidx44, align 8
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }			%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }
	%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }			%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }

	define void @_Z8radianceRK3RayiPt() #0 {			define void @_Z8radianceRK3RayiPt() #0 {
	; CHECK-LABEL: @_Z8radianceRK3RayiPt(			; CHECK-LABEL: @_Z8radianceRK3RayiPt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> undef, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: br label [[RETURN:%.*]]			; CHECK-NEXT: br label [[RETURN:%.*]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.then78, label %if.then38			br i1 undef, label %if.then78, label %if.then38

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

	; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]
	; CHECK: ret void			; CHECK: ret void
	; CHECK: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*			; CHECK: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]
	; CHECK: br label [[TMP16]]			; CHECK: br label [[TMP16]]
	; CHECK: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]			; CHECK: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]
	; CHECK: unreachable			; CHECK: unreachable
	; CHECK: [[TMP19:%.*]] = extractelement <2 x double> [[TMP11]], i32 0			; CHECK: switch i32 undef, label [[TMP21]] [
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x double> [[TMP11]], i32 1
	; CHECK-NEXT: switch i32 undef, label [[TMP21]] [
	; CHECK-NEXT: i32 32, label [[TMP7]]			; CHECK-NEXT: i32 32, label [[TMP7]]
	; CHECK-NEXT: i32 103, label [[TMP7]]			; CHECK-NEXT: i32 103, label [[TMP7]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]			; CHECK: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]
	; CHECK: unreachable			; CHECK: unreachable
	;			;
	%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP5]], 4.000000e+00
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP2]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s

	define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast(			; CHECK-LABEL: @diamond_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1
	store i32 %mul8, i32* %arrayidx9, align 4			store i32 %mul8, i32* %arrayidx9, align 4
	%mul14 = mul i32 %ld, %ld			%mul14 = mul i32 %ld, %ld
	%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2			%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2
	store i32 %mul14, i32* %arrayidx15, align 4			store i32 %mul14, i32* %arrayidx15, align 4
	%mul20 = mul i32 %ld, undef			%mul20 = mul i32 %ld, undef
	%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3			%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3
	store i32 %mul20, i32* %arrayidx21, align 4			store i32 %mul20, i32* %arrayidx21, align 4
	ret i32 0			ret i32 0
	}			}

	define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast2(			; CHECK-LABEL: @diamond_broadcast2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 undef>, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1
	store i32 %mul8, i32* %arrayidx9, align 4			store i32 %mul8, i32* %arrayidx9, align 4
	%mul14 = mul i32 %ld, %ld			%mul14 = mul i32 %ld, %ld
	%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2			%arrayidx15 = getelementptr inbounds i32, i32* %B, i64 2
	store i32 %mul14, i32* %arrayidx15, align 4			store i32 %mul14, i32* %arrayidx15, align 4
	%mul20 = mul i32 undef, %ld			%mul20 = mul i32 undef, %ld
	%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3			%arrayidx21 = getelementptr inbounds i32, i32* %B, i64 3
	store i32 %mul20, i32* %arrayidx21, align 4			store i32 %mul20, i32* %arrayidx21, align 4
	ret i32 0			ret i32 0
	}			}

	define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast3(			; CHECK-LABEL: @diamond_broadcast3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 undef>, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 0>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	%mul8 = mul i32 %ld, %ld			%mul8 = mul i32 %ld, %ld
	%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1			%arrayidx9 = getelementptr inbounds i32, i32* %B, i64 1

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s

	define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {			define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x i8> [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i8> [[X:%.*]], <2 x i8> [[Y:%.*]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x i8> [[Y:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i8> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[X0]], i32 0			; CHECK-NEXT: ret <2 x i8> [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[Y1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i8> [[TMP2]], [[TMP2]]
	; CHECK-NEXT: ret <2 x i8> [[TMP3]]
	;			;
	%x0 = extractelement <2 x i8> %x, i32 0			%x0 = extractelement <2 x i8> %x, i32 0
	%y1 = extractelement <2 x i8> %y, i32 1			%y1 = extractelement <2 x i8> %y, i32 1
	%x0x0 = mul i8 %x0, %x0			%x0x0 = mul i8 %x0, %x0
	%y1y1 = mul i8 %y1, %y1			%y1y1 = mul i8 %y1, %y1
	%ins1 = insertelement <2 x i8> poison, i8 %x0x0, i32 0			%ins1 = insertelement <2 x i8> poison, i8 %x0x0, i32 0
	%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1			%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
	ret <2 x i8> %ins2			ret <2 x i8> %ins2
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -slp-schedule-budget=1 | FileCheck %s

	define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {			define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x i8> [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i8> [[X:%.*]], <2 x i8> [[Y:%.*]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x i8> [[Y:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i8> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> poison, i8 [[X0]], i32 0			; CHECK-NEXT: ret <2 x i8> [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i8> [[TMP1]], i8 [[Y1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i8> [[TMP2]], [[TMP2]]
	; CHECK-NEXT: ret <2 x i8> [[TMP3]]
	;			;
	%x0 = extractelement <2 x i8> %x, i32 0			%x0 = extractelement <2 x i8> %x, i32 0
	%y1 = extractelement <2 x i8> %y, i32 1			%y1 = extractelement <2 x i8> %y, i32 1
	%x0x0 = mul i8 %x0, %x0			%x0x0 = mul i8 %x0, %x0
	%y1y1 = mul i8 %y1, %y1			%y1y1 = mul i8 %y1, %y1
	%ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0			%ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
	%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1			%ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
	ret <2 x i8> %ins2			ret <2 x i8> %ins2
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

define void @fextr2(double* %ptr) {		define void @fextr2(double* %ptr) {
; CHECK-LABEL: @fextr2(		; CHECK-LABEL: @fextr2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32		; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32
; CHECK-NEXT: [[V0:%.*]] = extractelement <4 x double> [[LD]], i32 0
; CHECK-NEXT: [[V1:%.*]] = extractelement <4 x double> [[LD]], i32 1
; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0		; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x double> [[LD]], <4 x double> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[TMP0]], <double 5.500000e+00, double 6.600000e+00>
; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 5.500000e+00, double 6.600000e+00>		; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[P0]] to <2 x double>*		; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[TMP2]], align 4
; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%LD = load <4 x double>, <4 x double>* undef		%LD = load <4 x double>, <4 x double>* undef
%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.		%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.
%V1 = extractelement <4 x double> %LD, i32 1		%V1 = extractelement <4 x double> %LD, i32 1
%P0 = getelementptr inbounds double, double* %ptr, i64 0		%P0 = getelementptr inbounds double, double* %ptr, i64 0
%P1 = getelementptr inbounds double, double* %ptr, i64 1		%P1 = getelementptr inbounds double, double* %ptr, i64 1
%A0 = fadd double %V0, 5.5		%A0 = fadd double %V0, 5.5
%A1 = fadd double %V1, 6.6		%A1 = fadd double %V1, 6.6
store double %A0, double* %P0, align 4		store double %A0, double* %P0, align 4
store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0			; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
	; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
	; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]			; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X]], [[TMP1]]
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH1-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X]], [[TMP1]]
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

	; A[i+2] += n;			; A[i+2] += n;
	; A[i+3] += k;			; A[i+3] += k;
	; }			; }
	;}			;}

	define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {			define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[N:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[K:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[K:%.*]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <8 x float> poison, float [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP3]], <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2>
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP11]])
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[TMP5]], i32 2			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP12]], [[CONV]]
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[TMP4]], i32 3
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[TMP10]], i32 4
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x float> [[TMP15]], float [[TMP9]], i32 5
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x float> [[TMP16]], float [[TMP5]], i32 6
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x float> [[TMP17]], float [[TMP4]], i32 7
	; CHECK-NEXT: [[TMP19:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP18]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP19]], [[CONV]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4			; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; CHECK-NEXT: ret float [[OP_EXTRA1]]			; CHECK-NEXT: ret float [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @baz(			; THRESHOLD-LABEL: @baz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1			; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; THRESHOLD-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8			; THRESHOLD-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
	; THRESHOLD-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8			; THRESHOLD-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
	; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]			; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
	; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0			; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
	; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1			; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
	; THRESHOLD-NEXT: [[TMP11:%.*]] = insertelement <8 x float> poison, float [[TMP10]], i32 0			; THRESHOLD-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP3]], <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2>
	; THRESHOLD-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[TMP9]], i32 1			; THRESHOLD-NEXT: [[TMP12:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP11]])
	; THRESHOLD-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[TMP5]], i32 2			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP12]], [[CONV]]
	; THRESHOLD-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[TMP4]], i32 3
	; THRESHOLD-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[TMP10]], i32 4
	; THRESHOLD-NEXT: [[TMP16:%.*]] = insertelement <8 x float> [[TMP15]], float [[TMP9]], i32 5
	; THRESHOLD-NEXT: [[TMP17:%.*]] = insertelement <8 x float> [[TMP16]], float [[TMP5]], i32 6
	; THRESHOLD-NEXT: [[TMP18:%.*]] = insertelement <8 x float> [[TMP17]], float [[TMP4]], i32 7
	; THRESHOLD-NEXT: [[TMP19:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP18]])
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP19]], [[CONV]]
	; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4			; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%mul = mul nsw i32 %0, 3			%mul = mul nsw i32 %0, 3
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

	; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]			; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
	; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]			; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
	; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4			; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; AVX2-NEXT: store i32 [[TMP14]], i32* @var, align 8			; AVX2-NEXT: store i32 [[TMP14]], i32* @var, align 8
	; AVX2-NEXT: ret i32 [[OP_EXTRA1]]			; AVX2-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	; THRESH-LABEL: @maxi8_mutiple_uses(			; THRESH-LABEL: @maxi8_mutiple_uses(
	; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0			; THRESH-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			; THRESH-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; THRESH-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; THRESH-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP3]])
	; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; THRESH-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP6]], [[TMP4]]
	; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])			; THRESH-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP6]], i32 [[TMP4]]
	; THRESH-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]			; THRESH-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <2 x i32> <i32 undef, i32 0>
	; THRESH-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP6]]			; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[TMP8]], i32 0
	; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0			; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[TMP5]], i32 0
	; THRESH-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP3]], i32 1			; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP10]], [[TMP11]]
	; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0			; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP10]], <2 x i32> [[TMP11]]
	; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP4]], i32 1			; THRESH-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
	; THRESH-NEXT: [[TMP15:%.*]] = icmp sgt <2 x i32> [[TMP12]], [[TMP14]]			; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
	; THRESH-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x i32> [[TMP12]], <2 x i32> [[TMP14]]			; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
	; THRESH-NEXT: [[TMP17:%.*]] = extractelement <2 x i32> [[TMP16]], i32 0			; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP14]], i32 [[TMP15]]
	; THRESH-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP16]], i32 1			; THRESH-NEXT: [[TMP16:%.*]] = extractelement <2 x i1> [[TMP12]], i32 1
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]			; THRESH-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 3, i32 4
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP17]], i32 [[TMP18]]			; THRESH-NEXT: store i32 [[TMP17]], i32* @var, align 8
	; THRESH-NEXT: [[TMP19:%.*]] = extractelement <2 x i1> [[TMP15]], i32 1
	; THRESH-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 3, i32 4
	; THRESH-NEXT: store i32 [[TMP20]], i32* @var, align 8
	; THRESH-NEXT: ret i32 [[OP_EXTRA1]]			; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	%5 = select i1 %4, i32 %2, i32 %3			%5 = select i1 %4, i32 %2, i32 %3
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%7 = icmp sgt i32 %5, %6			%7 = icmp sgt i32 %5, %6
	[535 lines not shown]
	; SSE-NEXT: [[T10:%.*]] = sub nsw i32 undef, [[T9]]			; SSE-NEXT: [[T10:%.*]] = sub nsw i32 undef, [[T9]]
	; SSE-NEXT: [[T11:%.*]] = call i32 @llvm.umin.i32(i32 [[T8]], i32 [[T10]])			; SSE-NEXT: [[T11:%.*]] = call i32 @llvm.umin.i32(i32 [[T8]], i32 [[T10]])
	; SSE-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; SSE-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; SSE-NEXT: [[T13:%.*]] = call i32 @llvm.umin.i32(i32 [[T11]], i32 [[T12]])			; SSE-NEXT: [[T13:%.*]] = call i32 @llvm.umin.i32(i32 [[T11]], i32 [[T12]])
	; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)			; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @PR49730(			; AVX-LABEL: @PR49730(
	; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; AVX-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; AVX-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
	; AVX-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; AVX-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])
	; AVX-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; AVX-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)
	; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @PR49730(			; AVX2-LABEL: @PR49730(
	; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
	; AVX2-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; AVX2-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])
	; AVX2-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; AVX2-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)
	; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; THRESH-LABEL: @PR49730(			; THRESH-LABEL: @PR49730(
	; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
	; THRESH-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; THRESH-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])
	; THRESH-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; THRESH-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)
	; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; THRESH-NEXT: ret void			; THRESH-NEXT: ret void
	;			;
	%t = call i32 @llvm.smin.i32(i32 undef, i32 2)			%t = call i32 @llvm.smin.i32(i32 undef, i32 2)
	[16 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

[136 lines not shown]
; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]		; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]		; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[S0]], i32 0		; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[S0]], i32 0
; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1		; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2		; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3		; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0		; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1		; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Q1]], i32 1
; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2		; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3		; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q3]], i32 1
; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]		; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]		; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0		; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]		; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0		; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]		; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])		; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
; MINTREESIZE-NEXT: ret <4 x float> undef		; MINTREESIZE-NEXT: ret <4 x float> undef
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
[102 lines not shown]	;
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP1]], zeroinitializer
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = icmp ne <2 x i32> [[TMP3]], zeroinitializer
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP7:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[TMP5]], <2 x float> [[TMP6]]
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP4]], <2 x float> [[TMP8]], <2 x float> [[TMP9]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x float> [[RD1]]		; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
[117 lines not shown]
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @reschedule_extract(		; NOTHRESHOLD-LABEL: @reschedule_extract(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @reschedule_extract(		; MINTREESIZE-LABEL: @reschedule_extract(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2		; MINTREESIZE-NEXT: ret <4 x float> [[TMP2]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%b0 = extractelement <4 x float> %b, i32 0		%b0 = extractelement <4 x float> %b, i32 0
%c0 = fadd float %a0, %b0		%c0 = fadd float %a0, %b0
%v0 = insertelement <4 x float> poison, float %c0, i32 0		%v0 = insertelement <4 x float> poison, float %c0, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%b1 = extractelement <4 x float> %b, i32 1		%b1 = extractelement <4 x float> %b, i32 1
%c1 = fadd float %a1, %b1		%c1 = fadd float %a1, %b1
[16 lines not shown]
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @take_credit(		; NOTHRESHOLD-LABEL: @take_credit(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @take_credit(		; MINTREESIZE-LABEL: @take_credit(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2		; MINTREESIZE-NEXT: ret <4 x float> [[TMP5]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%b0 = extractelement <4 x float> %b, i32 0		%b0 = extractelement <4 x float> %b, i32 0
%c0 = fadd float %a0, %b0		%c0 = fadd float %a0, %b0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%b1 = extractelement <4 x float> %b, i32 1		%b1 = extractelement <4 x float> %b, i32 1
%c1 = fadd float %a1, %b1		%c1 = fadd float %a1, %b1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
[40 lines not shown]
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @_vadd256(		; NOTHRESHOLD-LABEL: @_vadd256(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @_vadd256(		; MINTREESIZE-LABEL: @_vadd256(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[B:%.*]], i32 7		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <2 x i32> <i32 0, i32 8>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[B]], i32 6		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 1, i32 9>
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[B]], i32 5		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 2, i32 10>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[B]], i32 4		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 3, i32 11>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <8 x float> [[B]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 4, i32 12>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <8 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 5, i32 13>
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <8 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 6, i32 14>
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 7, i32 15>
; MINTREESIZE-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[A:%.*]], i32 7		; MINTREESIZE-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP10:%.*]] = extractelement <8 x float> [[A]], i32 6		; MINTREESIZE-NEXT: ret <8 x float> [[TMP9]]
; MINTREESIZE-NEXT: [[TMP11:%.*]] = extractelement <8 x float> [[A]], i32 5
; MINTREESIZE-NEXT: [[TMP12:%.*]] = extractelement <8 x float> [[A]], i32 4
; MINTREESIZE-NEXT: [[TMP13:%.*]] = extractelement <8 x float> [[A]], i32 3
; MINTREESIZE-NEXT: [[TMP14:%.*]] = extractelement <8 x float> [[A]], i32 2
; MINTREESIZE-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> poison, float [[TMP16]], i32 0
; MINTREESIZE-NEXT: [[TMP18:%.*]] = insertelement <2 x float> [[TMP17]], float [[TMP8]], i32 1
; MINTREESIZE-NEXT: [[TMP19:%.*]] = insertelement <2 x float> poison, float [[TMP15]], i32 0
; MINTREESIZE-NEXT: [[TMP20:%.*]] = insertelement <2 x float> [[TMP19]], float [[TMP7]], i32 1
; MINTREESIZE-NEXT: [[TMP21:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i32 0
; MINTREESIZE-NEXT: [[TMP22:%.*]] = insertelement <2 x float> [[TMP21]], float [[TMP6]], i32 1
; MINTREESIZE-NEXT: [[TMP23:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
; MINTREESIZE-NEXT: [[TMP24:%.*]] = insertelement <2 x float> [[TMP23]], float [[TMP5]], i32 1
; MINTREESIZE-NEXT: [[TMP25:%.*]] = insertelement <2 x float> poison, float [[TMP12]], i32 0
; MINTREESIZE-NEXT: [[TMP26:%.*]] = insertelement <2 x float> [[TMP25]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP27:%.*]] = insertelement <2 x float> poison, float [[TMP11]], i32 0
; MINTREESIZE-NEXT: [[TMP28:%.*]] = insertelement <2 x float> [[TMP27]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP29:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i32 0
; MINTREESIZE-NEXT: [[TMP30:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP31:%.*]] = insertelement <2 x float> poison, float [[TMP9]], i32 0
; MINTREESIZE-NEXT: [[TMP32:%.*]] = insertelement <2 x float> [[TMP31]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: [[TMP33:%.*]] = fadd <8 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: ret <8 x float> [[TMP33]]
;		;
%vecext = extractelement <8 x float> %a, i32 0		%vecext = extractelement <8 x float> %a, i32 0
%vecext1 = extractelement <8 x float> %b, i32 0		%vecext1 = extractelement <8 x float> %b, i32 0
%add = fadd float %vecext, %vecext1		%add = fadd float %vecext, %vecext1
%vecext2 = extractelement <8 x float> %a, i32 1		%vecext2 = extractelement <8 x float> %a, i32 1
%vecext3 = extractelement <8 x float> %b, i32 1		%vecext3 = extractelement <8 x float> %b, i32 1
%add4 = fadd float %vecext2, %vecext3		%add4 = fadd float %vecext2, %vecext3
%vecext5 = extractelement <8 x float> %a, i32 2		%vecext5 = extractelement <8 x float> %a, i32 2
[29 lines not shown]
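
Note on the MINTREESIZE checks for @reschedule_extract, @take_credit and @_vadd256 above: the right-hand column replaces each extractelement/insertelement pair with one two-source shufflevector. The sketch below is a plain Python model of that equivalence, not LLVM code. The function names and sample values are hypothetical, and the model assumes every mask index is defined (no undef lanes).

```python
def shufflevector(v1, v2, mask):
    """Model of shufflevector: index i < len(v1) picks v1[i], otherwise v2[i - len(v1)]."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def gather_scalars(v1, v2, mask):
    """Model of the old gather: one extractelement per lane, then an insertelement chain."""
    n = len(v1)
    result = [None] * len(mask)  # stands in for the poison start vector
    for lane, idx in enumerate(mask):
        scalar = v1[idx] if idx < n else v2[idx - n]  # extractelement
        result[lane] = scalar                         # insertelement
    return result


# Sample operands (hypothetical values) and the <4 x float> masks from the checks above.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
masks = [[0, 4], [1, 5], [2, 6], [3, 7]]

pairs = [shufflevector(a, b, m) for m in masks]
gathered = [gather_scalars(a, b, m) for m in masks]
```

Both forms build the same <2 x float> values in this model; the shuffle form does so with one instruction per gathered vector instead of two extracts and two inserts.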

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

[38 lines not shown]	;
ret <4 x float> %rd		ret <4 x float> %rd
}		}

define <8 x float> @simple_select2(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <8 x float> @simple_select2(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select2(		; CHECK-LABEL: @simple_select2(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <4 x i32> [[C:%.*]], zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <4 x i32> [[C:%.*]], zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[A:%.*]], <4 x float> [[B:%.*]]		; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[A:%.*]], <4 x float> [[B:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 undef, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 undef, i32 3>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <8 x float> undef, <8 x float> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 10, i32 3, i32 12, i32 5, i32 6, i32 15>		; CHECK-NEXT: ret <8 x float> [[TMP3]]
; CHECK-NEXT: ret <8 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
[115 lines not shown]
; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]		; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]		; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0		; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0
; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1		; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2		; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3		; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0		; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1		; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Q1]], i32 1
; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2		; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3		; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[RD]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q3]], i32 1
; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]		; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]		; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0		; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]		; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0		; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1		; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1
; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]		; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])		; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
; MINTREESIZE-NEXT: ret <4 x float> undef		; MINTREESIZE-NEXT: ret <4 x float> undef
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
[102 lines not shown]	;
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP1]], zeroinitializer
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = icmp ne <2 x i32> [[TMP3]], zeroinitializer
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP7:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[TMP5]], <2 x float> [[TMP6]]
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP4]], <2 x float> [[TMP8]], <2 x float> [[TMP9]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x float> [[RD1]]		; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @reschedule_extract(		; NOTHRESHOLD-LABEL: @reschedule_extract(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @reschedule_extract(		; MINTREESIZE-LABEL: @reschedule_extract(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2		; MINTREESIZE-NEXT: ret <4 x float> [[TMP2]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%b0 = extractelement <4 x float> %b, i32 0		%b0 = extractelement <4 x float> %b, i32 0
%c0 = fadd float %a0, %b0		%c0 = fadd float %a0, %b0
%v0 = insertelement <4 x float> undef, float %c0, i32 0		%v0 = insertelement <4 x float> undef, float %c0, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%b1 = extractelement <4 x float> %b, i32 1		%b1 = extractelement <4 x float> %b, i32 1
%c1 = fadd float %a1, %b1		%c1 = fadd float %a1, %b1
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @take_credit(		; NOTHRESHOLD-LABEL: @take_credit(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @take_credit(		; MINTREESIZE-LABEL: @take_credit(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2		; MINTREESIZE-NEXT: ret <4 x float> [[TMP5]]
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%b0 = extractelement <4 x float> %b, i32 0		%b0 = extractelement <4 x float> %b, i32 0
%c0 = fadd float %a0, %b0		%c0 = fadd float %a0, %b0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%b1 = extractelement <4 x float> %b, i32 1		%b1 = extractelement <4 x float> %b, i32 1
%c1 = fadd float %a1, %b1		%c1 = fadd float %a1, %b1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]		; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]
;		;
; NOTHRESHOLD-LABEL: @_vadd256(		; NOTHRESHOLD-LABEL: @_vadd256(
; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]		; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]
;		;
; MINTREESIZE-LABEL: @_vadd256(		; MINTREESIZE-LABEL: @_vadd256(
; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[B:%.*]], i32 7		; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <2 x i32> <i32 0, i32 8>
; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[B]], i32 6		; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 1, i32 9>
; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[B]], i32 5		; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 2, i32 10>
; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[B]], i32 4		; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 3, i32 11>
; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <8 x float> [[B]], i32 3		; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 4, i32 12>
; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <8 x float> [[B]], i32 2		; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 5, i32 13>
; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <8 x float> [[B]], i32 1		; MINTREESIZE-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 6, i32 14>
; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[B]], i32 0		; MINTREESIZE-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 7, i32 15>
; MINTREESIZE-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[A:%.*]], i32 7		; MINTREESIZE-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: [[TMP10:%.*]] = extractelement <8 x float> [[A]], i32 6		; MINTREESIZE-NEXT: ret <8 x float> [[TMP9]]
; MINTREESIZE-NEXT: [[TMP11:%.*]] = extractelement <8 x float> [[A]], i32 5
; MINTREESIZE-NEXT: [[TMP12:%.*]] = extractelement <8 x float> [[A]], i32 4
; MINTREESIZE-NEXT: [[TMP13:%.*]] = extractelement <8 x float> [[A]], i32 3
; MINTREESIZE-NEXT: [[TMP14:%.*]] = extractelement <8 x float> [[A]], i32 2
; MINTREESIZE-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[A]], i32 1
; MINTREESIZE-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[A]], i32 0
; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> poison, float [[TMP16]], i32 0
; MINTREESIZE-NEXT: [[TMP18:%.*]] = insertelement <2 x float> [[TMP17]], float [[TMP8]], i32 1
; MINTREESIZE-NEXT: [[TMP19:%.*]] = insertelement <2 x float> poison, float [[TMP15]], i32 0
; MINTREESIZE-NEXT: [[TMP20:%.*]] = insertelement <2 x float> [[TMP19]], float [[TMP7]], i32 1
; MINTREESIZE-NEXT: [[TMP21:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i32 0
; MINTREESIZE-NEXT: [[TMP22:%.*]] = insertelement <2 x float> [[TMP21]], float [[TMP6]], i32 1
; MINTREESIZE-NEXT: [[TMP23:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
; MINTREESIZE-NEXT: [[TMP24:%.*]] = insertelement <2 x float> [[TMP23]], float [[TMP5]], i32 1
; MINTREESIZE-NEXT: [[TMP25:%.*]] = insertelement <2 x float> poison, float [[TMP12]], i32 0
; MINTREESIZE-NEXT: [[TMP26:%.*]] = insertelement <2 x float> [[TMP25]], float [[TMP4]], i32 1
; MINTREESIZE-NEXT: [[TMP27:%.*]] = insertelement <2 x float> poison, float [[TMP11]], i32 0
; MINTREESIZE-NEXT: [[TMP28:%.*]] = insertelement <2 x float> [[TMP27]], float [[TMP3]], i32 1
; MINTREESIZE-NEXT: [[TMP29:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i32 0
; MINTREESIZE-NEXT: [[TMP30:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP2]], i32 1
; MINTREESIZE-NEXT: [[TMP31:%.*]] = insertelement <2 x float> poison, float [[TMP9]], i32 0
; MINTREESIZE-NEXT: [[TMP32:%.*]] = insertelement <2 x float> [[TMP31]], float [[TMP1]], i32 1
; MINTREESIZE-NEXT: [[TMP33:%.*]] = fadd <8 x float> [[A]], [[B]]
; MINTREESIZE-NEXT: ret <8 x float> [[TMP33]]
;		;
%vecext = extractelement <8 x float> %a, i32 0		%vecext = extractelement <8 x float> %a, i32 0
%vecext1 = extractelement <8 x float> %b, i32 0		%vecext1 = extractelement <8 x float> %b, i32 0
%add = fadd float %vecext, %vecext1		%add = fadd float %vecext, %vecext1
%vecext2 = extractelement <8 x float> %a, i32 1		%vecext2 = extractelement <8 x float> %a, i32 1
%vecext3 = extractelement <8 x float> %b, i32 1		%vecext3 = extractelement <8 x float> %b, i32 1
%add4 = fadd float %vecext2, %vecext3		%add4 = fadd float %vecext2, %vecext3
%vecext5 = extractelement <8 x float> %a, i32 2		%vecext5 = extractelement <8 x float> %a, i32 2

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1			; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float poison, float undef, float poison, float poison>, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float poison, float undef, float poison, float poison>, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP7]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP3]], i32 2			; CHECK-NEXT: [[TMP9:%.*]] = fmul <4 x float> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <4 x float> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd <4 x float> undef, [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> poison, [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[TMP10]], undef
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <4 x float> [[TMP11]], poison			; CHECK-NEXT: [[TMP12:%.*]] = fadd <4 x float> [[TMP11]], undef
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <4 x float> [[TMP12]], poison			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP13]], i32 0			; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP13]], i32 0
	; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP14]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP13]], i32 1			; CHECK-NEXT: [[VEC2:%.*]] = insertelement <2 x float> [[VEC1]], float [[TMP14]], i32 1
	; CHECK-NEXT: [[VEC2:%.*]] = insertelement <2 x float> [[VEC1]], float [[TMP15]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP12]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP13]], i32 2			; CHECK-NEXT: [[VEC3:%.*]] = insertelement <2 x float> undef, float [[TMP15]], i32 0
	; CHECK-NEXT: [[VEC3:%.*]] = insertelement <2 x float> undef, float [[TMP16]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP12]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP13]], i32 3			; CHECK-NEXT: [[VEC4:%.*]] = insertelement <2 x float> [[VEC3]], float [[TMP16]], i32 1
	; CHECK-NEXT: [[VEC4:%.*]] = insertelement <2 x float> [[VEC3]], float [[TMP17]], i32 1
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VEC2]], 0			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VEC2]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[VEC4]], 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[VEC4]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]
	;			;
	entry:			entry:
	%0 = load float, float* undef, align 4			%0 = load float, float* undef, align 4
	%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0			%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0
	%1 = load float, float* %x, align 16			%1 = load float, float* %x, align 16

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4			store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll

	; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP26:%.*]], <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP26:%.*]], <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x i32> [ poison, [[ENTRY]] ], [ [[TMP26]], [[FOR_INC]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [ [[TMP26]], [[FOR_INC]] ]
	; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]			; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll

	define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {			define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {
	; CHECK-LABEL: @jumbled-load-multiuses(			; CHECK-LABEL: @jumbled-load-multiuses(
	; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 0, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i32> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3
	; CHECK-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP2]], [[TMP10]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3
	%load.2 = load i32, i32* %gep.1, align 4			%load.2 = load i32, i32* %gep.1, align 4
	%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 1			%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 1
	%load.3 = load i32, i32* %gep.2, align 4			%load.3 = load i32, i32* %gep.2, align 4

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP10]], i32 3			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP10]], i32 3
	; CHECK-NEXT: store float [[TMP13]], float* @e, align 4			; CHECK-NEXT: store float [[TMP13]], float* @e, align 4
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP10]], i32 1
	; CHECK-NEXT: store float [[TMP14]], float* @f, align 4			; CHECK-NEXT: store float [[TMP14]], float* @f, align 4
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14
	; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15			; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* @a, align 4			; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* @a, align 4
	; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP15]] to float			; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP15]] to float
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <4 x float> [[SHUFFLE]], <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, <4 x i32> <i32 undef, i32 5, i32 0, i32 7>
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float [[CONV19]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP17]], i32 2			; CHECK-NEXT: [[TMP18:%.*]] = fsub <4 x float> [[TMP10]], [[TMP17]]
	; CHECK-NEXT: [[TMP19:%.*]] = fsub <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = fadd <4 x float> [[TMP10]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = fadd <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> [[TMP19]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x float> [[TMP19]], <4 x float> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; CHECK-NEXT: [[TMP21:%.*]] = fptosi <4 x float> [[TMP20]] to <4 x i32>
	; CHECK-NEXT: [[TMP22:%.*]] = fptosi <4 x float> [[TMP21]] to <4 x i32>			; CHECK-NEXT: [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*			; CHECK-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP22]], align 4
	; CHECK-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP23]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32*, i32** @b, align 8			%0 = load i32*, i32** @b, align 8
	%arrayidx = getelementptr inbounds i32, i32* %0, i64 4			%arrayidx = getelementptr inbounds i32, i32* %0, i64 4
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%arrayidx1 = getelementptr inbounds i32, i32* %0, i64 12			%arrayidx1 = getelementptr inbounds i32, i32* %0, i64 12
	%2 = load i32, i32* %arrayidx1, align 4			%2 = load i32, i32* %arrayidx1, align 4
	Show All 39 Lines

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

Show First 20 Lines • Show All 554 Lines • ▼ Show 20 Lines	;
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}


define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; CHECK-LABEL: @foo(		; CHECK-LABEL: @foo(
; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; CHECK-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; CHECK-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; CHECK-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[VEC]], <4 x float> poison, <2 x i32> <i32 undef, i32 1>
; CHECK-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[SUB14_I167]], i32 0
; CHECK-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; CHECK-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; CHECK-NEXT: ret i1 [[CMP_I185]]		; CHECK-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1
Show All 10 Lines
define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {		define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
; CHECK-LABEL: @ChecksExtractScores_different_vectors(		; CHECK-LABEL: @ChecksExtractScores_different_vectors(
; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0		; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1		; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4		; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4		; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4		; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4		; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[LOADVEC2]], <2 x double> [[LOADVEC3]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[LOADVEC]], <2 x double> [[LOADVEC4]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], [[TMP2]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], [[SHUFFLE]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1
; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]
; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], [[SHUFFLE]]
; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0		; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1		; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*		; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8		; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%idx0 = getelementptr inbounds double, double* %array, i64 0		%idx0 = getelementptr inbounds double, double* %array, i64 0
%idx1 = getelementptr inbounds double, double* %array, i64 1		%idx1 = getelementptr inbounds double, double* %array, i64 1
%loadA0 = load double, double* %idx0, align 4		%loadA0 = load double, double* %idx0, align 4
%loadA1 = load double, double* %idx1, align 4		%loadA1 = load double, double* %idx1, align 4

%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4		%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4
Show All 21 Lines

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s

	define i32 @bar() local_unnamed_addr {			define i32 @bar() local_unnamed_addr {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_3]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> <i32 poison, i32 undef, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[SUB102_3]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[SUB102_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[SUB102_1]], i32 2
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD94_1]], i32 7
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[ADD78_1]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[ADD78_1]], i32 8
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[SUB86_1]], i32 4			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[SUB86_1]], i32 9
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[ADD78_2]], i32 5			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[ADD78_2]], i32 11
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> poison, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 3, i32 4, i32 undef, i32 5, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 7, i32 2, i32 11, i32 undef, i32 undef, i32 undef, i32 undef, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = add nsw <16 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD78_1]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <16 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[TMP7]], <16 x i32> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_1]], i32 3			; CHECK-NEXT: [[TMP10:%.*]] = lshr <16 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[ADD78_2]], i32 4			; CHECK-NEXT: [[TMP11:%.*]] = and <16 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[SUB102_3]], i32 5			; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <16 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP11]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 2, i32 3, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 5>			; CHECK-NEXT: [[TMP13:%.*]] = add <16 x i32> [[TMP12]], [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP14:%.*]] = xor <16 x i32> [[TMP13]], [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP14]])
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i32> [[TMP12]], <16 x i32> [[TMP13]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP15]], 16
	; CHECK-NEXT: [[TMP15:%.*]] = lshr <16 x i32> [[TMP14]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP16:%.*]] = and <16 x i32> [[TMP15]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP17:%.*]] = mul nuw <16 x i32> [[TMP16]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP18:%.*]] = add <16 x i32> [[TMP17]], [[TMP14]]
	; CHECK-NEXT: [[TMP19:%.*]] = xor <16 x i32> [[TMP18]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP19]])
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP20]], 16
	; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]			; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]
	; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1			; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
	; CHECK-NEXT: ret i32 [[SHR120]]			; CHECK-NEXT: ret i32 [[SHR120]]
	;			;
	entry:			entry:
	%add103 = add nsw i32 undef, undef			%add103 = add nsw i32 undef, undef
	%sub104 = sub nsw i32 undef, undef			%sub104 = sub nsw i32 undef, undef
	%add105 = add nsw i32 undef, undef			%add105 = add nsw i32 undef, undef
	▲ Show 20 Lines • Show All 119 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

	Show First 20 Lines • Show All 189 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: @gather_sequence_crash(			; CHECK-LABEL: @gather_sequence_crash(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]			; CHECK-NEXT: br i1 [[C_1:%.*]], label [[BB16:%.*]], label [[BB6:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP:%.*]] = getelementptr inbounds float, float* [[ARG1:%.*]], i32 4			; CHECK-NEXT: [[TMP:%.*]] = getelementptr inbounds float, float* [[ARG1:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 3
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 6			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[ARG1]], i32 6
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, float [[ARG2:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[ARG:%.*]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> <float poison, float poison, float poison, float 0.000000e+00>, <4 x i32> <i32 undef, i32 1, i32 2, i32 7>
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float [[ARG2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP8]] to <4 x float>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP8]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP3]], <4 x float>* [[TMP4]], align 4			; CHECK-NEXT: store <4 x float> [[TMP3]], <4 x float>* [[TMP4]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb16:			; CHECK: bb16:
	; CHECK-NEXT: br label [[BB17:%.*]]			; CHECK-NEXT: br label [[BB17:%.*]]
	; CHECK: bb17:			; CHECK: bb17:
	; CHECK-NEXT: br label [[BB18:%.*]]			; CHECK-NEXT: br label [[BB18:%.*]]
	▲ Show 20 Lines • Show All 135 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

	Show All 30 Lines
	define void @vecload_vs_broadcast(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast(			; CHECK-LABEL: @vecload_vs_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[P]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP1]], <2 x i32> <i32 0, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	▲ Show 20 Lines • Show All 222 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4			; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP9]], float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]
	; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4			; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i32 3			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i32 3
	▲ Show 20 Lines • Show All 184 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @get_block(i32 %y_pos) local_unnamed_addr #0 {			define void @get_block(i32 %y_pos) local_unnamed_addr #0 {
	; CHECK-LABEL: @get_block(			; CHECK-LABEL: @get_block(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[SUB14]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP0]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[SHUFFLE]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], poison			; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> poison			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> undef
	; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>			; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>
	; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64
	; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP11]]			; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP11]]
	▲ Show 20 Lines • Show All 46 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	Show First 20 Lines • Show All 138 Lines • ▼ Show 20 Lines

	define float @foo3(float* nocapture readonly %A) #0 {			define float @foo3(float* nocapture readonly %A) #0 {
	; CHECK-LABEL: @foo3(			; CHECK-LABEL: @foo3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <2 x i32> <i32 undef, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP18:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP4]], [[ENTRY]] ], [ [[TMP11:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP7]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]			; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP8:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP11]] = load <2 x float>, <2 x float>* [[TMP10]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> poison, float [[TMP13]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x float> poison, float [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <4 x float> [[TMP14]], <4 x float> [[TMP15]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[TMP17]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[TMP17:%.*]] = fmul <4 x float> [[TMP16]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
	; CHECK-NEXT: [[TMP19]] = fadd <4 x float> [[TMP6]], [[TMP18]]			; CHECK-NEXT: [[TMP18]] = fadd <4 x float> [[TMP5]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP20]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP19]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP19]], i32 0			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP18]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP20]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP19]], i32 1			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP18]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP21]]
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP19]], i32 2			; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP18]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP22]]
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x float> [[TMP19]], i32 3			; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP18]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP23]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4
	; Make sure the order of phi nodes of different types does not prevent			; Make sure the order of phi nodes of different types does not prevent
	; vectorization of same typed phi nodes.			; vectorization of same typed phi nodes.
	define float @sort_phi_type(float* nocapture readonly %A) {			define float @sort_phi_type(float* nocapture readonly %A) {
	; CHECK-LABEL: @sort_phi_type(			; CHECK-LABEL: @sort_phi_type(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP2:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2]] = fmul <4 x float> [[TMP1]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP6]], float [[TMP7]], i32 3
	; CHECK-NEXT: [[TMP9]] = fmul <4 x float> [[TMP8]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP12]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP5]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP13]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP6]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%Y = phi float [ 1.000000e+01, %entry ], [ %mul10, %for.body ]			%Y = phi float [ 1.000000e+01, %entry ], [ %mul10, %for.body ]
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1			; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
	; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5			; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4			; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1			; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
	; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> poison, <2 x i32> <i32 1, i32 undef>
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[ADD]], i32 1
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1			; SSE-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>			; SSE-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>
	; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
	; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; SSE-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; SSE-NEXT: [[TMP9:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1			; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP9]], align 1
	; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; SSE-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]			; SSE-NEXT: [[TMP11:%.*]] = add nuw nsw <2 x i64> [[TMP8]], [[TMP10]]
	; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; SSE-NEXT: [[TMP12:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1			; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* [[TMP12]], align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @pr35497(			; AVX-LABEL: @pr35497(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1			; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
	; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef			; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1			; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
	; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5			; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4			; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1			; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
	; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> poison, <2 x i32> <i32 1, i32 undef>
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[ADD]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1			; AVX-NEXT: [[TMP7:%.*]] = shl <2 x i64> [[TMP6]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>			; AVX-NEXT: [[TMP8:%.*]] = and <2 x i64> [[TMP7]], <i64 20, i64 20>
	; AVX-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
	; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; AVX-NEXT: [[TMP9:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1			; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP9]], align 1
	; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; AVX-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]			; AVX-NEXT: [[TMP11:%.*]] = add nuw nsw <2 x i64> [[TMP8]], [[TMP10]]
	; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; AVX-NEXT: [[TMP12:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1			; AVX-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* [[TMP12]], align 1
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, i64* undef, align 1			%0 = load i64, i64* undef, align 1
	%and = shl i64 %0, 2			%and = shl i64 %0, 2
	%shl = and i64 %and, 20			%shl = and i64 %and, 20
	%add = add i64 undef, undef			%add = add i64 undef, undef
	store i64 %add, i64* undef, align 1			store i64 %add, i64* undef, align 1

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i32 0			; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP1]], i32 0
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i32 1			; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[TMP2]], i32 1
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16
	%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s			; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s
	; These conversions should be vectorized by reviews.llvm.org/D57059			; These conversions should be vectorized by reviews.llvm.org/D57059

	define dso_local <4 x float> @foo(<4 x i32> %0) {			define dso_local <4 x float> @foo(<4 x i32> %0) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float			; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>			; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[TMP9]]			; CHECK-NEXT: ret <4 x float> [[TMP9]]
	;			;
	%2 = extractelement <4 x i32> %0, i32 1			%2 = extractelement <4 x i32> %0, i32 1
	%3 = sitofp i32 %2 to float			%3 = sitofp i32 %2 to float
	%4 = insertelement <4 x float> undef, float %3, i32 0			%4 = insertelement <4 x float> undef, float %3, i32 0

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

}		}

; TODO: This is better than all-scalar and still safe,		; TODO: This is better than all-scalar and still safe,
; but we want this to be 2 reductions with glue		; but we want this to be 2 reductions with glue
; logic...or a wide reduction?		; logic...or a wide reduction?

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; CHECK-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 17, i32 17, i32 17, i32 17>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <8 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[TMP4]], 17		; CHECK-NEXT: [[TMP6:%.*]] = freeze <8 x i1> [[TMP5]]
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[TMP3]], 17		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP6]])
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[TMP2]], 17		; CHECK-NEXT: ret i1 [[TMP7]]
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[TMP1]], 17
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP7]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(		; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 1
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
; CHECK-NEXT: call void @use1(i1 [[C2]])		; CHECK-NEXT: call void @use1(i1 [[C2]])
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> <i32 poison, i32 poison, i32 poison, i32 17>, <4 x i32> <i32 0, i32 1, i32 3, i32 7>
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> <i32 42, i32 42, i32 42, i32 poison>, <4 x i32> <i32 4, i32 5, i32 6, i32 0>
		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17		; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17		; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17		; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false		; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP5]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false		; CHECK-NEXT: [[S5:%.*]] = select i1 [[OP_EXTRA]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false		; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false		; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]		; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {		define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(		; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(
; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[X:%.*]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[Y:%.*]], <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <8 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = freeze <8 x i1> [[TMP3]]
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP4]])
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; CHECK-NEXT: ret i1 [[TMP5]]
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X1]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X3]], i32 3
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[Y1]], i32 5
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y2]], i32 6
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y3]], i32 7
; CHECK-NEXT: [[TMP9:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], [[TMP8]]
; CHECK-NEXT: [[TMP10:%.*]] = freeze <8 x i1> [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP10]])
; CHECK-NEXT: ret i1 [[TMP11]]
;		;
%x0 = extractelement <8 x i32> %x, i32 0		%x0 = extractelement <8 x i32> %x, i32 0
%x1 = extractelement <8 x i32> %x, i32 1		%x1 = extractelement <8 x i32> %x, i32 1
%x2 = extractelement <8 x i32> %x, i32 2		%x2 = extractelement <8 x i32> %x, i32 2
%x3 = extractelement <8 x i32> %x, i32 3		%x3 = extractelement <8 x i32> %x, i32 3
%y0 = extractelement <8 x i32> %y, i32 0		%y0 = extractelement <8 x i32> %y, i32 0
%y1 = extractelement <8 x i32> %y, i32 1		%y1 = extractelement <8 x i32> %y, i32 1
%y2 = extractelement <8 x i32> %y, i32 2		%y2 = extractelement <8 x i32> %y, i32 2
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_partial(		; CHECK-LABEL: @logical_and_icmp_clamp_partial(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 1
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> <i32 poison, i32 poison, i32 poison, i32 17>, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> <i32 42, i32 42, i32 42, i32 poison>, <4 x i32> <i32 4, i32 5, i32 6, i32 0>
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17		; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17		; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17		; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S2]], i1 [[D0]], i1 false		; CHECK-NEXT: [[S5:%.*]] = select i1 [[TMP5]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false		; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false		; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]		; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

	define i1 @fcmp_lt_gt(double %a, double %b, double %c) {			define i1 @fcmp_lt_gt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt_gt(			; CHECK-LABEL: @fcmp_lt_gt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	Show All 32 Lines
	}			}

	define i1 @fcmp_lt(double %a, double %b, double %c) {			define i1 @fcmp_lt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt(			; CHECK-LABEL: @fcmp_lt(
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>			; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1
	Show All 14 Lines

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	Show All 10 Lines
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 1
	; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 3
	; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 4			; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 4
	; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 5			; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 5
	; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6			; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6
	; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7			; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <8 x i16> [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> poison, i16 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = add <8 x i16> [[LD]], [[TMP0]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]			; CHECK-NEXT: store <8 x i16> [[TMP1]], <8 x i16>* [[TMP2]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* [[TMP3]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-20'			; YAML-NEXT: - Cost: '-22'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '4'			; YAML-NEXT: - TreeSize: '4'

	entry:			entry:
	%LD = load <8 x i16>, <8 x i16>* undef			%LD = load <8 x i16>, <8 x i16>* undef
	%V0 = extractelement <8 x i16> %LD, i32 0			%V0 = extractelement <8 x i16> %LD, i32 0
	br label %t			br label %t

	Show All 34 Lines

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

	Show All 9 Lines
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3
	; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>			; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4			; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4
	; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5			; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], poison			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], undef
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> poison, <4 x i32> [[SHUFFLE1]]			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> undef, <4 x i32> [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> poison, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6			; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1			%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1
	%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0			%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll

	Show All 24 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 2, i32 0>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 2, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>			; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP3]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP3]], [[TMP4]]
	; CHECK-NEXT: br label [[SW_EPILOG]]			; CHECK-NEXT: br label [[SW_EPILOG]]
	; CHECK: sw.epilog:			; CHECK: sw.epilog:
	; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i32> [ undef, [[ENTRY:%.*]] ], [ [[TMP5]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i32> [ undef, [[ENTRY:%.*]] ], [ [[TMP5]], [[SW_BB]] ]
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> poison, [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> undef, [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP7]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%b = getelementptr inbounds %struct.a, %struct.a* %p, i64 0, i32 0			%b = getelementptr inbounds %struct.a, %struct.a* %p, i64 0, i32 0
	%c = getelementptr inbounds %struct.a, %struct.a* %p, i64 0, i32 1			%c = getelementptr inbounds %struct.a, %struct.a* %p, i64 0, i32 1
	Show All 37 Lines

llvm/test/Transforms/SLPVectorizer/X86/split-load8_2-unord.ll

	; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G20]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G20]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP4]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 3, i32 1, i32 2, i32 0, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 3, i32 1, i32 2, i32 0, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*			; CHECK-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: store <8 x i32> [[TMP7]], <8 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%p1 = alloca [16 x i32], align 16			%p1 = alloca [16 x i32], align 16
	%p2 = alloca [16 x i32], align 16			%p2 = alloca [16 x i32], align 16
	%g10 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 4			%g10 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 4
	%g11 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 5			%g11 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 5
	%g12 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 6			%g12 = getelementptr inbounds [16 x i32], [16 x i32]* %p1, i32 0, i64 6

llvm/test/Transforms/SLPVectorizer/X86/tiny-tree.ll

	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 3
	; CHECK-NEXT: [[PTR4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 4			; CHECK-NEXT: [[PTR4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 4
	; CHECK-NEXT: [[PTR5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 5			; CHECK-NEXT: [[PTR5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 5
	; CHECK-NEXT: [[PTR6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 6			; CHECK-NEXT: [[PTR6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 6
	; CHECK-NEXT: [[PTR7:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 7			; CHECK-NEXT: [[PTR7:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 7
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i16> poison, i16 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i16> poison, i16 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i16> [[TMP3]], i16 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i16> [[TMP3]], i16 [[TMP2]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP4]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP4]], <2 x i16> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[PTR0]] to <8 x i16>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[PTR0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* [[TMP5]], align 16			; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* [[TMP5]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = load i16, i16* %v1, align 4			%1 = load i16, i16* %v1, align 4
	%2 = trunc i64 undef to i16			%2 = trunc i64 undef to i16
	%ptr0 = getelementptr inbounds i16, i16* %a, i64 0			%ptr0 = getelementptr inbounds i16, i16* %a, i64 0
	store i16 %1, i16* %ptr0, align 16			store i16 %1, i16* %ptr0, align 16
	Show All 16 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-alt-shuffle.ll

	Show All 11 Lines
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 1, i32 2, i32 7, i32 0>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 1, i32 2, i32 7, i32 0>
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 -1			; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 -1
	; CHECK-NEXT: [[ADD_PTR37:%.*]] = getelementptr inbounds float, float* [[D]], i64 -2			; CHECK-NEXT: [[ADD_PTR37:%.*]] = getelementptr inbounds float, float* [[D]], i64 -2
	; CHECK-NEXT: [[ADD_PTR45:%.*]] = getelementptr inbounds float, float* [[D]], i64 -3			; CHECK-NEXT: [[ADD_PTR45:%.*]] = getelementptr inbounds float, float* [[D]], i64 -3
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP6]] to <4 x float>			; CHECK-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP6]] to <4 x float>
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <4 x float> [[TMP7]], poison			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <4 x float> [[TMP7]], undef
	; CHECK-NEXT: [[ADD_PTR53:%.*]] = getelementptr inbounds float, float* [[D]], i64 -4			; CHECK-NEXT: [[ADD_PTR53:%.*]] = getelementptr inbounds float, float* [[D]], i64 -4
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[ADD_PTR53]] to <4 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[ADD_PTR53]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP9]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx1 = getelementptr inbounds i8, i8* %c, i64 4			%arrayidx1 = getelementptr inbounds i8, i8* %c, i64 4
	%0 = load i8, i8* %arrayidx1, align 1			%0 = load i8, i8* %arrayidx1, align 1
	Show All 36 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float			; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef			; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP10:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double			; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
				; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[TMP3]], [[CONV2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[SUB1:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> <double poison, double poison, double undef, double undef>, double [[SUB1]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[ADD1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fcmp ogt <4 x double> [[TMP6]], [[TMP4]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = fptrunc <4 x double> [[TMP6]] to <4 x float>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = select <4 x i1> [[TMP7]], <4 x float> [[TMP2]], <4 x float> [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x double> [[TMP11]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = fcmp ogt <4 x double> [[TMP13]], [[TMP4]]
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP16:%.*]] = fptrunc <4 x double> [[TMP15]] to <4 x float>
	; CHECK-NEXT: [[TMP17:%.*]] = select <4 x i1> [[TMP14]], <4 x float> [[TMP2]], <4 x float> [[TMP16]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP18]] = phi <4 x float> [ [[TMP17]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]			; CHECK-NEXT: [[TMP10]] = phi <4 x float> [ [[TMP9]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	entry:			entry:
	%conv = uitofp i16 undef to float			%conv = uitofp i16 undef to float
	%sub = fsub float 6.553500e+04, undef			%sub = fsub float 6.553500e+04, undef
	br label %bb1			br label %bb1

	bb1:			bb1:
	Show All 39 Lines

llvm/test/Transforms/SLPVectorizer/slp-umax-rdx-matcher-crash.ll

	Show All 37 Lines

	declare i32 @llvm.smin.i32(i32, i32)			declare i32 @llvm.smin.i32(i32, i32)
	declare i32 @llvm.umin.i32(i32, i32)			declare i32 @llvm.umin.i32(i32, i32)

	; Given LLVM IR caused crash in SLP.			; Given LLVM IR caused crash in SLP.
	define void @test2() {			define void @test2() {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>)			; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>)
	; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> poison, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> undef, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP1]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP2]], i32 77)			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP2]], i32 77)
	; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[TMP3]], 1			; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[TMP3]], 1
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)			%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)
	%smin1 = call i32 @llvm.smin.i32(i32 undef, i32 1)			%smin1 = call i32 @llvm.smin.i32(i32 undef, i32 1)
	Show All 13 Lines

Revision Contents

Diff 390783

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/AArch64/reorder-fmuladd-crash.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll

llvm/test/Transforms/SLPVectorizer/X86/split-load8_2-unord.ll

llvm/test/Transforms/SLPVectorizer/X86/tiny-tree.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-alt-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

llvm/test/Transforms/SLPVectorizer/slp-umax-rdx-matcher-crash.ll