This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/
  lib/Target/X86/
    X86TargetTransformInfo.h
  test/Transforms/
    LoopVectorize/X86/
      imprecise-through-phis.ll
      invariant-store-vectorization.ll
      load-deref-pred.ll
      pr35432.ll
      pr42674.ll
      reduction-fastmath.ll
      strided_load_cost.ll
      tail_loop_folding.ll
    PhaseOrdering/X86/
      vector-reductions-expanded.ll
      vector-reductions.ll
    SLPVectorizer/X86/
      PR35628_1.ll
      PR35628_2.ll
      PR39774.ll
      PR40310.ll
      horizontal-list.ll
      horizontal-minmax.ll
      horizontal.ll
      reassociated-loads.ll
      reduction.ll
      reduction_loads.ll
      reduction_unrolled.ll
      remark_horcost.ll
      reorder_repeated_ops.ll
      reverse_extract_elements.ll
      scheduling.ll
      undef_vect.ll
      used-reduced-op.ll
      vectorize-reorder-reuse.ll
Differential D80867

[x86] form reduction intrinsics over raw IR
ClosedPublic

Authored by spatel on May 30 2020, 6:34 AM.

Details

Reviewers

craig.topper
RKSimon

Commits

rGe50059f6b6b3: [x86] form reduction intrinsics from vectorizers instead of raw IR

Summary

This flips the switch on forming reduction intrinsics in the vectorizers. The IR diffs seem ok, but that doesn't provide any info on what happens in expansion/codegen. I will see if we can expose any obvious bugs using PhaseOrdering IR tests that include the expansion pass.

A motivating example is seen in https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we had intrinsics there, we might get CGP or InstCombine to fold them.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 30 2020, 6:34 AM

Herald added a project: Restricted Project. May 30 2020, 6:34 AM

Herald added subscribers: zzheng, hiraditya, mcrosier.

nikic added a subscriber: nikic. May 30 2020, 9:16 AM

spatel mentioned this in rGecbf34c0e480: [PhaseOrdering] add more tests for vector reductions; NFC. Jun 4 2020, 5:28 AM

Patch updated:
Added a PhaseOrdering test file (vector-reductions-expanded.ll) that checks the output from close to initial IR from clang through -O2 followed by -expand-reductions.

I haven't figured out how to get the new pass manager to mimic that, so for now the file uses the old pass manager only.

That shows the reduction expansion pass creates FP instructions for accumulation that could be avoided, but the backend squashes those. Everything else is identical.

Note the failure to generate fmin/fmax reductions in SLP - it doesn't seem to recognize the min/max IR intrinsics as reduction candidates, but that's an existing/different issue.

We have codegen test coverage for all of the reduction intrinsics too, so that covers everything from intrinsics to x86 asm. So I think we can enable this now without any regressions.

This makes sense to me. LGTM

This revision is now accepted and ready to land. Jun 4 2020, 11:52 AM

fhahn added a subscriber: fhahn. Jun 4 2020, 11:54 AM

fhahn added inline comments.

llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll
113	maybe it's time to drop the `experimental` bit? :)

spatel marked 2 inline comments as done. Jun 4 2020, 2:44 PM

spatel added inline comments.

llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll
113	Yes, and we can clean up the 'v2' at that point too. I think it's best to push this patch first, see if we get any complaints, then declare the intrinsics non-experimental if things are still ok after a few days.

Closed by commit rGe50059f6b6b3: [x86] form reduction intrinsics from vectorizers instead of raw IR (authored by spatel). Jun 5 2020, 10:04 AM

This revision was automatically updated to reflect the committed changes.

spatel marked an inline comment as done.

spatel mentioned this in D81491: [InstCombine] reassociate FP diff of sums into sum of diffs. Jun 9 2020, 11:23 AM

nick added a subscriber: nick. Jun 10 2020, 3:38 PM

nick added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll
258	Is this expected behavior?

spatel marked an inline comment as done. Jun 11 2020, 7:14 AM

spatel added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll
258	Yes, the "fadd fast 0.0" is expected as commented on previously in the review. That's because the expansion pass is not doing any simplifications itself. We do expect SDAG or other codegen to simplify that. If that's not working as expected, please let me know.
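To make the expected "fadd fast 0.0" concrete, here is a minimal C++ sketch of the arithmetic that a shuffle-based expansion of a reduce.v2.fadd(start, vec) call performs, assuming a power-of-two lane count and a halving reduction. The name expandedReduceFAdd is hypothetical; this models the arithmetic only and is not the code of the expansion pass.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: halve the vector repeatedly (the shuffle + fadd steps),
// read lane 0 (the extractelement), then add the scalar start value.
template <std::size_t N>
double expandedReduceFAdd(double start, std::array<double, N> v) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  for (std::size_t half = N / 2; half >= 1; half /= 2) {
    // Models: shuffle the upper half down, then one vector fadd.
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half];
  }
  // With start == 0.0 this is the redundant "fadd fast 0.0" seen in the test.
  return start + v[0];
}
```

When the start value is 0.0, the final add contributes nothing under fast-math; per the reply above, the expansion pass leaves it in place and later codegen is expected to fold it.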

spatel mentioned this in rGb5fb26951a8e: [InstCombine] reassociate FP diff of sums into sum of diffs. Jun 14 2020, 6:24 AM

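Revision Contents

Path	Size
llvm/lib/Target/X86/X86TargetTransformInfo.h	9 lines
llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll	6 lines
llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll	10 lines
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll	60 lines
llvm/test/Transforms/LoopVectorize/X86/pr35432.ll	6 lines
llvm/test/Transforms/LoopVectorize/X86/pr42674.ll	20 lines
llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll	18 lines
llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll	8 lines
llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll	8 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll	8 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll	48 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll	14 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll	36 lines
llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll	292 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	116 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll	150 lines
llvm/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction_loads.ll	24 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll	46 lines
llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll	6 lines
llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll	38 lines
llvm/test/Transforms/SLPVectorizer/X86/reverse_extract_elements.ll	24 lines
llvm/test/Transforms/SLPVectorizer/X86/scheduling.ll	6 lines
llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll	33 lines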

Diff 268860

llvm/lib/Target/X86/X86TargetTransformInfo.h

... (210 lines not shown) ...
 public:
   bool areInlineCompatible(const Function *Caller,
                            const Function *Callee) const;
   bool areFunctionArgsABICompatible(const Function *Caller,
                                     const Function *Callee,
                                     SmallPtrSetImpl<Argument *> &Args) const;
   TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,
                                                     bool IsZeroCmp) const;
   bool enableInterleavedAccessVectorization();
 
+  /// Allow vectorizers to form reduction intrinsics in IR. The IR is expanded
+  /// into shuffles and vector math/logic by the backend
+  /// (see TTI::shouldExpandReduction)
+  bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
+                             TTI::ReductionFlags Flags) const {
+    return true;
+  }
+
 private:
   int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,
                       unsigned Alignment, unsigned AddressSpace);
   int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,
                       unsigned Alignment, unsigned AddressSpace);
 
   /// @}
 };
 
 } // end namespace llvm
 
 #endif
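
The hook added above is a per-target switch that the vectorizers consult. A minimal C++ sketch of that pattern follows, assuming the generic default answer is "no". The names BaseTTISketch, X86TTISketch, and emitAddReduction are hypothetical stand-ins, not LLVM classes, and the Opcode/Type/Flags parameters of the real hook are omitted for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a generic target description.
struct BaseTTISketch {
  virtual ~BaseTTISketch() = default;
  // Assumed default: keep emitting the raw shuffle/extract sequence.
  virtual bool useReductionIntrinsic() const { return false; }
};

// Hypothetical stand-in for the x86 override added in this patch.
struct X86TTISketch : BaseTTISketch {
  bool useReductionIntrinsic() const override { return true; }
};

// A vectorizer-like client: pick the IR form based on the target's answer.
std::string emitAddReduction(const BaseTTISketch &TTI) {
  if (TTI.useReductionIntrinsic())
    return "call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)";
  return "shufflevector/add/extractelement sequence";
}
```

The sketch uses virtual dispatch only to keep the example short; it says nothing about how LLVM actually dispatches TTI queries.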

llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll

... (104 lines not shown) ...
 ; AVX-NEXT: [[TMP4:%.*]] = fcmp fast une <4 x double> [[WIDE_LOAD]], <double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01>
 ; AVX-NEXT: [[TMP5:%.*]] = fadd fast <4 x double> [[VEC_PHI]], [[WIDE_LOAD]]
 ; AVX-NEXT: [[TMP6:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>
 ; AVX-NEXT: [[PREDPHI]] = select <4 x i1> [[TMP4]], <4 x double> [[TMP5]], <4 x double> [[VEC_PHI]]
 ; AVX-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
 ; AVX-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 32
 ; AVX-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
 ; AVX: middle.block:
-; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x double> [[PREDPHI]], <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; AVX-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x double> [[PREDPHI]], [[RDX_SHUF]]
-; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x double> [[BIN_RDX]], <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; AVX-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x double> [[BIN_RDX]], [[RDX_SHUF1]]
-; AVX-NEXT: [[TMP8:%.*]] = extractelement <4 x double> [[BIN_RDX2]], i32 0
+; AVX-NEXT: [[TMP8:%.*]] = call fast double @llvm.experimental.vector.reduce.v2.fadd.f64.v4f64(double 0.000000e+00, <4 x double> [[PREDPHI]])
 ; AVX-NEXT: [[CMP_N:%.*]] = icmp eq i32 32, 32
 ; AVX-NEXT: br i1 [[CMP_N]], label [[DONE:%.*]], label [[SCALAR_PH]]
 ; AVX: scalar.ph:
 ; AVX-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; AVX-NEXT: [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
 ; AVX-NEXT: br label [[LOOP:%.*]]
 ; AVX: loop:
 ; AVX-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[NEXT_ITER:%.*]] ]
... (49 lines not shown) ...

llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll

... (53 lines not shown) ...
 ; CHECK-NEXT: store i32 [[NTRUNC]], i32* [[A]], align 4, !alias.scope !3, !noalias !0
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 64
 ; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <16 x i32> [[TMP11]], [[TMP10]]
 ; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <16 x i32> [[TMP12]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <16 x i32> [[TMP13]], [[BIN_RDX11]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[BIN_RDX12]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX13:%.*]] = add <16 x i32> [[BIN_RDX12]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <16 x i32> [[BIN_RDX13]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX15:%.*]] = add <16 x i32> [[BIN_RDX13]], [[RDX_SHUF14]]
-; CHECK-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <16 x i32> [[BIN_RDX15]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX17:%.*]] = add <16 x i32> [[BIN_RDX15]], [[RDX_SHUF16]]
-; CHECK-NEXT: [[RDX_SHUF18:%.*]] = shufflevector <16 x i32> [[BIN_RDX17]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX19:%.*]] = add <16 x i32> [[BIN_RDX17]], [[RDX_SHUF18]]
-; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i32> [[BIN_RDX19]], i32 0
+; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> [[BIN_RDX12]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[SMAX]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP15]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ], [ 0, [[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
... (226 lines not shown) ...
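
The middle.block above first folds several vector accumulators together (the BIN_RDX adds) and only then reduces the single remaining vector to a scalar. The following is a rough C++ sketch of that structure, assuming 16 lanes per vector and an interleave count of 4 (inferred from the index step of 64 and the four inputs TMP10 through TMP13). The function name interleavedSum is hypothetical, and unsigned arithmetic is used so that wraparound is well defined, as it is for the IR add.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

constexpr std::size_t VF = 16;  // lanes per vector, as in <16 x i32>
constexpr std::size_t UF = 4;   // assumed interleave count (index step 64 / 16)

using Vec = std::array<std::uint32_t, VF>;

std::uint32_t interleavedSum(const std::vector<std::uint32_t> &data) {
  std::array<Vec, UF> acc{};  // UF independent vector accumulators, zeroed
  std::size_t i = 0;

  // vector.body: each iteration consumes VF * UF elements.
  for (; i + VF * UF <= data.size(); i += VF * UF)
    for (std::size_t u = 0; u < UF; ++u)
      for (std::size_t l = 0; l < VF; ++l)
        acc[u][l] += data[i + u * VF + l];

  // middle.block, part 1: fold the accumulators lane-wise (the BIN_RDX adds).
  Vec bin = acc[0];
  for (std::size_t u = 1; u < UF; ++u)
    for (std::size_t l = 0; l < VF; ++l)
      bin[l] += acc[u][l];

  // middle.block, part 2: horizontal reduction of the one remaining vector,
  // which is the step this patch turns into a single intrinsic call.
  std::uint32_t total =
      std::accumulate(bin.begin(), bin.end(), std::uint32_t{0});

  // Scalar remainder loop for the leftover elements.
  for (; i < data.size(); ++i)
    total += data[i];
  return total;
}
```

Only the second part of the middle block changes in this diff; the lane-wise folding of the accumulators is left as ordinary vector adds.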

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll

... (91 lines not shown) ...
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
 ; CHECK-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP40]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP37]], [[TMP36]]
 ; CHECK-NEXT: [[BIN_RDX19:%.*]] = add <4 x i32> [[TMP38]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX20:%.*]] = add <4 x i32> [[TMP39]], [[BIN_RDX19]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX20]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX21:%.*]] = add <4 x i32> [[BIN_RDX20]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF22:%.*]] = shufflevector <4 x i32> [[BIN_RDX21]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX23:%.*]] = add <4 x i32> [[BIN_RDX21]], [[RDX_SHUF22]]
-; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i32> [[BIN_RDX23]], i32 0
+; CHECK-NEXT: [[TMP41:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX20]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP41]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
... (149 lines not shown) ...
 ; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]
 ; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
-; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
+; CHECK-NEXT: [[TMP85:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
... (170 lines not shown) ...
 ; CHECK-NEXT: [[TMP103]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI6]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[TMP104:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP104]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP101]], [[TMP100]]
 ; CHECK-NEXT: [[BIN_RDX7:%.*]] = add <4 x i32> [[TMP102]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <4 x i32> [[TMP103]], [[BIN_RDX7]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX8]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX9:%.*]] = add <4 x i32> [[BIN_RDX8]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <4 x i32> [[BIN_RDX9]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[BIN_RDX9]], [[RDX_SHUF10]]
-; CHECK-NEXT: [[TMP105:%.*]] = extractelement <4 x i32> [[BIN_RDX11]], i32 0
+; CHECK-NEXT: [[TMP105:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX8]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP105]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
... (309 lines not shown) ...
 ; CHECK-NEXT: [[TMP183]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI36]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[TMP184:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP184]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !8
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP181]], [[TMP180]]
 ; CHECK-NEXT: [[BIN_RDX37:%.*]] = add <4 x i32> [[TMP182]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX38:%.*]] = add <4 x i32> [[TMP183]], [[BIN_RDX37]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX38]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX39:%.*]] = add <4 x i32> [[BIN_RDX38]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF40:%.*]] = shufflevector <4 x i32> [[BIN_RDX39]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX41:%.*]] = add <4 x i32> [[BIN_RDX39]], [[RDX_SHUF40]]
-; CHECK-NEXT: [[TMP185:%.*]] = extractelement <4 x i32> [[BIN_RDX41]], i32 0
+; CHECK-NEXT: [[TMP185:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX38]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP185]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
... (160 lines not shown) ...
 ; CHECK-NEXT: [[TMP84]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[TMP85:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[TMP85]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !10
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP82]], [[TMP81]]
 ; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP84]], [[BIN_RDX10]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
-; CHECK-NEXT: [[TMP86:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
+; CHECK-NEXT: [[TMP86:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP86]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
... (156 lines not shown) ...
 ; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
 ; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 3072
 ; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !12
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]
 ; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]
 ; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
-; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
+; CHECK-NEXT: [[TMP85:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 3072, 3072
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 1024, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	▲ Show 20 Lines • Show All 334 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP151]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI36]]			; CHECK-NEXT: [[TMP151]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI36]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP152:%.*]] = icmp eq i64 [[INDEX_NEXT]], 2048			; CHECK-NEXT: [[TMP152:%.*]] = icmp eq i64 [[INDEX_NEXT]], 2048
	; CHECK-NEXT: br i1 [[TMP152]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !14			; CHECK-NEXT: br i1 [[TMP152]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !14
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP149]], [[TMP148]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP149]], [[TMP148]]
	; CHECK-NEXT: [[BIN_RDX37:%.*]] = add <4 x i32> [[TMP150]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX37:%.*]] = add <4 x i32> [[TMP150]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX38:%.*]] = add <4 x i32> [[TMP151]], [[BIN_RDX37]]			; CHECK-NEXT: [[BIN_RDX38:%.*]] = add <4 x i32> [[TMP151]], [[BIN_RDX37]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX38]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP153:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX38]])
	; CHECK-NEXT: [[BIN_RDX39:%.*]] = add <4 x i32> [[BIN_RDX38]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF40:%.*]] = shufflevector <4 x i32> [[BIN_RDX39]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX41:%.*]] = add <4 x i32> [[BIN_RDX39]], [[RDX_SHUF40]]
	; CHECK-NEXT: [[TMP153:%.*]] = extractelement <4 x i32> [[BIN_RDX41]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 2048, 2048			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 2048, 2048
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP153]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP153]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	[150 lines not shown]
	; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]			; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !16			; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !16
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP85:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	[150 lines not shown]
	; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]			; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !18			; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !18
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP85:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	[150 lines not shown]
	; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]			; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !20			; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !20
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP81]], [[TMP80]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP82]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP83]], [[BIN_RDX10]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP85:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <4 x i32> [[BIN_RDX11]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x i32> [[BIN_RDX12]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i32> [[BIN_RDX14]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP85]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	[45 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/pr35432.ll

	[84 lines not shown]
	; CHECK-NEXT: [[TMP29:%.*]] = add i8 [[TMP25]], -1			; CHECK-NEXT: [[TMP29:%.*]] = add i8 [[TMP25]], -1
	; CHECK-NEXT: [[TMP30:%.*]] = zext i8 [[TMP28]] to i32			; CHECK-NEXT: [[TMP30:%.*]] = zext i8 [[TMP28]] to i32
	; CHECK-NEXT: [[TMP31:%.*]] = zext i8 [[TMP29]] to i32			; CHECK-NEXT: [[TMP31:%.*]] = zext i8 [[TMP29]] to i32
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; CHECK-NEXT: [[TMP32:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP32:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP32]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP32]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP27]], [[TMP26]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP27]], [[TMP26]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP33:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX]])
	; CHECK-NEXT: [[BIN_RDX3:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <4 x i32> [[BIN_RDX3]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX5:%.*]] = add <4 x i32> [[BIN_RDX3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i32> [[BIN_RDX5]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP7]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP7]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND4_FOR_INC9_CRIT_EDGE:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND4_FOR_INC9_CRIT_EDGE:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[CONV3]], [[FOR_BODY8_LR_PH]] ], [ [[CONV3]], [[VECTOR_SCEVCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[CONV3]], [[FOR_BODY8_LR_PH]] ], [ [[CONV3]], [[VECTOR_SCEVCHECK]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[DOTPROMOTED]], [[FOR_BODY8_LR_PH]] ], [ [[DOTPROMOTED]], [[VECTOR_SCEVCHECK]] ], [ [[TMP33]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[DOTPROMOTED]], [[FOR_BODY8_LR_PH]] ], [ [[DOTPROMOTED]], [[VECTOR_SCEVCHECK]] ], [ [[TMP33]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY8:%.*]]			; CHECK-NEXT: br label [[FOR_BODY8:%.*]]
	; CHECK: for.body8:			; CHECK: for.body8:
	; CHECK-NEXT: [[INC5:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY8]] ]			; CHECK-NEXT: [[INC5:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY8]] ]
	[100 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/pr42674.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt %s -loop-vectorize -instcombine -simplifycfg -mtriple=x86_64-unknown-linux-gnu -mattr=avx512vl,avx512dq,avx512bw -S | FileCheck %s			; RUN: opt %s -loop-vectorize -instcombine -simplifycfg -mtriple=x86_64-unknown-linux-gnu -mattr=avx512vl,avx512dq,avx512bw -S | FileCheck %s

	@bytes = global [128 x i8] zeroinitializer, align 16			@bytes = global [128 x i8] zeroinitializer, align 16

	; Make sure we end up with vector code for this loop. We used to try to create			; Make sure we end up with vector code for this loop. We used to try to create
	; a VF=64,UF=4 loop, but the scalar trip count is only 128 so			; a VF=64,UF=4 loop, but the scalar trip count is only 128 so
	; the vector loop was dead code leaving only a scalar remainder.			; the vector loop was dead code leaving only a scalar remainder.
	define zeroext i8 @sum() {			define zeroext i8 @sum() {
	; CHECK-LABEL: @sum(			; CHECK-LABEL: @sum(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <64 x i8> [ zeroinitializer, [[ENTRY]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <64 x i8> [ zeroinitializer, [[ENTRY]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <64 x i8> [ zeroinitializer, [[ENTRY]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <64 x i8> [ zeroinitializer, [[ENTRY]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [128 x i8], [128 x i8]* @bytes, i64 0, i64 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [128 x i8], [128 x i8]* @bytes, i64 0, i64 [[INDEX]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <64 x i8>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <64 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <64 x i8>, <64 x i8>* [[TMP1]], align 16			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <64 x i8>, <64 x i8>* [[TMP1]], align 16
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP0]], i64 64			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP0]], i64 64
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <64 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <64 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <64 x i8>, <64 x i8>* [[TMP3]], align 16			; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <64 x i8>, <64 x i8>* [[TMP3]], align 16
	; CHECK-NEXT: [[TMP4]] = add <64 x i8> [[WIDE_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4]] = add <64 x i8> [[WIDE_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = add <64 x i8> [[WIDE_LOAD3]], [[VEC_PHI2]]			; CHECK-NEXT: [[TMP5]] = add <64 x i8> [[WIDE_LOAD2]], [[VEC_PHI1]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 128			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 128
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX]], 0
	; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <64 x i8> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <64 x i8> [[TMP5]], [[TMP4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <64 x i8> [[BIN_RDX]], <64 x i8> undef, <64 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = call i8 @llvm.experimental.vector.reduce.add.v64i8(<64 x i8> [[BIN_RDX]])
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <64 x i8> [[BIN_RDX]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <64 x i8> [[BIN_RDX4]], <64 x i8> undef, <64 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = add <64 x i8> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <64 x i8> [[BIN_RDX6]], <64 x i8> undef, <64 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <64 x i8> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <64 x i8> [[BIN_RDX8]], <64 x i8> undef, <64 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <64 x i8> [[BIN_RDX8]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <64 x i8> [[BIN_RDX10]], <64 x i8> undef, <64 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = add <64 x i8> [[BIN_RDX10]], [[RDX_SHUF11]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <64 x i8> [[BIN_RDX12]], <64 x i8> undef, <64 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <64 x i8> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <64 x i8> [[BIN_RDX14]], i32 0
	; CHECK-NEXT: ret i8 [[TMP7]]			; CHECK-NEXT: ret i8 [[TMP7]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%r.010 = phi i8 [ 0, %entry ], [ %add, %for.body ]			%r.010 = phi i8 [ 0, %entry ], [ %add, %for.body ]
	[11 lines not shown]
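
Note on the integer diffs above: each removed RDX_SHUF/BIN_RDX chain halves the active lanes per step and extracts lane 0, which for integer add should give the same value as the single reduce.add call that replaces it. Below is a rough Python model of that equivalence, not LLVM code. The function names are hypothetical, and undef shuffle lanes are modelled as zero (an assumption; they never feed lane 0). It covers only integer add with wraparound; the fast-math fadd cases are not modelled.

```python
def shuffle_reduce_add(lanes, bits):
    """Model of the removed IR: log2(N) shufflevector+add steps, then extractelement 0."""
    mask = (1 << bits) - 1
    vec = [x & mask for x in lanes]
    width = len(vec)
    if width == 0 or width & (width - 1):
        raise ValueError("lane count must be a power of two")
    half = width // 2
    while half >= 1:
        # shufflevector: move lanes [half, 2*half) down to [0, half).
        # The remaining lanes are undef in the IR; modelled as 0 here.
        shuf = vec[half:2 * half] + [0] * (width - half)
        # add <N x iBits>: lane-wise add with wraparound.
        vec = [(a + b) & mask for a, b in zip(vec, shuf)]
        half //= 2
    return vec[0]  # extractelement ..., i32 0


def intrinsic_reduce_add(lanes, bits):
    """Model of the added IR: one horizontal add with wraparound."""
    return sum(lanes) & ((1 << bits) - 1)


if __name__ == "__main__":
    v64i8 = list(range(64))  # stand-in for the <64 x i8> value in @sum
    print(shuffle_reduce_add(v64i8, 8), intrinsic_reduce_add(v64i8, 8))
```

For the 64-lane i8 case in pr42674.ll the loop runs six times, matching the six shufflevector/add pairs on the left-hand side of the diff.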

llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll

	[70 lines not shown]
	; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP8]] = fadd fast <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP8]] = fadd fast <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP9]] = fadd fast <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]			; CHECK-NEXT: [[TMP9]] = fadd fast <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP9]], [[TMP8]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[BIN_RDX]])
	; CHECK-NEXT: [[BIN_RDX3:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <4 x float> [[BIN_RDX3]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX5:%.*]] = fadd fast <4 x float> [[BIN_RDX3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[BIN_RDX5]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	[55 lines not shown]
	; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP8]] = fadd reassoc <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP8]] = fadd reassoc <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP9]] = fadd reassoc <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]			; CHECK-NEXT: [[TMP9]] = fadd reassoc <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc <4 x float> [[TMP9]], [[TMP8]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc <4 x float> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call reassoc float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[BIN_RDX]])
	; CHECK-NEXT: [[BIN_RDX3:%.*]] = fadd reassoc <4 x float> [[BIN_RDX]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <4 x float> [[BIN_RDX3]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX5:%.*]] = fadd reassoc <4 x float> [[BIN_RDX3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[BIN_RDX5]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	[55 lines not shown]
	; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP8]] = fadd reassoc contract <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP8]] = fadd reassoc contract <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP9]] = fadd reassoc contract <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]			; CHECK-NEXT: [[TMP9]] = fadd reassoc contract <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc contract <4 x float> [[TMP9]], [[TMP8]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc contract <4 x float> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call reassoc contract float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[BIN_RDX]])
	; CHECK-NEXT: [[BIN_RDX3:%.*]] = fadd reassoc contract <4 x float> [[BIN_RDX]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <4 x float> [[BIN_RDX3]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX5:%.*]] = fadd reassoc contract <4 x float> [[BIN_RDX3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[BIN_RDX5]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]

llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll

	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP33]], i32 [[TMP26]], i32 7			; CHECK-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP33]], i32 [[TMP26]], i32 7
	; CHECK-NEXT: [[TMP35:%.*]] = mul nsw <8 x i32> [[TMP34]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP35:%.*]] = mul nsw <8 x i32> [[TMP34]], [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP36:%.*]] = add <8 x i32> [[VEC_PHI]], <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>			; CHECK-NEXT: [[TMP36:%.*]] = add <8 x i32> [[VEC_PHI]], <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
	; CHECK-NEXT: [[TMP37]] = add <8 x i32> [[TMP36]], [[TMP35]]			; CHECK-NEXT: [[TMP37]] = add <8 x i32> [[TMP36]], [[TMP35]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP38:%.*]] = icmp eq i64 [[INDEX_NEXT]], 96			; CHECK-NEXT: [[TMP38:%.*]] = icmp eq i64 [[INDEX_NEXT]], 96
	; CHECK-NEXT: br i1 [[TMP38]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5			; CHECK-NEXT: br i1 [[TMP38]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP37]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP39:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP37]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP37]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP39:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 100, 96			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 100, 96
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 96, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 96, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP39]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP39]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[ADD7_LCSSA:%.*]] = phi i32 [ [[ADD7:%.*]], [[FOR_BODY]] ], [ [[TMP39]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ADD7_LCSSA:%.*]] = phi i32 [ [[ADD7:%.*]], [[FOR_BODY]] ], [ [[TMP39]], [[MIDDLE_BLOCK]] ]

llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll

	; CHECK-NEXT: [[TMP12:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD3]], [[WIDE_MASKED_LOAD]]			; CHECK-NEXT: [[TMP12:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD3]], [[WIDE_MASKED_LOAD]]
	; CHECK-NEXT: [[TMP13]] = add <8 x i32> [[TMP12]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP13]] = add <8 x i32> [[TMP12]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP4]] to i32			; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[TMP4]] to i32
	; CHECK-NEXT: [[TMP15:%.*]] = select <8 x i1> [[TMP6]], <8 x i32> [[TMP13]], <8 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP15:%.*]] = select <8 x i1> [[TMP6]], <8 x i32> [[TMP13]], <8 x i32> [[VEC_PHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP15]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP17:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP15]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP15]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX5:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_SHUF6:%.*]] = shufflevector <8 x i32> [[BIN_RDX5]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX7:%.*]] = add <8 x i32> [[BIN_RDX5]], [[RDX_SHUF6]]
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <8 x i32> [[BIN_RDX7]], i32 0
	; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP17]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP17]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[SUM_0:%.*]] = phi i32 [ [[SUM_1:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[SUM_0:%.*]] = phi i32 [ [[SUM_1:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O2 -expand-reductions -S < %s | FileCheck %s			; RUN: opt -O2 -expand-reductions -S < %s | FileCheck %s

	; Test if SLP vector reduction patterns are recognized			; Test if SLP vector reduction patterns are recognized
	; and optionally converted to reduction intrinsics and			; and optionally converted to reduction intrinsics and
	; back to raw IR.			; back to raw IR.

	target triple = "x86_64--"			target triple = "x86_64--"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i32 @add_v4i32(i32* %p) #0 {			define i32 @add_v4i32(i32* %p) #0 {
	; CHECK-LABEL: @add_v4i32(			; CHECK-LABEL: @add_v4i32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	...			...
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4, !tbaa !7			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4, !tbaa !7
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], 4.200000e+01			; CHECK-NEXT: [[BIN_RDX5:%.*]] = fadd fast float 0.000000e+00, [[TMP2]]
				nick: Is this expected behavior?
				spatel (author): Yes, the "fadd fast 0.0" is expected, as commented on previously in the review. That's because the expansion pass is not doing any simplifications itself. We do expect SDAG or other codegen to simplify that. If that's not working as expected, please let me know.
				; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[BIN_RDX5]], 4.200000e+01
	; CHECK-NEXT: ret float [[OP_EXTRA]]			; CHECK-NEXT: ret float [[OP_EXTRA]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%r.0 = phi float [ 4.200000e+01, %entry ], [ %add, %for.inc ]			%r.0 = phi float [ 4.200000e+01, %entry ], [ %add, %for.inc ]
	%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]			%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
	...			...
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4, !tbaa !7			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4, !tbaa !7
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fmul fast <4 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fmul fast <4 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fmul fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fmul fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fmul fast float [[TMP2]], 4.200000e+01			; CHECK-NEXT: [[BIN_RDX5:%.*]] = fmul fast float 1.000000e+00, [[TMP2]]
				; CHECK-NEXT: [[OP_EXTRA:%.*]] = fmul fast float [[BIN_RDX5]], 4.200000e+01
	; CHECK-NEXT: ret float [[OP_EXTRA]]			; CHECK-NEXT: ret float [[OP_EXTRA]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%r.0 = phi float [ 4.200000e+01, %entry ], [ %mul, %for.inc ]			%r.0 = phi float [ 4.200000e+01, %entry ], [ %mul, %for.inc ]
	%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]			%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
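
The inline discussion in vector-reductions-expanded.ll concerns the extra `fadd fast float 0.000000e+00, [[TMP2]]` (and `fmul fast float 1.000000e+00, [[TMP2]]`) that shows up once the intrinsic is expanded back to raw IR. As a rough illustration only, the Python sketch below models that shape: a log2 shuffle reduction over a power-of-two vector, followed by one scalar operation with the start value. The function names are hypothetical and are not part of LLVM. The model uses ordinary Python floats and mirrors only the pattern visible in the CHECK lines, not the actual pass.

```python
import operator


def shuffle_reduce(vec, op):
    """Model the shufflevector/binop ladder from the CHECK lines.

    Each step combines lane i with lane i + half, halving the number of
    live lanes until lane 0 holds the result (the final extractelement).
    Assumes len(vec) is a power of two, as in the <4 x ...> and <8 x ...>
    tests.
    """
    lanes = list(vec)
    half = len(lanes) // 2
    while half >= 1:
        # Upper lanes are undef in the IR, so only the low `half` lanes matter.
        for i in range(half):
            lanes[i] = op(lanes[i], lanes[i + half])
        half //= 2
    return lanes[0]


def expanded_reduce(start, vec, op):
    """Model the expanded intrinsic: reduce the vector, then apply the
    start value with one extra scalar op (the `fadd fast 0.0` in question)."""
    return op(start, shuffle_reduce(vec, op))


if __name__ == "__main__":
    v = [1.0, 2.0, 3.0, 4.0]
    r_add = expanded_reduce(0.0, v, operator.add)
    r_mul = expanded_reduce(1.0, v, operator.mul)
    # The tests then fold in the loop's initial value 42.0 (OP_EXTRA).
    print(r_add + 42.0, r_mul * 42.0)
```

With a start value of 0.0 for fadd or 1.0 for fmul, the extra scalar operation leaves the reduced value unchanged in this model, which is consistent with the expectation stated in the thread that later simplification can remove it.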

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s			; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s
	; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s			; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s

	target triple = "x86_64--"			target triple = "x86_64--"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define i32 @ext_ext_or_reduction_v4i32(<4 x i32> %x, <4 x i32> %y) {			define i32 @ext_ext_or_reduction_v4i32(<4 x i32> %x, <4 x i32> %y) {
	; CHECK-LABEL: @ext_ext_or_reduction_v4i32(			; CHECK-LABEL: @ext_ext_or_reduction_v4i32(
	; CHECK-NEXT: [[Z:%.*]] = and <4 x i32> [[Y:%.*]], [[X:%.*]]			; CHECK-NEXT: [[Z:%.*]] = and <4 x i32> [[Y:%.*]], [[X:%.*]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[Z]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> [[Z]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i32> [[Z]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret i32 [[TMP1]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	%z = and <4 x i32> %x, %y			%z = and <4 x i32> %x, %y
	%z0 = extractelement <4 x i32> %z, i32 0			%z0 = extractelement <4 x i32> %z, i32 0
	%z1 = extractelement <4 x i32> %z, i32 1			%z1 = extractelement <4 x i32> %z, i32 1
	%z01 = or i32 %z0, %z1			%z01 = or i32 %z0, %z1
	%z2 = extractelement <4 x i32> %z, i32 2			%z2 = extractelement <4 x i32> %z, i32 2
	%z012 = or i32 %z01, %z2			%z012 = or i32 %z01, %z2
	...			...
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[VEC0:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[VEC0:%.*]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[VEC1:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[VEC1:%.*]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP7]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[CMP5:%.*]] = icmp sle i32 [[TMP8]], [[TOLERANCE:%.*]]			; CHECK-NEXT: [[CMP5:%.*]] = icmp sle i32 [[TMP8]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[COND6:%.*]] = zext i1 [[CMP5]] to i32			; CHECK-NEXT: [[COND6:%.*]] = zext i1 [[CMP5]] to i32
	; CHECK-NEXT: ret i32 [[COND6]]			; CHECK-NEXT: ret i32 [[COND6]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:

	define i32 @TestVectorsEqual_alt(i32* noalias %Vec0, i32* noalias %Vec1, i32 %Tolerance) {			define i32 @TestVectorsEqual_alt(i32* noalias %Vec0, i32* noalias %Vec1, i32 %Tolerance) {
	; CHECK-LABEL: @TestVectorsEqual_alt(			; CHECK-LABEL: @TestVectorsEqual_alt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[VEC0:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[VEC0:%.*]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[VEC1:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[VEC1:%.*]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = add <4 x i32> [[TMP1]], [[RDX_SHUF5]]			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP3]])
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <4 x i32> [[BIN_RDX6]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[ADD_3:%.*]] = sub i32 [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <4 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]			; CHECK-NEXT: [[CMP3:%.*]] = icmp ule i32 [[ADD_3]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[BIN_RDX8]], [[BIN_RDX4]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
	; CHECK-NEXT: [[CMP3:%.*]] = icmp ule i32 [[TMP5]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP3]] to i32			; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP3]] to i32
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[COND]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%sum.0 = phi i32 [ 0, %entry ], [ %add, %for.inc ]			%sum.0 = phi i32 [ 0, %entry ], [ %add, %for.inc ]
	...			...
	; CHECK-LABEL: @TestVectorsEqualFP(			; CHECK-LABEL: @TestVectorsEqualFP(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[VEC0:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[VEC0:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[VEC1:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[VEC1:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <4 x float> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <4 x float> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = call fast <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call fast <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP4]])
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ole float [[TMP6]], [[TOLERANCE:%.*]]			; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ole float [[TMP6]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[COND5:%.*]] = zext i1 [[CMP4]] to i32			; CHECK-NEXT: [[COND5:%.*]] = zext i1 [[CMP4]] to i32
	; CHECK-NEXT: ret i32 [[COND5]]			; CHECK-NEXT: ret i32 [[COND5]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:

	define i32 @TestVectorsEqualFP_alt(float* noalias %Vec0, float* noalias %Vec1, float %Tolerance) {			define i32 @TestVectorsEqualFP_alt(float* noalias %Vec0, float* noalias %Vec1, float %Tolerance) {
	; CHECK-LABEL: @TestVectorsEqualFP_alt(			; CHECK-LABEL: @TestVectorsEqualFP_alt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[VEC0:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[VEC0:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[VEC1:%.*]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[VEC1:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <4 x float> [[TMP1]], [[RDX_SHUF5]]			; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <4 x float> [[BIN_RDX6]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[ADD_3:%.*]] = fsub fast float [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <4 x float> [[BIN_RDX6]], [[RDX_SHUF7]]			; CHECK-NEXT: [[CMP3:%.*]] = fcmp fast ole float [[ADD_3]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <4 x float> [[BIN_RDX8]], [[BIN_RDX4]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP4]], i32 0
	; CHECK-NEXT: [[CMP3:%.*]] = fcmp fast ole float [[TMP5]], [[TOLERANCE:%.*]]
	; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP3]] to i32			; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP3]] to i32
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[COND]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%sum.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.inc ]			%sum.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.inc ]

llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @mainTest(i32* %ptr) #0 {			define void @mainTest(i32* %ptr) #0 {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32* [[PTR:%.*]], null			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32* [[PTR:%.*]], null
	; CHECK-NEXT: br i1 [[CMP]], label [[LOOP:%.*]], label [[BAIL_OUT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[LOOP:%.*]], label [[BAIL_OUT:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA5:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA3:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 1			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]			; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP8]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP10]], 1			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP10]], 1
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add i32 [[OP_EXTRA]], [[TMP7]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add i32 [[OP_EXTRA]], [[TMP7]]
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = add i32 [[OP_EXTRA3]], [[TMP6]]			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add i32 [[OP_EXTRA1]], [[TMP6]]
	; CHECK-NEXT: [[OP_EXTRA5]] = add i32 [[OP_EXTRA4]], [[TMP5]]			; CHECK-NEXT: [[OP_EXTRA3]] = add i32 [[OP_EXTRA2]], [[TMP5]]
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	; CHECK: bail_out:			; CHECK: bail_out:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%cmp = icmp eq i32* %ptr, null			%cmp = icmp eq i32* %ptr, null
	br i1 %cmp, label %loop, label %bail_out			br i1 %cmp, label %loop, label %bail_out


llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @test() #0 {			define void @test() #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA3:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA1:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP6:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP6:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0			; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> undef, i64 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> undef, i64 [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i64> [[TMP1]], i64 [[TMP0]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i64> [[TMP1]], i64 [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[TMP0]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[TMP0]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP4]], <i64 3, i64 2, i64 1, i64 0>			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP4]], <i64 3, i64 2, i64 1, i64 0>
	; CHECK-NEXT: [[TMP6]] = extractelement <4 x i64> [[TMP5]], i32 3			; CHECK-NEXT: [[TMP6]] = extractelement <4 x i64> [[TMP5]], i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP7]], 32			; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP7]], 32
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP5]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = ashr exact <4 x i64> [[TMP8]], <i64 32, i64 32, i64 32, i64 32>			; CHECK-NEXT: [[TMP9:%.*]] = ashr exact <4 x i64> [[TMP8]], <i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP9]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> [[TMP9]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i64> [[TMP9]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP10]], 0			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP10]], 0
	; CHECK-NEXT: [[OP_EXTRA3]] = add i64 [[OP_EXTRA]], [[TMP6]]			; CHECK-NEXT: [[OP_EXTRA1]] = add i64 [[OP_EXTRA]], [[TMP6]]
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]			%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]
	%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]			%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.experimental.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
				; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
				; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
				; CHECK-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
				; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA26]], i32 0
	; CHECK-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA30:%.*]] = and i32 [[OP_EXTRA29]], [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA30]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[TMP12]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1
	; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1			; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>			; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555
	; FORCE_REDUCTION-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.experimental.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])
	; FORCE_REDUCTION-NEXT: [[BIN_RDX:%.*]] = and <4 x i32> [[TMP3]], [[RDX_SHUF]]
	; FORCE_REDUCTION-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; FORCE_REDUCTION-NEXT: [[BIN_RDX2:%.*]] = and <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
				; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
				; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529			; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA29]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685			; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1			; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP8]], [[TMP10]]			; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP8]], [[TMP10]]
	; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP8]], [[TMP10]]			; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP8]], [[TMP10]]
	; FORCE_REDUCTION-NEXT: [[TMP13]] = shufflevector <2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> <i32 0, i32 3>			; FORCE_REDUCTION-NEXT: [[TMP13]] = shufflevector <2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> <i32 0, i32 3>

llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s

	define void @mainTest(i32 %param, i32 * %vals, i32 %len) {			define void @mainTest(i32 %param, i32 * %vals, i32 %len) {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: bci_15.preheader:			; CHECK-NEXT: bci_15.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 undef>, i32 [[PARAM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 undef>, i32 [[PARAM:%.*]], i32 1
	; CHECK-NEXT: br label [[BCI_15:%.*]]			; CHECK-NEXT: br label [[BCI_15:%.*]]
	; CHECK: bci_15:			; CHECK: bci_15:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15
	; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4			; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>			; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <16 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = and <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]
	; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16			; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> undef, i32 [[V44]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> undef, i32 [[V44]], i32 0
	; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1			; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1
	; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2			; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float			; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4			; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @bazz(			; THRESHOLD-LABEL: @bazz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2			; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float			; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4			; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%mul = mul nsw i32 %0, 3			%mul = mul nsw i32 %0, 3
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16			%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
	%mul4 = fmul fast float %2, %1			%mul4 = fmul fast float %2, %1
	define float @bazzz() {			define float @bazzz() {
	; CHECK-LABEL: @bazzz(			; CHECK-LABEL: @bazzz(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; CHECK-NEXT: store float [[TMP5]], float* @res, align 4			; CHECK-NEXT: store float [[TMP5]], float* @res, align 4
	; CHECK-NEXT: ret float [[TMP5]]			; CHECK-NEXT: ret float [[TMP5]]
	;			;
	; THRESHOLD-LABEL: @bazzz(			; THRESHOLD-LABEL: @bazzz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; THRESHOLD-NEXT: store float [[TMP5]], float* @res, align 4			; THRESHOLD-NEXT: store float [[TMP5]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[TMP5]]			; THRESHOLD-NEXT: ret float [[TMP5]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	Show All 19 Lines
	define i32 @foo() {			define i32 @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32			; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32
	; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4			; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
	; CHECK-NEXT: ret i32 [[CONV4]]			; CHECK-NEXT: ret i32 [[CONV4]]
	;			;
	; THRESHOLD-LABEL: @foo(			; THRESHOLD-LABEL: @foo(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; THRESHOLD-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32			; THRESHOLD-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32
	; THRESHOLD-NEXT: store i32 [[CONV4]], i32* @n, align 4			; THRESHOLD-NEXT: store i32 [[CONV4]], i32* @n, align 4
	; THRESHOLD-NEXT: ret i32 [[CONV4]]			; THRESHOLD-NEXT: ret i32 [[CONV4]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	Show All 19 Lines
	}			}

	define float @bar() {			define float @bar() {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float> [[TMP2]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: store float [[TMP3]], float* @res, align 4			; CHECK-NEXT: store float [[TMP3]], float* @res, align 4
	; CHECK-NEXT: ret float [[TMP3]]			; CHECK-NEXT: ret float [[TMP3]]
	;			;
	; THRESHOLD-LABEL: @bar(			; THRESHOLD-LABEL: @bar(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]			; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float> [[TMP2]])
	; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
	; THRESHOLD-NEXT: store float [[TMP3]], float* @res, align 4			; THRESHOLD-NEXT: store float [[TMP3]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[TMP3]]			; THRESHOLD-NEXT: ret float [[TMP3]]
	;			;
	entry:			entry:
	%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
	%mul = fmul fast float %1, %0			%mul = fmul fast float %1, %0
	%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4			%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
	▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42			; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42
	; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43			; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43
	; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44			; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44
	; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45			; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45
	; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46			; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46
	; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47			; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v32f32(float 0.000000e+00, <32 x float> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v16f32(float 0.000000e+00, <16 x float> [[TMP1]])
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
	; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
	; CHECK-NEXT: ret float [[OP_RDX]]			; CHECK-NEXT: ret float [[OP_RDX]]
	;			;
	; THRESHOLD-LABEL: @f(			; THRESHOLD-LABEL: @f(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	Show All 40 Lines
	; THRESHOLD-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42			; THRESHOLD-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42
	; THRESHOLD-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43			; THRESHOLD-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43
	; THRESHOLD-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44			; THRESHOLD-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44
	; THRESHOLD-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45			; THRESHOLD-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45
	; THRESHOLD-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46			; THRESHOLD-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46
	; THRESHOLD-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47			; THRESHOLD-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47
	; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*			; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*
	; THRESHOLD-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4			; THRESHOLD-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v32f32(float 0.000000e+00, <32 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v16f32(float 0.000000e+00, <16 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
	; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
	; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
	; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]			; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
	; THRESHOLD-NEXT: ret float [[OP_RDX]]			; THRESHOLD-NEXT: ret float [[OP_RDX]]
	;			;
	entry:			entry:
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1			%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
	%1 = load float, float* %arrayidx.1, align 4			%1 = load float, float* %arrayidx.1, align 4
	%add.1 = fadd fast float %1, %0			%add.1 = fadd fast float %1, %0
	▲ Show 20 Lines • Show All 171 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31			; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v32f32(float 0.000000e+00, <32 x float> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
	; CHECK-NEXT: ret float [[OP_EXTRA]]			; CHECK-NEXT: ret float [[OP_EXTRA]]
	;			;
	; THRESHOLD-LABEL: @f1(			; THRESHOLD-LABEL: @f1(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	Show All 24 Lines
	; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31			; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v32f32(float 0.000000e+00, <32 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA]]
	;			;
	entry:			entry:
	%rem = srem i32 %a, %b			%rem = srem i32 %a, %b
	%conv = sitofp i32 %rem to float			%conv = sitofp i32 %rem to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %0, %conv			%add = fadd fast float %0, %conv
	[... 129 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25			; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
	; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v16f32(float 0.000000e+00, <16 x float> [[TMP7]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]			; CHECK-NEXT: [[TMP9:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP5]])
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0
	; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX1]], [[TMP1]]
	; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
	; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]			; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
	; CHECK-NEXT: ret float [[TMP12]]			; CHECK-NEXT: ret float [[TMP12]]
	;			;
	; THRESHOLD-LABEL: @loadadd31(			; THRESHOLD-LABEL: @loadadd31(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	[... 27 lines not shown ...]
	; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25			; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
	; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; THRESHOLD-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*			; THRESHOLD-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*
	; THRESHOLD-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4			; THRESHOLD-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP8:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v16f32(float 0.000000e+00, <16 x float> [[TMP7]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[TMP9:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP5]])
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
	; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]
	; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]
	; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0
	; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]			; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
	; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP10:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]			; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX1]], [[TMP1]]
	; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
	; THRESHOLD-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
	; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]			; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
	; THRESHOLD-NEXT: ret float [[TMP12]]			; THRESHOLD-NEXT: ret float [[TMP12]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds float, float* %x, i64 1			%arrayidx = getelementptr inbounds float, float* %x, i64 1
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2			%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2
	%1 = load float, float* %arrayidx.1, align 4			%1 = load float, float* %arrayidx.1, align 4
	[... 95 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @extra_args(			; THRESHOLD-LABEL: @extra_args(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %conv, 3.000000e+00			%add = fadd fast float %conv, 3.000000e+00
	%add1 = fadd fast float %0, %add			%add1 = fadd fast float %0, %add
	%arrayidx3 = getelementptr inbounds float, float* %x, i64 1			%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
	[... 31 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = fadd fast float [[OP_EXTRA1]], 5.000000e+00
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = fadd fast float [[OP_EXTRA2]], [[CONV]]
	; CHECK-NEXT: ret float [[OP_EXTRA7]]			; CHECK-NEXT: ret float [[OP_EXTRA3]]
	;			;
	; THRESHOLD-LABEL: @extra_args_same_several_times(			; THRESHOLD-LABEL: @extra_args_same_several_times(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
	; THRESHOLD-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00			; THRESHOLD-NEXT: [[OP_EXTRA2:%.*]] = fadd fast float [[OP_EXTRA1]], 5.000000e+00
	; THRESHOLD-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = fadd fast float [[OP_EXTRA2]], [[CONV]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA7]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA3]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %conv, 3.000000e+00			%add = fadd fast float %conv, 3.000000e+00
	%add1 = fadd fast float %0, %add			%add1 = fadd fast float %0, %add
	%arrayidx3 = getelementptr inbounds float, float* %x, i64 1			%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
	[... 35 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @extra_args_no_replace(			; THRESHOLD-LABEL: @extra_args_no_replace(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float			; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
	; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00			; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%convc = sitofp i32 %c to float			%convc = sitofp i32 %c to float
	%addc = fadd fast float %convc, 3.000000e+00			%addc = fadd fast float %convc, 3.000000e+00
	%add = fadd fast float %conv, %addc			%add = fadd fast float %conv, %addc
	[33 lines not shown]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>			; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP11]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
	; CHECK-NEXT: ret i32 [[OP_EXTRA3]]			; CHECK-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @wobble(			; THRESHOLD-LABEL: @wobble(
	; THRESHOLD-NEXT: bb:			; THRESHOLD-NEXT: bb:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0			; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0
	; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1			; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1
	; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2			; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2
	; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3			; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3
	; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1			; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1
	; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2			; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2
	; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3			; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3
	; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]			; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]
	; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3			; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer			; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer
	; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>			; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[TMP12:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP11]])
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
	; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
	; THRESHOLD-NEXT: ret i32 [[OP_EXTRA3]]			; THRESHOLD-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	bb:			bb:
	%x1 = xor i32 %arg, %bar			%x1 = xor i32 %arg, %bar
	%i1 = icmp eq i32 %x1, 0			%i1 = icmp eq i32 %x1, 0
	%s1 = sext i1 %i1 to i32			%s1 = sext i1 %i1 to i32
	%x2 = xor i32 %arg, %bar			%x2 = xor i32 %arg, %bar
	%i2 = icmp eq i32 %x2, 0			%i2 = icmp eq i32 %x2, 0
	%s2 = sext i1 %i2 to i32			%s2 = sext i1 %i2 to i32
	[14 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,SSE		; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,SSE
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,AVX,AVX1		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,AVX,AVX1
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,AVX,AVX2		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,DEFAULT,AVX,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=skx -slp-vectorizer -S -slp-threshold=-100 | FileCheck %s --check-prefixes=CHECK,THRESH		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=skx -slp-vectorizer -S -slp-threshold=-100 | FileCheck %s --check-prefixes=CHECK,THRESH

@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16		@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16
@arr1 = local_unnamed_addr global [32 x float] zeroinitializer, align 16		@arr1 = local_unnamed_addr global [32 x float] zeroinitializer, align 16
@arrp = local_unnamed_addr global [32 x i32*] zeroinitializer, align 16		@arrp = local_unnamed_addr global [32 x i32*] zeroinitializer, align 16
@var = global i32 zeroinitializer, align 8		@var = global i32 zeroinitializer, align 8

define i32 @maxi8(i32) {		define i32 @maxi8(i32) {
; CHECK-LABEL: @maxi8(		; CHECK-LABEL: @maxi8(
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
[14 lines not shown]
%22 = icmp sgt i32 %20, %21		%22 = icmp sgt i32 %20, %21
%23 = select i1 %22, i32 %20, i32 %21		%23 = select i1 %22, i32 %20, i32 %21
ret i32 %23		ret i32 %23
}		}

define i32 @maxi16(i32) {		define i32 @maxi16(i32) {
; CHECK-LABEL: @maxi16(		; CHECK-LABEL: @maxi16(
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v16i32(<16 x i32> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[RDX_MINMAX_SELECT9]], i32 0
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
[38 lines not shown]
%46 = icmp sgt i32 %44, %45		%46 = icmp sgt i32 %44, %45
%47 = select i1 %46, i32 %44, i32 %45		%47 = select i1 %46, i32 %44, i32 %45
ret i32 %47		ret i32 %47
}		}

define i32 @maxi32(i32) {		define i32 @maxi32(i32) {
; CHECK-LABEL: @maxi32(		; CHECK-LABEL: @maxi32(
; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v32i32(<32 x i32> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> [[RDX_SHUF10]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <32 x i32> [[RDX_MINMAX_SELECT12]], i32 0
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
[86 lines not shown]
%94 = icmp sgt i32 %92, %93		%94 = icmp sgt i32 %92, %93
%95 = select i1 %94, i32 %92, i32 %93		%95 = select i1 %94, i32 %92, i32 %93
ret i32 %95		ret i32 %95
}		}

define float @maxf8(float) {		define float @maxf8(float) {
; CHECK-LABEL: @maxf8(		; CHECK-LABEL: @maxf8(
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.fmax.v8f32(<8 x float> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[RDX_MINMAX_SELECT6]], i32 0
; CHECK-NEXT: ret float [[TMP3]]		; CHECK-NEXT: ret float [[TMP3]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
[14 lines not shown]
%22 = fcmp fast ogt float %20, %21		%22 = fcmp fast ogt float %20, %21
%23 = select i1 %22, float %20, float %21		%23 = select i1 %22, float %20, float %21
ret float %23		ret float %23
}		}

define float @maxf16(float) {		define float @maxf16(float) {
; CHECK-LABEL: @maxf16(		; CHECK-LABEL: @maxf16(
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.fmax.v16f32(<16 x float> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x float> [[RDX_MINMAX_SELECT9]], i32 0
; CHECK-NEXT: ret float [[TMP3]]		; CHECK-NEXT: ret float [[TMP3]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
[38 lines not shown]
%46 = fcmp fast ogt float %44, %45		%46 = fcmp fast ogt float %44, %45
%47 = select i1 %46, float %44, float %45		%47 = select i1 %46, float %44, float %45
ret float %47		ret float %47
}		}

define float @maxf32(float) {		define float @maxf32(float) {
; CHECK-LABEL: @maxf32(		; CHECK-LABEL: @maxf32(
; CHECK-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.fmax.v32f32(<32 x float> [[TMP2]])
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> [[RDX_SHUF10]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <32 x float> [[RDX_MINMAX_SELECT12]], i32 0
; CHECK-NEXT: ret float [[TMP3]]		; CHECK-NEXT: ret float [[TMP3]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
[... 118 lines not shown ...]
;		;
; AVX-LABEL: @maxi8_mutiple_uses(		; AVX-LABEL: @maxi8_mutiple_uses(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]		; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]
; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]		; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]
; AVX-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]		; AVX-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 [[TMP5]]		; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 [[TMP5]]
; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP12]]		; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP12]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA]], i32 [[TMP12]]		; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA]], i32 [[TMP12]]
; AVX-NEXT: [[TMP15:%.*]] = select i1 [[TMP4]], i32 3, i32 4		; AVX-NEXT: [[TMP15:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX-NEXT: store i32 [[TMP15]], i32* @var, align 8		; AVX-NEXT: store i32 [[TMP15]], i32* @var, align 8
; AVX-NEXT: ret i32 [[TMP14]]		; AVX-NEXT: ret i32 [[TMP14]]
;		;
; THRESH-LABEL: @maxi8_mutiple_uses(		; THRESH-LABEL: @maxi8_mutiple_uses(
; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; THRESH-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; THRESH-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])
; THRESH-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF]]
; THRESH-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF]]
; THRESH-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; THRESH-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; THRESH-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; THRESH-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; THRESH-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> undef, i32 [[TMP7]], i32 0		; THRESH-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> undef, i32 [[TMP7]], i32 0
; THRESH-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1		; THRESH-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1
; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> undef, i32 [[TMP6]], i32 0		; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> undef, i32 [[TMP6]], i32 0
; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1		; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1
; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]		; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]
; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]		; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]
; THRESH-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1		; THRESH-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0		; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
[... 67 lines not shown ...]
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: br label [[PP:%.*]]		; AVX-NEXT: br label [[PP:%.*]]
; AVX: pp:		; AVX: pp:
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]		; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]		; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]		; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]		; AVX-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP5]]		; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP5]]
; AVX-NEXT: ret i32 [[OP_EXTRA]]		; AVX-NEXT: ret i32 [[OP_EXTRA]]
;		;
; THRESH-LABEL: @maxi8_wrong_parent(		; THRESH-LABEL: @maxi8_wrong_parent(
; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]		; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
; THRESH-NEXT: br label [[PP:%.*]]		; THRESH-NEXT: br label [[PP:%.*]]
; THRESH: pp:		; THRESH: pp:
; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; THRESH-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
; THRESH-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; THRESH-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; THRESH-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; THRESH-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; THRESH-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; THRESH-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]		; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]		; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> undef, i1 [[TMP12]], i32 0		; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> undef, i1 [[TMP12]], i32 0
; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1		; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1
; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[TMP11]], i32 0		; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[TMP11]], i32 0
; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1		; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1
; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> undef, i32 [[TMP8]], i32 0		; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> undef, i32 [[TMP8]], i32 0
[... 113 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll

[... 31 lines not shown ...]
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]
; CHECK-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]		; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]
; CHECK-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]		; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>		; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]		; CHECK-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_033]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_033]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
[... 16 lines not shown ...]
; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]		; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]
; STORE-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]		; STORE-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]
; STORE-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]		; STORE-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]
; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*		; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; STORE-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>		; STORE-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP4:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP3]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]		; STORE-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_033]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_033]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
[... 77 lines not shown ...]
; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; CHECK-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]		; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]
; CHECK-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]		; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]		; CHECK-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_040]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_040]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
[... 21 lines not shown ...]
; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; STORE-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]		; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]
; STORE-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]		; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]		; STORE-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_040]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_040]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
[... 107 lines not shown ...]
; CHECK-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]		; CHECK-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]
; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*
; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]
; CHECK-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8		; CHECK-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8
; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]		; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4
; CHECK-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]		; CHECK-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP8:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP6]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]		; CHECK-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]
; CHECK-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]		; CHECK-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_083]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_083]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
[... 40 lines not shown ...]
; STORE-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]		; STORE-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]
; STORE-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*		; STORE-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*
; STORE-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4		; STORE-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; STORE-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]		; STORE-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]
; STORE-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8		; STORE-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8
; STORE-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]		; STORE-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]
; STORE-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4		; STORE-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4
; STORE-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]		; STORE-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP8:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP6]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]		; STORE-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]
; STORE-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]		; STORE-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_083]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_083]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
[... 120 lines not shown ...]
; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]		; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]
; CHECK-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]		; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]
; CHECK-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]		; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]		; CHECK-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_043]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_043]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
[... 21 lines not shown ...]
; STORE-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]		; STORE-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]
; STORE-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]		; STORE-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]
; STORE-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]		; STORE-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]		; STORE-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_043]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_043]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
[... 440 lines not shown ...]
; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]		; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]
; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*		; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*
; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]		; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP6:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP5]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4		; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4
; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1		; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1
; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]
; STORE: for.end:		; STORE: for.end:
; STORE-NEXT: ret i32 0		; STORE-NEXT: ret i32 0
;		;
[... 58 lines not shown ...]
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 3), align 4		; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 3), align 4
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP3]], [[ADD_1]]
; CHECK-NEXT: store float [[ADD_2]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_2]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example4(		; STORE-LABEL: @float_red_example4(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([32 x float]* @arr_float to <4 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([32 x float]* @arr_float to <4 x float>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[... 23 lines not shown ...]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 7), align 4
; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float [[TMP7]], [[ADD_5]]
; CHECK-NEXT: store float [[ADD_6]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_6]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example8(		; STORE-LABEL: @float_red_example8(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr_float to <8 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr_float to <8 x float>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v8f32(float 0.000000e+00, <8 x float> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[... 47 lines not shown ...]
; CHECK-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 15), align 4
; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float [[TMP15]], [[ADD_13]]		; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float [[TMP15]], [[ADD_13]]
; CHECK-NEXT: store float [[ADD_14]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_14]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example16(		; STORE-LABEL: @float_red_example16(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr_float to <16 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr_float to <16 x float>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP0]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v16f32(float 0.000000e+00, <16 x float> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[... 39 lines not shown ...]
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4		; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]
; CHECK-NEXT: store i32 [[ADD_2]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_2]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example4(		; STORE-LABEL: @i32_red_example4(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr_i32 to <4 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr_i32 to <4 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[... 23 lines not shown ...]
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]
; CHECK-NEXT: store i32 [[ADD_6]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_6]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example8(		; STORE-LABEL: @i32_red_example8(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[... 47 lines not shown ...]
; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 15), align 4
; CHECK-NEXT: [[ADD_14:%.*]] = add nsw i32 [[TMP15]], [[ADD_13]]		; CHECK-NEXT: [[ADD_14:%.*]] = add nsw i32 [[TMP15]], [[ADD_13]]
; CHECK-NEXT: store i32 [[ADD_14]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_14]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example16(		; STORE-LABEL: @i32_red_example16(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr_i32 to <16 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr_i32 to <16 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <16 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = add <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[... 95 lines not shown ...]
; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 31), align 4		; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 31), align 4
; CHECK-NEXT: [[ADD_30:%.*]] = add nsw i32 [[TMP31]], [[ADD_29]]		; CHECK-NEXT: [[ADD_30:%.*]] = add nsw i32 [[TMP31]], [[ADD_29]]
; CHECK-NEXT: store i32 [[ADD_30]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_30]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example32(		; STORE-LABEL: @i32_red_example32(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr_i32 to <32 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr_i32 to <32 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP0]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <32 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = add <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX8:%.*]] = add <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[... 61 lines not shown ...]
}		}

declare i32 @foobar(i32)		declare i32 @foobar(i32)

define void @i32_red_call(i32 %val) {		define void @i32_red_call(i32 %val) {
; CHECK-LABEL: @i32_red_call(		; CHECK-LABEL: @i32_red_call(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP0]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])		; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_call(		; STORE-LABEL: @i32_red_call(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])		; STORE-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[... 11 lines not shown ...]
%res = call i32 @foobar(i32 %add.6)		%res = call i32 @foobar(i32 %add.6)
ret void		ret void
}		}

define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {		define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {
; CHECK-LABEL: @i32_red_invoke(		; CHECK-LABEL: @i32_red_invoke(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP0]])
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])		; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]		; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]
; CHECK: exception:		; CHECK: exception:
; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8		; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8
; CHECK-NEXT: cleanup		; CHECK-NEXT: cleanup
; CHECK-NEXT: br label [[NORMAL]]		; CHECK-NEXT: br label [[NORMAL]]
; CHECK: normal:		; CHECK: normal:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_invoke(		; STORE-LABEL: @i32_red_invoke(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP0]])
; STORE-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])		; STORE-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])
; STORE-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]		; STORE-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]
; STORE: exception:		; STORE: exception:
; STORE-NEXT: [[CLEANUP:%.*]] = landingpad i8		; STORE-NEXT: [[CLEANUP:%.*]] = landingpad i8
; STORE-NEXT: cleanup		; STORE-NEXT: cleanup
; STORE-NEXT: br label [[NORMAL]]		; STORE-NEXT: br label [[NORMAL]]
; STORE: normal:		; STORE: normal:
; STORE-NEXT: ret void		; STORE-NEXT: ret void
[... 26 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -reassociate -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-apple-macosx -mcpu=corei7-avx -mattr=+avx2 | FileCheck %s			; RUN: opt -reassociate -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-apple-macosx -mcpu=corei7-avx -mattr=+avx2 | FileCheck %s

	define signext i8 @Foo(<32 x i8>* %__v) {			define signext i8 @Foo(<32 x i8>* %__v) {
	; CHECK-LABEL: @Foo(			; CHECK-LABEL: @Foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <32 x i8>, <32 x i8>* [[__V:%.*]], align 32			; CHECK-NEXT: [[TMP0:%.*]] = load <32 x i8>, <32 x i8>* [[__V:%.*]], align 32
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i8> [[TMP0]], <32 x i8> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.experimental.vector.reduce.add.v32i8(<32 x i8> [[TMP0]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <32 x i8> [[TMP0]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i8> [[BIN_RDX]], <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <32 x i8> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i8> [[BIN_RDX2]], <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <32 x i8> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i8> [[BIN_RDX4]], <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = add <32 x i8> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i8> [[BIN_RDX6]], <32 x i8> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <32 x i8> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <32 x i8> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: ret i8 [[TMP1]]			; CHECK-NEXT: ret i8 [[TMP1]]
	;			;
	entry:			entry:
	%0 = load <32 x i8>, <32 x i8>* %__v, align 32			%0 = load <32 x i8>, <32 x i8>* %__v, align 32
	%vecext.i.i.i = extractelement <32 x i8> %0, i64 0			%vecext.i.i.i = extractelement <32 x i8> %0, i64 0
	%vecext.i.i.1.i = extractelement <32 x i8> %0, i64 1			%vecext.i.i.1.i = extractelement <32 x i8> %0, i64 1
	%add.i.1.i = add i8 %vecext.i.i.1.i, %vecext.i.i.i			%add.i.1.i = add i8 %vecext.i.i.1.i, %vecext.i.i.i
	%vecext.i.i.2.i = extractelement <32 x i8> %0, i64 2			%vecext.i.i.2.i = extractelement <32 x i8> %0, i64 2
	[... 61 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/reduction.ll

	[... 72 lines not shown ...]

	define i32 @horiz_max_multiple_uses([32 x i32]* %x, i32* %p) {			define i32 @horiz_max_multiple_uses([32 x i32]* %x, i32* %p) {
	; CHECK-LABEL: @horiz_max_multiple_uses(			; CHECK-LABEL: @horiz_max_multiple_uses(
	; CHECK-NEXT: [[X0:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X:%.*]], i64 0, i64 0			; CHECK-NEXT: [[X0:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[X4:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X]], i64 0, i64 4			; CHECK-NEXT: [[X4:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X]], i64 0, i64 4
	; CHECK-NEXT: [[X5:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X]], i64 0, i64 5			; CHECK-NEXT: [[X5:%.*]] = getelementptr [32 x i32], [32 x i32]* [[X]], i64 0, i64 5
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[X0]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[X0]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[T4:%.*]] = load i32, i32* [[X4]]			; CHECK-NEXT: [[T4:%.*]] = load i32, i32* [[X4]], align 4
	; CHECK-NEXT: [[T5:%.*]] = load i32, i32* [[X5]]			; CHECK-NEXT: [[T5:%.*]] = load i32, i32* [[X5]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP2]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP2]], <4 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP3]], [[T4]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP3]], [[T4]]
	; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP3]], i32 [[T4]]			; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP3]], i32 [[T4]]
	; CHECK-NEXT: [[C012345:%.*]] = icmp sgt i32 [[TMP5]], [[T5]]			; CHECK-NEXT: [[C012345:%.*]] = icmp sgt i32 [[TMP5]], [[T5]]
	; CHECK-NEXT: [[T17:%.*]] = select i1 [[C012345]], i32 [[TMP5]], i32 [[T5]]			; CHECK-NEXT: [[T17:%.*]] = select i1 [[C012345]], i32 [[TMP5]], i32 [[T5]]
	; CHECK-NEXT: [[THREE_OR_FOUR:%.*]] = select i1 [[TMP4]], i32 3, i32 4			; CHECK-NEXT: [[THREE_OR_FOUR:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; CHECK-NEXT: store i32 [[THREE_OR_FOUR]], i32* [[P:%.*]], align 8			; CHECK-NEXT: store i32 [[THREE_OR_FOUR]], i32* [[P:%.*]], align 8
	; CHECK-NEXT: ret i32 [[T17]]			; CHECK-NEXT: ret i32 [[T17]]
	;			;
	[... 57 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/reduction_loads.ll

	[... 29 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>			; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	[... 72 lines not shown ...]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	[... 89 lines not shown ...]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[REORDER_SHUFFLE]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[REORDER_SHUFFLE]], [[TMP3]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	[... 54 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll

	[... 20 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = add i32 %1, %0			%mul.18 = add i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	[... 31 lines not shown ...]
	; AVX-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; AVX-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; AVX-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; AVX-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; AVX-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; AVX-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; AVX-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.mul.v8i32(<8 x i32> [[TMP1]])
	; AVX-NEXT: [[BIN_RDX:%.*]] = mul <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[BIN_RDX2:%.*]] = mul <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[BIN_RDX4:%.*]] = mul <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; AVX-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; AVX-NEXT: ret i32 [[TMP2]]			; AVX-NEXT: ret i32 [[TMP2]]
	;			;
	; SSE-LABEL: @test_mul(			; SSE-LABEL: @test_mul(
	; SSE-NEXT: entry:			; SSE-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = load i32, i32* [[P:%.*]], align 4			; SSE-NEXT: [[TMP0:%.*]] = load i32, i32* [[P:%.*]], align 4
	; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 1			; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 1
	; SSE-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4			; SSE-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
	; SSE-NEXT: [[MUL_18:%.*]] = mul i32 [[TMP1]], [[TMP0]]			; SSE-NEXT: [[MUL_18:%.*]] = mul i32 [[TMP1]], [[TMP0]]
	[... 57 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = and i32 %1, %0			%mul.18 = and i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	[... 31 lines not shown ...]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.or.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = or i32 %1, %0			%mul.18 = or i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	Show All 31 Lines
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.xor.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = xor <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = xor i32 %1, %0			%mul.18 = xor i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	Show All 23 Lines
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[SELF:%.*]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[SELF:%.*]], align 16
	; CHECK-NEXT: [[TMP1:%.*]] = shl <4 x i32> [[TMP0]], <i32 6, i32 2, i32 13, i32 3>			; CHECK-NEXT: [[TMP1:%.*]] = shl <4 x i32> [[TMP0]], <i32 6, i32 2, i32 13, i32 3>
	; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = lshr <4 x i32> [[TMP2]], <i32 13, i32 27, i32 21, i32 12>			; CHECK-NEXT: [[TMP3:%.*]] = lshr <4 x i32> [[TMP2]], <i32 13, i32 27, i32 21, i32 12>
	; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP0]], <i32 -2, i32 -8, i32 -16, i32 -128>			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP0]], <i32 -2, i32 -8, i32 -16, i32 -128>
	; CHECK-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP4]], <i32 18, i32 2, i32 7, i32 13>			; CHECK-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP4]], <i32 18, i32 2, i32 7, i32 13>
	; CHECK-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[SELF]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[SELF]], align 16
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.xor.v4i32(<4 x i32> [[TMP6]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <4 x i32> [[TMP6]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret i32 [[TMP7]]			; CHECK-NEXT: ret i32 [[TMP7]]
	;			;
	entry:			entry:
	%0 = load <4 x i32>, <4 x i32>* %self, align 16			%0 = load <4 x i32>, <4 x i32>* %self, align 16
	%1 = shl <4 x i32> %0, <i32 6, i32 2, i32 13, i32 3>			%1 = shl <4 x i32> %0, <i32 6, i32 2, i32 13, i32 3>
	%2 = xor <4 x i32> %1, %0			%2 = xor <4 x i32> %1, %0
	%3 = lshr <4 x i32> %2, <i32 13, i32 27, i32 21, i32 12>			%3 = lshr <4 x i32> %2, <i32 13, i32 27, i32 21, i32 12>
	%4 = and <4 x i32> %0, <i32 -2, i32 -8, i32 -16, i32 -128>			%4 = and <4 x i32> %0, <i32 -2, i32 -8, i32 -16, i32 -128>
	Show All 12 Lines

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll

	Show All 32 Lines
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1			; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2			; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP13]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	▲ Show 20 Lines • Show All 80 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @hoge() {			define void @hoge() {
	; CHECK-LABEL: define {{[^@]+}}@hoge(			; CHECK-LABEL: @hoge(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15			; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> undef, i16 [[T]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> undef, i16 [[T]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 undef, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 undef, i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>
	; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> <i32 63, i32 undef>, [[REORDER_SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> <i32 63, i32 undef>, [[REORDER_SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP3]], undef
	; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE8]], <i32 undef, i32 15, i32 31, i32 47>			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE5]], <i32 undef, i32 15, i32 31, i32 47>
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP10]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF12:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP13:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT11]], [[RDX_SHUF12]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT14:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP13]], <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> [[RDX_SHUF12]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT14]], i32 0
	; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP6]], i32 undef			; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP6]], i32 undef
	; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63			; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63
	; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <2 x i32> undef, [[TMP2]]			; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <2 x i32> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP7]], undef
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>			; CHECK-NEXT: [[TMP9:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP9]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt <4 x i32> [[TMP9]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP9]], <4 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp slt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = icmp slt i32 [[TMP10]], undef			; CHECK-NEXT: [[TMP11:%.*]] = icmp slt i32 [[TMP10]], undef
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
	; CHECK-NEXT: [[TMP12:%.*]] = icmp slt i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[TMP12:%.*]] = icmp slt i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = select i1 [[TMP12]], i32 [[OP_EXTRA]], i32 undef			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[TMP12]], i32 [[OP_EXTRA]], i32 undef
	; CHECK-NEXT: [[TMP13:%.*]] = icmp slt i32 [[OP_EXTRA4]], undef			; CHECK-NEXT: [[TMP13:%.*]] = icmp slt i32 [[OP_EXTRA1]], undef
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA4]], i32 undef			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA1]], i32 undef
	; CHECK-NEXT: [[TMP14:%.*]] = icmp slt i32 [[OP_EXTRA5]], undef			; CHECK-NEXT: [[TMP14:%.*]] = icmp slt i32 [[OP_EXTRA2]], undef
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = select i1 [[TMP14]], i32 [[OP_EXTRA5]], i32 undef			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = select i1 [[TMP14]], i32 [[OP_EXTRA2]], i32 undef
	; CHECK-NEXT: [[TMP15:%.*]] = icmp slt i32 [[OP_EXTRA6]], undef			; CHECK-NEXT: [[TMP15:%.*]] = icmp slt i32 [[OP_EXTRA3]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP15]], i32 [[OP_EXTRA6]], i32 undef			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = select i1 [[TMP15]], i32 [[OP_EXTRA3]], i32 undef
	; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA7]]			; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA4]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	bb:			bb:
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2

	bb1: ; preds = %bb			bb1: ; preds = %bb
	ret void			ret void

	▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/reverse_extract_elements.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 | FileCheck %s

	define float @dotf(<4 x float> %x, <4 x float> %y) {			define float @dotf(<4 x float> %x, <4 x float> %y) {
	; CHECK-LABEL: @dotf(			; CHECK-LABEL: @dotf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = fmul fast <4 x float> [[X:%.*]], [[Y:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = fmul fast <4 x float> [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP0]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP0]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret float [[TMP1]]			; CHECK-NEXT: ret float [[TMP1]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %x, i32 0			%vecext = extractelement <4 x float> %x, i32 0
	%vecext1 = extractelement <4 x float> %y, i32 0			%vecext1 = extractelement <4 x float> %y, i32 0
	%mul = fmul fast float %vecext, %vecext1			%mul = fmul fast float %vecext, %vecext1
	%vecext.1 = extractelement <4 x float> %x, i32 1			%vecext.1 = extractelement <4 x float> %x, i32 1
	%vecext1.1 = extractelement <4 x float> %y, i32 1			%vecext1.1 = extractelement <4 x float> %y, i32 1
	Show All 11 Lines
	}			}

	define double @dotd(<4 x double>* byval nocapture readonly align 32, <4 x double>* byval nocapture readonly align 32) {			define double @dotd(<4 x double>* byval nocapture readonly align 32, <4 x double>* byval nocapture readonly align 32) {
	; CHECK-LABEL: @dotd(			; CHECK-LABEL: @dotd(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[X:%.*]] = load <4 x double>, <4 x double>* [[TMP0:%.*]], align 32			; CHECK-NEXT: [[X:%.*]] = load <4 x double>, <4 x double>* [[TMP0:%.*]], align 32
	; CHECK-NEXT: [[Y:%.*]] = load <4 x double>, <4 x double>* [[TMP1:%.*]], align 32			; CHECK-NEXT: [[Y:%.*]] = load <4 x double>, <4 x double>* [[TMP1:%.*]], align 32
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x double> [[X]], [[Y]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x double> [[X]], [[Y]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call fast double @llvm.experimental.vector.reduce.v2.fadd.f64.v4f64(double 0.000000e+00, <4 x double> [[TMP2]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x double> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x double> [[BIN_RDX]], <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x double> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x double> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret double [[TMP3]]			; CHECK-NEXT: ret double [[TMP3]]
	;			;
	entry:			entry:
	%x = load <4 x double>, <4 x double>* %0, align 32			%x = load <4 x double>, <4 x double>* %0, align 32
	%y = load <4 x double>, <4 x double>* %1, align 32			%y = load <4 x double>, <4 x double>* %1, align 32
	%vecext = extractelement <4 x double> %x, i32 0			%vecext = extractelement <4 x double> %x, i32 0
	%vecext1 = extractelement <4 x double> %y, i32 0			%vecext1 = extractelement <4 x double> %y, i32 0
	%mul = fmul fast double %vecext, %vecext1			%mul = fmul fast double %vecext, %vecext1
	Show All 13 Lines
	}			}

	define float @dotfq(<4 x float>* nocapture readonly %x, <4 x float>* nocapture readonly %y) {			define float @dotfq(<4 x float>* nocapture readonly %x, <4 x float>* nocapture readonly %y) {
	; CHECK-LABEL: @dotfq(			; CHECK-LABEL: @dotfq(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[X:%.*]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[X:%.*]], align 16
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[Y:%.*]], align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[Y:%.*]], align 16
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float 0.000000e+00, <4 x float> [[TMP2]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret float [[TMP3]]			; CHECK-NEXT: ret float [[TMP3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %x, align 16			%0 = load <4 x float>, <4 x float>* %x, align 16
	%1 = load <4 x float>, <4 x float>* %y, align 16			%1 = load <4 x float>, <4 x float>* %y, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%vecext1 = extractelement <4 x float> %1, i32 0			%vecext1 = extractelement <4 x float> %1, i32 0
	%mul = fmul fast float %vecext1, %vecext			%mul = fmul fast float %vecext1, %vecext
	Show All 13 Lines
	}			}

	define double @dotdq(<4 x double>* nocapture readonly %x, <4 x double>* nocapture readonly %y) {			define double @dotdq(<4 x double>* nocapture readonly %x, <4 x double>* nocapture readonly %y) {
	; CHECK-LABEL: @dotdq(			; CHECK-LABEL: @dotdq(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x double>, <4 x double>* [[X:%.*]], align 32			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x double>, <4 x double>* [[X:%.*]], align 32
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[Y:%.*]], align 32			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* [[Y:%.*]], align 32
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x double> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x double> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = call fast double @llvm.experimental.vector.reduce.v2.fadd.f64.v4f64(double 0.000000e+00, <4 x double> [[TMP2]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x double> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x double> [[BIN_RDX]], <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x double> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x double> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: ret double [[TMP3]]			; CHECK-NEXT: ret double [[TMP3]]
	;			;
	entry:			entry:
	%0 = load <4 x double>, <4 x double>* %x, align 32			%0 = load <4 x double>, <4 x double>* %x, align 32
	%1 = load <4 x double>, <4 x double>* %y, align 32			%1 = load <4 x double>, <4 x double>* %y, align 32
	%vecext = extractelement <4 x double> %0, i32 0			%vecext = extractelement <4 x double> %0, i32 0
	%vecext1 = extractelement <4 x double> %1, i32 0			%vecext1 = extractelement <4 x double> %1, i32 0
	%mul = fmul fast double %vecext1, %vecext			%mul = fmul fast double %vecext1, %vecext
	Show All 14 Lines

llvm/test/Transforms/SLPVectorizer/X86/scheduling.ll

	Show All 31 Lines
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1			; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2			; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP13]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 0			; CHECK-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 0
	; CHECK-NEXT: call void @ff([8 x i32]* [[ARRAYDECAY]])			; CHECK-NEXT: call void @ff([8 x i32]* [[ARRAYDECAY]])
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll

	Show All 10 Lines
	; CHECK-NEXT: [[DOTSROA_CAST_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP1]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP2]], undef
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 undef
	; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP4]], i32 [[OP_EXTRA]], i32 undef			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[TMP4]], i32 [[OP_EXTRA]], i32 undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_EXTRA7]]			; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_EXTRA1]]
	; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef			; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	for.body.lr.ph:			for.body.lr.ph:
	%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0			%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0
	%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4			%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4
	%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1			%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1
	%retval.sroa.0.0.copyload.i7.4 = load i32, i32* %.sroa_raw_idx.4, align 4			%retval.sroa.0.0.copyload.i7.4 = load i32, i32* %.sroa_raw_idx.4, align 4
	Show All 35 Lines

llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll

	Show First 20 Lines • Show All 49 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP33:%.*]] = insertelement <4 x i32> undef, i32 [[TMP30]], i32 0			; CHECK-NEXT: [[TMP33:%.*]] = insertelement <4 x i32> undef, i32 [[TMP30]], i32 0
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x i32> [[TMP33]], i32 [[TMP30]], i32 1			; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x i32> [[TMP33]], i32 [[TMP30]], i32 1
	; CHECK-NEXT: [[TMP35:%.*]] = insertelement <4 x i32> [[TMP34]], i32 [[TMP30]], i32 2			; CHECK-NEXT: [[TMP35:%.*]] = insertelement <4 x i32> [[TMP34]], i32 [[TMP30]], i32 2
	; CHECK-NEXT: [[TMP36:%.*]] = insertelement <4 x i32> [[TMP35]], i32 [[TMP30]], i32 3			; CHECK-NEXT: [[TMP36:%.*]] = insertelement <4 x i32> [[TMP35]], i32 [[TMP30]], i32 3
	; CHECK-NEXT: [[TMP37:%.*]] = sub <4 x i32> [[TMP36]], [[TMP1]]			; CHECK-NEXT: [[TMP37:%.*]] = sub <4 x i32> [[TMP36]], [[TMP1]]
	; CHECK-NEXT: [[TMP38:%.*]] = icmp slt <4 x i32> [[TMP37]], zeroinitializer			; CHECK-NEXT: [[TMP38:%.*]] = icmp slt <4 x i32> [[TMP37]], zeroinitializer
	; CHECK-NEXT: [[TMP39:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP37]]			; CHECK-NEXT: [[TMP39:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP37]]
	; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP38]], <4 x i32> [[TMP39]], <4 x i32> [[TMP37]]			; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP38]], <4 x i32> [[TMP39]], <4 x i32> [[TMP37]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP40]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP41:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP40]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt <4 x i32> [[TMP40]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP40]], <4 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp slt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP41:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: [[TMP42:%.*]] = icmp slt i32 [[TMP41]], [[TMP32]]			; CHECK-NEXT: [[TMP42:%.*]] = icmp slt i32 [[TMP41]], [[TMP32]]
	; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP42]], i32 [[TMP41]], i32 [[TMP32]]			; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP42]], i32 [[TMP41]], i32 [[TMP32]]
	; CHECK-NEXT: [[TMP44:%.*]] = icmp slt i32 [[TMP43]], [[B_0]]			; CHECK-NEXT: [[TMP44:%.*]] = icmp slt i32 [[TMP43]], [[B_0]]
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP44]], i32 [[TMP43]], i32 [[B_0]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP44]], i32 [[TMP43]], i32 [[B_0]]
	; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP30]], [[TMP2]]			; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP30]], [[TMP2]]
	; CHECK-NEXT: [[TMP45:%.*]] = icmp slt i32 [[SUB_1_1]], 0			; CHECK-NEXT: [[TMP45:%.*]] = icmp slt i32 [[SUB_1_1]], 0
	; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]			; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]
	; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP45]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]			; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP45]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]
	▲ Show 20 Lines • Show All 457 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll

	Show All 12 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.experimental.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%add2 = add i32 %0, %a2			%add2 = add i32 %0, %a2
	%add4 = add i32 %0, %a3			%add4 = add i32 %0, %a3
	Show All 34 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.experimental.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2			%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2
	%1 = load i32, i32* %arrayidx1, align 4			%1 = load i32, i32* %arrayidx1, align 4
	Show All 38 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.experimental.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2			%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2
	%1 = load i32, i32* %arrayidx1, align 4			%1 = load i32, i32* %arrayidx1, align 4
	Show All 26 Lines
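
The hunks above replace a three-step shufflevector / icmp ult / select ladder with a single call to @llvm.experimental.vector.reduce.umin.v8i32. As a rough illustration of why the two forms should agree, here is a small Python model. It is a sketch only: the function names are hypothetical, non-negative Python integers stand in for unsigned i32 lanes, None stands in for undef shuffle lanes, and the vector width is assumed to be a power of two.

```python
import random


def shuffle_umin_reduce(vec):
    """Model the old IR: halve the live lanes with shuffle + icmp ult + select."""
    lanes = list(vec)
    width = len(lanes)
    half = width // 2
    while half >= 1:
        # shufflevector: lanes [half, 2*half) move to the front, the rest are undef
        shuf = lanes[half:2 * half] + [None] * (width - half)
        # icmp ult + select per lane; lanes compared against undef are never read again
        lanes = [a if (b is None or a < b) else b for a, b in zip(lanes, shuf)]
        half //= 2
    # extractelement ..., i32 0
    return lanes[0]


def intrinsic_umin_reduce(vec):
    """Model the new IR: one unsigned-min reduction over all lanes."""
    return min(vec)


if __name__ == "__main__":
    random.seed(0)
    for _ in range(1000):
        v = [random.randrange(0, 2**32) for _ in range(8)]
        assert shuffle_umin_reduce(v) == intrinsic_umin_reduce(v)
```

The model only covers the value semantics of the IR. It says nothing about expansion or codegen, which the summary flags as the open question for this change.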

Revision Contents

Diff 268860

llvm/lib/Target/X86/X86TargetTransformInfo.h

llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll

llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll

llvm/test/Transforms/LoopVectorize/X86/pr35432.ll

llvm/test/Transforms/LoopVectorize/X86/pr42674.ll

llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll

llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll

llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll

llvm/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction_loads.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

llvm/test/Transforms/SLPVectorizer/X86/reverse_extract_elements.ll

llvm/test/Transforms/SLPVectorizer/X86/scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll

llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll
