This is an archive of the discontinued LLVM Phabricator instance.

[SLP] allow matching integer min/max intrinsics as reduction ops
Closed · Public

Authored by spatel on Mar 19 2021, 12:53 PM.

Details

Reviewers

ABataev
nikic
vdmitrie
RKSimon
mkazantsev

Commits

rGda381cf7ce05: [SLP] allow matching integer min/max intrinsics as reduction ops
rG3c8473ba534d: [SLP] allow matching integer min/max intrinsics as reduction ops

Summary

As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in 7202f47508.
But I am posting this for review just in case anyone sees or knows of other problems that may result from the switch to intrinsics.

I suspect that we will need to adjust the cost models or tests because the PhaseOrdering test in D98152 still doesn't vectorize with only x86 SSE2 (it does change if I add an attribute for SSE4.1).
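For readers unfamiliar with the two forms mentioned above, here is a minimal sketch in plain C++, not taken from the patch. It shows a signed max written as an explicit compare-and-select and as a single library call, plus a four-element horizontal reduction built from either one. The names `smaxSelect`, `smaxCall`, `reduce4`, and `checkFormsAgree` are made up for illustration, and the sketch makes no claim about which IR form a particular compiler version emits for either function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Compare-and-select form: conceptually an icmp followed by a select.
inline int32_t smaxSelect(int32_t a, int32_t b) { return a > b ? a : b; }

// Single-call form: conceptually one min/max operation.
inline int32_t smaxCall(int32_t a, int32_t b) { return std::max(a, b); }

// Hypothetical helper: the two forms must agree for every input pair.
inline void checkFormsAgree(int32_t a, int32_t b) {
  assert(smaxSelect(a, b) == smaxCall(a, b));
}

// A horizontal reduction over four values, written as a balanced tree of
// the chosen binary operation. This is the shape a reduction matcher
// would look for, regardless of how each node is spelled.
template <typename Op>
int32_t reduce4(const int32_t (&v)[4], Op op) {
  return op(op(v[0], v[1]), op(v[2], v[3]));
}
```

Both spellings give the same result for every input, which is why a reduction matcher can treat them as the same operation kind.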

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. · Mar 19 2021, 12:53 PM

Herald added subscribers: pengfei, hiraditya, mcrosier. · Mar 19 2021, 12:53 PM

spatel requested review of this revision. · Mar 19 2021, 12:53 PM

Herald added a project: Restricted Project. · Mar 19 2021, 12:53 PM

Harbormaster completed remote builds in B94778: Diff 331981. · Mar 19 2021, 3:21 PM

I suspect that we will need to adjust the cost models or tests because the PhaseOrdering test in D98152 still doesn't vectorize with only x86 SSE2 (it does change if I add an attribute for SSE4.1).

Taking a closer look at that example, I think this will actually be an improvement (i.e., we should adjust the test, not the cost model).

Currently, we are favoring vectorization based on x86 SSE2 costs, but that seems wrong.

Without vectorizing we have a chain of cmov:

movl	(%rdi), %eax
movl	4(%rdi), %ecx
cmpl	%eax, %ecx
cmovll	%ecx, %eax
movl	8(%rdi), %ecx
cmpl	%eax, %ecx
cmovll	%ecx, %eax
movl	12(%rdi), %ecx
cmpl	%eax, %ecx
cmovll	%ecx, %eax

With vectorization (but without the expected min/max instructions or even blendv), we have more code + transfer from xmm to GPR:

movdqu	(%rdi), %xmm0
pshufd	$238, %xmm0, %xmm1              # xmm1 = xmm0[2,3,2,3]
movdqa	%xmm1, %xmm2
pcmpgtd	%xmm0, %xmm2
pand	%xmm2, %xmm0
pandn	%xmm1, %xmm2
por	%xmm0, %xmm2
pshufd	$85, %xmm2, %xmm0               # xmm0 = xmm2[1,1,1,1]
movdqa	%xmm0, %xmm1
pcmpgtd	%xmm2, %xmm1
pand	%xmm1, %xmm2
pandn	%xmm0, %xmm1
por	%xmm2, %xmm1
movd	%xmm1, %eax
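To make the comparison concrete, here is a portable C++ sketch of what the two listings compute. It is an illustration only, not generated from the patch: `sminChain` mirrors the three dependent cmp/cmov steps, and `sminShuffle` mirrors the two shuffle-and-select steps of the SSE2 sequence. The names `Vec4`, `sminChain`, `sminLanes`, and `sminShuffle` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int32_t, 4>;

// Scalar chain: three dependent compare + conditional-move steps.
inline int32_t sminChain(const Vec4 &v) {
  int32_t acc = v[0];
  for (int i = 1; i < 4; ++i)
    if (v[i] < acc) // cmpl + cmovll
      acc = v[i];
  return acc;
}

// Lane-wise signed min. Without a native min instruction, SSE2 builds
// this from a compare mask and pand/pandn/por.
inline Vec4 sminLanes(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = (b[i] > a[i]) ? a[i] : b[i];
  return r;
}

// Shuffle-based reduction: two rounds of shuffle + lane-wise min,
// then extract lane 0.
inline int32_t sminShuffle(const Vec4 &v) {
  Vec4 hi = {v[2], v[3], v[2], v[3]};  // pshufd $238: lanes [2,3,2,3]
  Vec4 m = sminLanes(v, hi);           // m[0]=min(v0,v2), m[1]=min(v1,v3)
  Vec4 odd = {m[1], m[1], m[1], m[1]}; // pshufd $85: lanes [1,1,1,1]
  return sminLanes(m, odd)[0];         // movd: move lane 0 to a GPR
}
```

Both functions return the same value. The difference is only in instruction count and the vector-to-scalar transfer at the end, which is the cost concern raised above.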

LGTM - I agree that pre-SSE4.1 4i32 min/max patterns aren't particularly great, so you should probably just ensure we have test coverage for SSE4.1 and later in D98152.

This revision is now accepted and ready to land. · Mar 22 2021, 7:15 AM

Closed by commit rG3c8473ba534d: [SLP] allow matching integer min/max intrinsics as reduction ops (authored by spatel). · Mar 23 2021, 5:58 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG3c8473ba534d: [SLP] allow matching integer min/max intrinsics as reduction ops.

spatel mentioned this in rG9d45daf4656e: [PhaseOrdering] add AVX attribute to make test less fragile; NFC. · Mar 23 2021, 8:35 AM

FYI: this is causing bug https://bugs.llvm.org/show_bug.cgi?id=49730. It seems that more than one place in the code matches exactly a select instruction, and those places need to be updated. I propose reverting this and re-enabling it after all such places are updated properly.

spatel added a reverting change: rGa26312f9d4f2: Revert "[SLP] allow matching integer min/max intrinsics as reduction ops". · Mar 26 2021, 7:00 AM

Reopening - the original commit was reverted because it could crash (https://llvm.org/PR49730).

This revision is now accepted and ready to land. · Mar 26 2021, 11:41 AM

Patch updated:
This version includes a change to create min/max intrinsics only if we started by matching min/max intrinsics. We continue to create cmp+select if the original code matched that pattern. This avoids the crash seen in PR49730. Hopefully, we can remove all of the select matching/creation after we canonicalize to the intrinsics.

We could pull the new part of this patch into an NFC-preliminary commit, but I could not find a way to show a test difference from that part alone.
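The rule in the updated patch can be sketched as a small standalone C++ toy model. It does not use the LLVM API and is not the patch's code: it emits textual IR for a single i32 min/max of `%a` and `%b`, choosing between cmp+select and an intrinsic call. The names `MinMaxKind`, `icmpPredicate`, `intrinsicName`, `emitMinMax`, and `shouldUseSelect` are hypothetical. The "two lists means select" check is modeled on the `ReductionOps.size() == 2` test visible in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class MinMaxKind { SMax, SMin, UMax, UMin };

// icmp predicate that selects the left operand for each kind.
inline const char *icmpPredicate(MinMaxKind k) {
  switch (k) {
  case MinMaxKind::SMax: return "sgt";
  case MinMaxKind::SMin: return "slt";
  case MinMaxKind::UMax: return "ugt";
  case MinMaxKind::UMin: return "ult";
  }
  return "";
}

// Base name of the matching integer min/max intrinsic.
inline const char *intrinsicName(MinMaxKind k) {
  switch (k) {
  case MinMaxKind::SMax: return "smax";
  case MinMaxKind::SMin: return "smin";
  case MinMaxKind::UMax: return "umax";
  case MinMaxKind::UMin: return "umin";
  }
  return "";
}

// Emit the same form that was matched: cmp+select stays cmp+select,
// and an intrinsic stays an intrinsic.
inline std::string emitMinMax(MinMaxKind k, bool useSelect) {
  if (useSelect)
    return std::string("%c = icmp ") + icmpPredicate(k) + " i32 %a, %b\n" +
           "%r = select i1 %c, i32 %a, i32 %b\n";
  return std::string("%r = call i32 @llvm.") + intrinsicName(k) +
         ".i32(i32 %a, i32 %b)\n";
}

// A cmp+select reduction tracks two op lists (compares and selects);
// an intrinsic reduction tracks one.
inline bool shouldUseSelect(std::size_t numReductionOpLists) {
  return numReductionOpLists == 2;
}
```

Keeping the output form tied to the input form means the code paths that expect a select instruction never see an intrinsic call they were not written to handle.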

Harbormaster completed remote builds in B95913: Diff 333602. · Mar 26 2021, 12:24 PM

I confirm that with this version of the patch the original failure is gone. To me it looks fine, but I'm not a competent reviewer to approve it because I don't know SLP well enough. Thanks!

LGTM

In D98981#2654940, @mkazantsev wrote:

I confirm that with this version of the patch the original failure is gone. To me it looks fine, but I'm not a competent reviewer to approve it because I don't know SLP well enough. Thanks!

Thanks again for fuzz testing this! I can't say if there are more corner-cases that I haven't accounted for.
If we can't get this right, an alternative approach would be to give up on trying to handle both forms (cmp+select and intrinsic) of min/max. We could change SLP to only recognize min/max intrinsics simultaneously with the patch for instcombine that canonicalizes to those forms.

Closed by commit rGda381cf7ce05: [SLP] allow matching integer min/max intrinsics as reduction ops (authored by spatel). · Mar 29 2021, 6:40 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGda381cf7ce05: [SLP] allow matching integer min/max intrinsics as reduction ops.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	90 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	129 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll	83 lines
slp-umax-rdx-matcher-crash.ll (under llvm/test/Transforms/SLPVectorizer/)	18 lines

Diff 333849

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[6,564 lines elided]
if (ExtraArgs.count(ParentStackElem.first)) {		if (ExtraArgs.count(ParentStackElem.first)) {
// We ran into something like:		// We ran into something like:
// ParentStackElem.first += ... + ExtraArg + ...		// ParentStackElem.first += ... + ExtraArg + ...
ExtraArgs[ParentStackElem.first] = ExtraArg;		ExtraArgs[ParentStackElem.first] = ExtraArg;
}		}
}		}

/// Creates reduction operation with the current opcode.		/// Creates reduction operation with the current opcode.
static Value *createOp(IRBuilder<> &Builder, RecurKind Kind, Value *LHS,		static Value *createOp(IRBuilder<> &Builder, RecurKind Kind, Value *LHS,
Value *RHS, const Twine &Name) {		Value *RHS, const Twine &Name, bool UseSelect) {
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);		unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);
switch (Kind) {		switch (Kind) {
case RecurKind::Add:		case RecurKind::Add:
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::Or:		case RecurKind::Or:
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::FAdd:		case RecurKind::FAdd:
case RecurKind::FMul:		case RecurKind::FMul:
return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,		return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,
Name);		Name);
case RecurKind::FMax:		case RecurKind::FMax:
return Builder.CreateBinaryIntrinsic(Intrinsic::maxnum, LHS, RHS);		return Builder.CreateBinaryIntrinsic(Intrinsic::maxnum, LHS, RHS);
case RecurKind::FMin:		case RecurKind::FMin:
return Builder.CreateBinaryIntrinsic(Intrinsic::minnum, LHS, RHS);		return Builder.CreateBinaryIntrinsic(Intrinsic::minnum, LHS, RHS);
		case RecurKind::SMax:
case RecurKind::SMax: {		if (UseSelect) {
Value *Cmp = Builder.CreateICmpSGT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpSGT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
}		}
case RecurKind::SMin: {		return Builder.CreateBinaryIntrinsic(Intrinsic::smax, LHS, RHS);
		case RecurKind::SMin:
		if (UseSelect) {
Value *Cmp = Builder.CreateICmpSLT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpSLT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
}		}
case RecurKind::UMax: {		return Builder.CreateBinaryIntrinsic(Intrinsic::smin, LHS, RHS);
		case RecurKind::UMax:
		if (UseSelect) {
Value *Cmp = Builder.CreateICmpUGT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpUGT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
}		}
case RecurKind::UMin: {		return Builder.CreateBinaryIntrinsic(Intrinsic::umax, LHS, RHS);
		case RecurKind::UMin:
		if (UseSelect) {
Value *Cmp = Builder.CreateICmpULT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpULT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
}		}
		return Builder.CreateBinaryIntrinsic(Intrinsic::umin, LHS, RHS);
default:		default:
llvm_unreachable("Unknown reduction operation.");		llvm_unreachable("Unknown reduction operation.");
}		}
}		}

/// Creates reduction operation with the current opcode with the IR flags		/// Creates reduction operation with the current opcode with the IR flags
/// from \p ReductionOps.		/// from \p ReductionOps.
static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,		static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,
Value *RHS, const Twine &Name,		Value *RHS, const Twine &Name,
const ReductionOpsListType &ReductionOps) {		const ReductionOpsListType &ReductionOps) {
Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name);		bool UseSelect = ReductionOps.size() == 2;
		assert((!UseSelect || isa<SelectInst>(ReductionOps[1][0])) &&
		"Expected cmp + select pairs for reduction");
		Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name, UseSelect);
if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {		if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {
if (auto *Sel = dyn_cast<SelectInst>(Op))		if (auto *Sel = dyn_cast<SelectInst>(Op)) {
propagateIRFlags(Sel->getCondition(), ReductionOps[0]);		propagateIRFlags(Sel->getCondition(), ReductionOps[0]);
propagateIRFlags(Op, ReductionOps[1]);		propagateIRFlags(Op, ReductionOps[1]);
return Op;		return Op;
}		}
		}
propagateIRFlags(Op, ReductionOps[0]);		propagateIRFlags(Op, ReductionOps[0]);
return Op;		return Op;
}		}
/// Creates reduction operation with the current opcode with the IR flags		/// Creates reduction operation with the current opcode with the IR flags
/// from \p I.		/// from \p I.
static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,		static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,
Value *RHS, const Twine &Name, Instruction *I) {		Value *RHS, const Twine &Name, Instruction *I) {
Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name);		auto *SelI = dyn_cast<SelectInst>(I);
		Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name, SelI != nullptr);
if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {		if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {
if (auto *Sel = dyn_cast<SelectInst>(Op))		if (auto *Sel = dyn_cast<SelectInst>(Op))
if (auto *SelI = dyn_cast<SelectInst>(I))
propagateIRFlags(Sel->getCondition(), SelI->getCondition());		propagateIRFlags(Sel->getCondition(), SelI->getCondition());
}		}
propagateIRFlags(Op, I);		propagateIRFlags(Op, I);
return Op;		return Op;
}		}

static RecurKind getRdxKind(Instruction *I) {		static RecurKind getRdxKind(Instruction *I) {
assert(I && "Expected instruction for reduction matching");		assert(I && "Expected instruction for reduction matching");
[13 lines elided]
if (match(I, m_FMul(m_Value(), m_Value())))		if (match(I, m_FMul(m_Value(), m_Value())))
return RecurKind::FMul;		return RecurKind::FMul;

if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))
return RecurKind::FMax;		return RecurKind::FMax;
if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))
return RecurKind::FMin;		return RecurKind::FMin;

if (auto *Select = dyn_cast<SelectInst>(I)) {		// This matches either cmp+select or intrinsics. SLP is expected to handle
// These would also match llvm.{u,s}{min,max} intrinsic call		// either form.
// if were not guarded by the SelectInst check above.		// TODO: If we are canonicalizing to intrinsics, we can remove several
		// special-case paths that deal with selects.
if (match(I, m_SMax(m_Value(), m_Value())))		if (match(I, m_SMax(m_Value(), m_Value())))
return RecurKind::SMax;		return RecurKind::SMax;
if (match(I, m_SMin(m_Value(), m_Value())))		if (match(I, m_SMin(m_Value(), m_Value())))
return RecurKind::SMin;		return RecurKind::SMin;
if (match(I, m_UMax(m_Value(), m_Value())))		if (match(I, m_UMax(m_Value(), m_Value())))
return RecurKind::UMax;		return RecurKind::UMax;
if (match(I, m_UMin(m_Value(), m_Value())))		if (match(I, m_UMin(m_Value(), m_Value())))
return RecurKind::UMin;		return RecurKind::UMin;

		if (auto *Select = dyn_cast<SelectInst>(I)) {
// Try harder: look for min/max pattern based on instructions producing		// Try harder: look for min/max pattern based on instructions producing
// same values such as: select ((cmp Inst1, Inst2), Inst1, Inst2).		// same values such as: select ((cmp Inst1, Inst2), Inst1, Inst2).
// During the intermediate stages of SLP, it's very common to have		// During the intermediate stages of SLP, it's very common to have
// pattern like this (since optimizeGatherSequence is run only once		// pattern like this (since optimizeGatherSequence is run only once
// at the end):		// at the end):
// %1 = extractelement <2 x i32> %a, i32 0		// %1 = extractelement <2 x i32> %a, i32 0
// %2 = extractelement <2 x i32> %a, i32 1		// %2 = extractelement <2 x i32> %a, i32 1
// %cond = icmp sgt i32 %1, %2		// %cond = icmp sgt i32 %1, %2
[697 lines elided]

static bool matchRdxBop(Instruction *I, Value *&V0, Value *&V1) {		static bool matchRdxBop(Instruction *I, Value *&V0, Value *&V1) {
if (match(I, m_BinOp(m_Value(V0), m_Value(V1))))		if (match(I, m_BinOp(m_Value(V0), m_Value(V1))))
return true;		return true;
if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(V0), m_Value(V1))))		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(V0), m_Value(V1))))
return true;		return true;
if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(V0), m_Value(V1))))		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(V0), m_Value(V1))))
return true;		return true;
		if (match(I, m_Intrinsic<Intrinsic::smax>(m_Value(V0), m_Value(V1))))
		return true;
		if (match(I, m_Intrinsic<Intrinsic::smin>(m_Value(V0), m_Value(V1))))
		return true;
		if (match(I, m_Intrinsic<Intrinsic::umax>(m_Value(V0), m_Value(V1))))
		return true;
		if (match(I, m_Intrinsic<Intrinsic::umin>(m_Value(V0), m_Value(V1))))
		return true;
return false;		return false;
}		}

/// Attempt to reduce a horizontal reduction.		/// Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding the phi node \a P		/// If it is legal to match a horizontal reduction feeding the phi node \a P
/// with reduction operators \a Root (or one of its operands) in a basic block		/// with reduction operators \a Root (or one of its operands) in a basic block
/// \a BB, then check if it can be done. If horizontal reduction is not found		/// \a BB, then check if it can be done. If horizontal reduction is not found
/// and root instruction is a binary operation, vectorization of the operands is		/// and root instruction is a binary operation, vectorization of the operands is
[439 lines elided]

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

[1,012 lines elided]
; CHECK-LABEL: @smax_intrinsic_rdx_v8i32(		; CHECK-LABEL: @smax_intrinsic_rdx_v8i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 4		; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 5		; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 6		; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 7		; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 7
; CHECK-NEXT: [[T0:%.*]] = load i32, i32* [[P0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <8 x i32>*
; CHECK-NEXT: [[T1:%.*]] = load i32, i32* [[P1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load i32, i32* [[P2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load i32, i32* [[P3]], align 4		; CHECK-NEXT: ret i32 [[TMP3]]
; CHECK-NEXT: [[T4:%.*]] = load i32, i32* [[P4]], align 4
; CHECK-NEXT: [[T5:%.*]] = load i32, i32* [[P5]], align 4
; CHECK-NEXT: [[T6:%.*]] = load i32, i32* [[P6]], align 4
; CHECK-NEXT: [[T7:%.*]] = load i32, i32* [[P7]], align 4
; CHECK-NEXT: [[M10:%.*]] = tail call i32 @llvm.smax.i32(i32 [[T1]], i32 [[T0]])
; CHECK-NEXT: [[M32:%.*]] = tail call i32 @llvm.smax.i32(i32 [[T3]], i32 [[T2]])
; CHECK-NEXT: [[M54:%.*]] = tail call i32 @llvm.smax.i32(i32 [[T5]], i32 [[T4]])
; CHECK-NEXT: [[M76:%.*]] = tail call i32 @llvm.smax.i32(i32 [[T7]], i32 [[T6]])
; CHECK-NEXT: [[M3210:%.*]] = tail call i32 @llvm.smax.i32(i32 [[M32]], i32 [[M10]])
; CHECK-NEXT: [[M7654:%.*]] = tail call i32 @llvm.smax.i32(i32 [[M76]], i32 [[M54]])
; CHECK-NEXT: [[M:%.*]] = tail call i32 @llvm.smax.i32(i32 [[M7654]], i32 [[M3210]])
; CHECK-NEXT: ret i32 [[M]]
;		;
%p1 = getelementptr inbounds i32, i32* %p0, i64 1		%p1 = getelementptr inbounds i32, i32* %p0, i64 1
%p2 = getelementptr inbounds i32, i32* %p0, i64 2		%p2 = getelementptr inbounds i32, i32* %p0, i64 2
%p3 = getelementptr inbounds i32, i32* %p0, i64 3		%p3 = getelementptr inbounds i32, i32* %p0, i64 3
%p4 = getelementptr inbounds i32, i32* %p0, i64 4		%p4 = getelementptr inbounds i32, i32* %p0, i64 4
%p5 = getelementptr inbounds i32, i32* %p0, i64 5		%p5 = getelementptr inbounds i32, i32* %p0, i64 5
%p6 = getelementptr inbounds i32, i32* %p0, i64 6		%p6 = getelementptr inbounds i32, i32* %p0, i64 6
%p7 = getelementptr inbounds i32, i32* %p0, i64 7		%p7 = getelementptr inbounds i32, i32* %p0, i64 7
[19 lines elided]
; CHECK-LABEL: @smin_intrinsic_rdx_v8i16(		; CHECK-LABEL: @smin_intrinsic_rdx_v8i16(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4		; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5		; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6		; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7		; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
; CHECK-NEXT: [[T0:%.*]] = load i16, i16* [[P0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
; CHECK-NEXT: [[T1:%.*]] = load i16, i16* [[P1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load i16, i16* [[P2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load i16, i16* [[P3]], align 4		; CHECK-NEXT: ret i16 [[TMP3]]
; CHECK-NEXT: [[T4:%.*]] = load i16, i16* [[P4]], align 4
; CHECK-NEXT: [[T5:%.*]] = load i16, i16* [[P5]], align 4
; CHECK-NEXT: [[T6:%.*]] = load i16, i16* [[P6]], align 4
; CHECK-NEXT: [[T7:%.*]] = load i16, i16* [[P7]], align 4
; CHECK-NEXT: [[M10:%.*]] = tail call i16 @llvm.smin.i16(i16 [[T1]], i16 [[T0]])
; CHECK-NEXT: [[M32:%.*]] = tail call i16 @llvm.smin.i16(i16 [[T3]], i16 [[T2]])
; CHECK-NEXT: [[M54:%.*]] = tail call i16 @llvm.smin.i16(i16 [[T5]], i16 [[T4]])
; CHECK-NEXT: [[M76:%.*]] = tail call i16 @llvm.smin.i16(i16 [[T7]], i16 [[T6]])
; CHECK-NEXT: [[M3210:%.*]] = tail call i16 @llvm.smin.i16(i16 [[M32]], i16 [[M10]])
; CHECK-NEXT: [[M7654:%.*]] = tail call i16 @llvm.smin.i16(i16 [[M76]], i16 [[M54]])
; CHECK-NEXT: [[M:%.*]] = tail call i16 @llvm.smin.i16(i16 [[M7654]], i16 [[M3210]])
; CHECK-NEXT: ret i16 [[M]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%p2 = getelementptr inbounds i16, i16* %p0, i64 2		%p2 = getelementptr inbounds i16, i16* %p0, i64 2
%p3 = getelementptr inbounds i16, i16* %p0, i64 3		%p3 = getelementptr inbounds i16, i16* %p0, i64 3
%p4 = getelementptr inbounds i16, i16* %p0, i64 4		%p4 = getelementptr inbounds i16, i16* %p0, i64 4
%p5 = getelementptr inbounds i16, i16* %p0, i64 5		%p5 = getelementptr inbounds i16, i16* %p0, i64 5
%p6 = getelementptr inbounds i16, i16* %p0, i64 6		%p6 = getelementptr inbounds i16, i16* %p0, i64 6
%p7 = getelementptr inbounds i16, i16* %p0, i64 7		%p7 = getelementptr inbounds i16, i16* %p0, i64 7
[11 lines elided]
%m76 = tail call i16 @llvm.smin.i16(i16 %t7, i16 %t6)		%m76 = tail call i16 @llvm.smin.i16(i16 %t7, i16 %t6)
%m3210 = tail call i16 @llvm.smin.i16(i16 %m32, i16 %m10)		%m3210 = tail call i16 @llvm.smin.i16(i16 %m32, i16 %m10)
%m7654 = tail call i16 @llvm.smin.i16(i16 %m76, i16 %m54)		%m7654 = tail call i16 @llvm.smin.i16(i16 %m76, i16 %m54)
%m = tail call i16 @llvm.smin.i16(i16 %m7654, i16 %m3210)		%m = tail call i16 @llvm.smin.i16(i16 %m7654, i16 %m3210)
ret i16 %m		ret i16 %m
}		}

define i64 @umax_intrinsic_rdx_v4i64(i64* %p0) {		define i64 @umax_intrinsic_rdx_v4i64(i64* %p0) {
; CHECK-LABEL: @umax_intrinsic_rdx_v4i64(		; DEFAULT-LABEL: @umax_intrinsic_rdx_v4i64(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i64, i64* [[P0:%.*]], i64 1		; DEFAULT-NEXT: [[P1:%.*]] = getelementptr inbounds i64, i64* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 2		; DEFAULT-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 3		; DEFAULT-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load i64, i64* [[P0]], align 4		; DEFAULT-NEXT: [[T0:%.*]] = load i64, i64* [[P0]], align 4
; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[P1]], align 4		; DEFAULT-NEXT: [[T1:%.*]] = load i64, i64* [[P1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load i64, i64* [[P2]], align 4		; DEFAULT-NEXT: [[T2:%.*]] = load i64, i64* [[P2]], align 4
; CHECK-NEXT: [[T3:%.*]] = load i64, i64* [[P3]], align 4		; DEFAULT-NEXT: [[T3:%.*]] = load i64, i64* [[P3]], align 4
; CHECK-NEXT: [[M10:%.*]] = tail call i64 @llvm.umax.i64(i64 [[T1]], i64 [[T0]])		; DEFAULT-NEXT: [[M10:%.*]] = tail call i64 @llvm.umax.i64(i64 [[T1]], i64 [[T0]])
; CHECK-NEXT: [[M32:%.*]] = tail call i64 @llvm.umax.i64(i64 [[T3]], i64 [[T2]])		; DEFAULT-NEXT: [[M32:%.*]] = tail call i64 @llvm.umax.i64(i64 [[T3]], i64 [[T2]])
; CHECK-NEXT: [[M:%.*]] = tail call i64 @llvm.umax.i64(i64 [[M32]], i64 [[M10]])		; DEFAULT-NEXT: [[M:%.*]] = tail call i64 @llvm.umax.i64(i64 [[M32]], i64 [[M10]])
; CHECK-NEXT: ret i64 [[M]]		; DEFAULT-NEXT: ret i64 [[M]]
		;
		; THRESH-LABEL: @umax_intrinsic_rdx_v4i64(
		; THRESH-NEXT: [[P1:%.*]] = getelementptr inbounds i64, i64* [[P0:%.*]], i64 1
		; THRESH-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 2
		; THRESH-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P0]], i64 3
		; THRESH-NEXT: [[TMP1:%.*]] = bitcast i64* [[P0]] to <4 x i64>*
		; THRESH-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* [[TMP1]], align 4
		; THRESH-NEXT: [[TMP3:%.*]] = call i64 @llvm.vector.reduce.umax.v4i64(<4 x i64> [[TMP2]])
		; THRESH-NEXT: ret i64 [[TMP3]]
;		;
%p1 = getelementptr inbounds i64, i64* %p0, i64 1		%p1 = getelementptr inbounds i64, i64* %p0, i64 1
%p2 = getelementptr inbounds i64, i64* %p0, i64 2		%p2 = getelementptr inbounds i64, i64* %p0, i64 2
%p3 = getelementptr inbounds i64, i64* %p0, i64 3		%p3 = getelementptr inbounds i64, i64* %p0, i64 3
%t0 = load i64, i64* %p0, align 4		%t0 = load i64, i64* %p0, align 4
%t1 = load i64, i64* %p1, align 4		%t1 = load i64, i64* %p1, align 4
%t2 = load i64, i64* %p2, align 4		%t2 = load i64, i64* %p2, align 4
%t3 = load i64, i64* %p3, align 4		%t3 = load i64, i64* %p3, align 4
[15 lines elided]
; CHECK-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8		; CHECK-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
; CHECK-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9		; CHECK-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
; CHECK-NEXT: [[PA:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10		; CHECK-NEXT: [[PA:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
; CHECK-NEXT: [[PB:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11		; CHECK-NEXT: [[PB:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
; CHECK-NEXT: [[PC:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12		; CHECK-NEXT: [[PC:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
; CHECK-NEXT: [[PD:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13		; CHECK-NEXT: [[PD:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
; CHECK-NEXT: [[PE:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14		; CHECK-NEXT: [[PE:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
; CHECK-NEXT: [[PF:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15		; CHECK-NEXT: [[PF:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
; CHECK-NEXT: [[T0:%.*]] = load i8, i8* [[P0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*
; CHECK-NEXT: [[T1:%.*]] = load i8, i8* [[P1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load i8, i8* [[P2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call i8 @llvm.vector.reduce.umin.v16i8(<16 x i8> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load i8, i8* [[P3]], align 4		; CHECK-NEXT: ret i8 [[TMP3]]
; CHECK-NEXT: [[T4:%.*]] = load i8, i8* [[P4]], align 4
; CHECK-NEXT: [[T5:%.*]] = load i8, i8* [[P5]], align 4
; CHECK-NEXT: [[T6:%.*]] = load i8, i8* [[P6]], align 4
; CHECK-NEXT: [[T7:%.*]] = load i8, i8* [[P7]], align 4
; CHECK-NEXT: [[T8:%.*]] = load i8, i8* [[P8]], align 4
; CHECK-NEXT: [[T9:%.*]] = load i8, i8* [[P9]], align 4
; CHECK-NEXT: [[TA:%.*]] = load i8, i8* [[PA]], align 4
; CHECK-NEXT: [[TB:%.*]] = load i8, i8* [[PB]], align 4
; CHECK-NEXT: [[TC:%.*]] = load i8, i8* [[PC]], align 4
; CHECK-NEXT: [[TD:%.*]] = load i8, i8* [[PD]], align 4
; CHECK-NEXT: [[TE:%.*]] = load i8, i8* [[PE]], align 4
; CHECK-NEXT: [[TF:%.*]] = load i8, i8* [[PF]], align 4
; CHECK-NEXT: [[M10:%.*]] = tail call i8 @llvm.umin.i8(i8 [[T1]], i8 [[T0]])
; CHECK-NEXT: [[M32:%.*]] = tail call i8 @llvm.umin.i8(i8 [[T3]], i8 [[T2]])
; CHECK-NEXT: [[M54:%.*]] = tail call i8 @llvm.umin.i8(i8 [[T5]], i8 [[T4]])
; CHECK-NEXT: [[M76:%.*]] = tail call i8 @llvm.umin.i8(i8 [[T7]], i8 [[T6]])
; CHECK-NEXT: [[M98:%.*]] = tail call i8 @llvm.umin.i8(i8 [[T9]], i8 [[T8]])
; CHECK-NEXT: [[MBA:%.*]] = tail call i8 @llvm.umin.i8(i8 [[TB]], i8 [[TA]])
; CHECK-NEXT: [[MDC:%.*]] = tail call i8 @llvm.umin.i8(i8 [[TD]], i8 [[TC]])
; CHECK-NEXT: [[MFE:%.*]] = tail call i8 @llvm.umin.i8(i8 [[TF]], i8 [[TE]])
; CHECK-NEXT: [[M3210:%.*]] = tail call i8 @llvm.umin.i8(i8 [[M32]], i8 [[M10]])
; CHECK-NEXT: [[M7654:%.*]] = tail call i8 @llvm.umin.i8(i8 [[M76]], i8 [[M54]])
; CHECK-NEXT: [[MDC98:%.*]] = tail call i8 @llvm.umin.i8(i8 [[MDC]], i8 [[M98]])
; CHECK-NEXT: [[MFEBA:%.*]] = tail call i8 @llvm.umin.i8(i8 [[MFE]], i8 [[MBA]])
; CHECK-NEXT: [[ML:%.*]] = tail call i8 @llvm.umin.i8(i8 [[M3210]], i8 [[M7654]])
; CHECK-NEXT: [[MH:%.*]] = tail call i8 @llvm.umin.i8(i8 [[MFEBA]], i8 [[MDC98]])
; CHECK-NEXT: [[M:%.*]] = tail call i8 @llvm.umin.i8(i8 [[MH]], i8 [[ML]])
; CHECK-NEXT: ret i8 [[M]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
[38 lines elided]
%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)		%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)
ret i8 %m		ret i8 %m
}		}

; This should not crash.		; This should not crash.

define void @PR49730() {		define void @PR49730() {
; CHECK-LABEL: @PR49730(		; CHECK-LABEL: @PR49730(
; CHECK-NEXT: [[T:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 2)		; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; CHECK-NEXT: [[T1:%.*]] = sub nsw i32 undef, [[T]]		; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
; CHECK-NEXT: [[T2:%.*]] = call i32 @llvm.umin.i32(i32 undef, i32 [[T1]])
; CHECK-NEXT: [[T3:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 2)
; CHECK-NEXT: [[T4:%.*]] = sub nsw i32 undef, [[T3]]
; CHECK-NEXT: [[T5:%.*]] = call i32 @llvm.umin.i32(i32 [[T2]], i32 [[T4]])
; CHECK-NEXT: [[T6:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 1)
; CHECK-NEXT: [[T7:%.*]] = sub nuw nsw i32 undef, [[T6]]
; CHECK-NEXT: [[T8:%.*]] = call i32 @llvm.umin.i32(i32 [[T5]], i32 [[T7]])
; CHECK-NEXT: [[T9:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 1)
; CHECK-NEXT: [[T10:%.*]] = sub nsw i32 undef, [[T9]]
; CHECK-NEXT: [[T11:%.*]] = call i32 @llvm.umin.i32(i32 [[T8]], i32 [[T10]])
; CHECK-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef		; CHECK-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; CHECK-NEXT: [[T13:%.*]] = call i32 @llvm.umin.i32(i32 [[T11]], i32 [[T12]])		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; CHECK-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)		; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])
		; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)
		; CHECK-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%t = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t1 = sub nsw i32 undef, %t		%t1 = sub nsw i32 undef, %t
%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)		%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)
%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t4 = sub nsw i32 undef, %t3		%t4 = sub nsw i32 undef, %t3
%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)		%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)

llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE
	; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX
	; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX

	@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16			@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16

	declare i32 @llvm.smax.i32(i32, i32)			declare i32 @llvm.smax.i32(i32, i32)

	define i32 @smax_v2i32(i32) {			define i32 @smax_v2i32(i32) {
	; CHECK-LABEL: @smax_v2i32(			; CHECK-LABEL: @smax_v2i32(
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])
	; CHECK-NEXT: ret i32 [[TMP4]]			; CHECK-NEXT: ret i32 [[TMP4]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = call i32 @llvm.smax.i32(i32 %2, i32 %3)			%4 = call i32 @llvm.smax.i32(i32 %2, i32 %3)
	ret i32 %4			ret i32 %4
	}			}

	define i32 @smax_v4i32(i32) {			define i32 @smax_v4i32(i32) {
	; CHECK-LABEL: @smax_v4i32(			; SSE-LABEL: @smax_v4i32(
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; SSE-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4			; SSE-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])			; SSE-NEXT: [[TMP6:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])
	; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP6]], i32 [[TMP4]])			; SSE-NEXT: [[TMP7:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP6]], i32 [[TMP4]])
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP7]], i32 [[TMP5]])			; SSE-NEXT: [[TMP8:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP7]], i32 [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP8]]			; SSE-NEXT: ret i32 [[TMP8]]
				;
				; AVX-LABEL: @smax_v4i32(
				; AVX-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr to <4 x i32>*), align 16
				; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP2]])
				; AVX-NEXT: ret i32 [[TMP3]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4			%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	%6 = call i32 @llvm.smax.i32(i32 %2, i32 %3)			%6 = call i32 @llvm.smax.i32(i32 %2, i32 %3)
	%7 = call i32 @llvm.smax.i32(i32 %6, i32 %4)			%7 = call i32 @llvm.smax.i32(i32 %6, i32 %4)
	%8 = call i32 @llvm.smax.i32(i32 %7, i32 %5)			%8 = call i32 @llvm.smax.i32(i32 %7, i32 %5)
	ret i32 %8			ret i32 %8
	}			}

	define i32 @smax_v8i32(i32) {			define i32 @smax_v8i32(i32) {
	; CHECK-LABEL: @smax_v8i32(			; CHECK-LABEL: @smax_v8i32(
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			; CHECK-NEXT: ret i32 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
	; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP10]], i32 [[TMP4]])
	; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP11]], i32 [[TMP5]])
	; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP12]], i32 [[TMP6]])
	; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP13]], i32 [[TMP7]])
	; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP14]], i32 [[TMP8]])
	; CHECK-NEXT: [[TMP16:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP15]], i32 [[TMP9]])
	; CHECK-NEXT: ret i32 [[TMP16]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4			%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
	%7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4			%7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
	%8 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			%8 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	%9 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			%9 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	%10 = call i32 @llvm.smax.i32(i32 %2, i32 %3)			%10 = call i32 @llvm.smax.i32(i32 %2, i32 %3)
	%11 = call i32 @llvm.smax.i32(i32 %10, i32 %4)			%11 = call i32 @llvm.smax.i32(i32 %10, i32 %4)
	%12 = call i32 @llvm.smax.i32(i32 %11, i32 %5)			%12 = call i32 @llvm.smax.i32(i32 %11, i32 %5)
	%13 = call i32 @llvm.smax.i32(i32 %12, i32 %6)			%13 = call i32 @llvm.smax.i32(i32 %12, i32 %6)
	%14 = call i32 @llvm.smax.i32(i32 %13, i32 %7)			%14 = call i32 @llvm.smax.i32(i32 %13, i32 %7)
	%15 = call i32 @llvm.smax.i32(i32 %14, i32 %8)			%15 = call i32 @llvm.smax.i32(i32 %14, i32 %8)
	%16 = call i32 @llvm.smax.i32(i32 %15, i32 %9)			%16 = call i32 @llvm.smax.i32(i32 %15, i32 %9)
	ret i32 %16			ret i32 %16
	}			}

	define i32 @smax_v16i32(i32) {			define i32 @smax_v16i32(i32) {
	; CHECK-LABEL: @smax_v16i32(			; CHECK-LABEL: @smax_v16i32(
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v16i32(<16 x i32> [[TMP2]])
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			; CHECK-NEXT: ret i32 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
	; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
	; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
	; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
	; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
	; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
	; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
	; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
	; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP2]], i32 [[TMP3]])
	; CHECK-NEXT: [[TMP19:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP18]], i32 [[TMP4]])
	; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP19]], i32 [[TMP5]])
	; CHECK-NEXT: [[TMP21:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP20]], i32 [[TMP6]])
	; CHECK-NEXT: [[TMP22:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP21]], i32 [[TMP7]])
	; CHECK-NEXT: [[TMP23:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP22]], i32 [[TMP8]])
	; CHECK-NEXT: [[TMP24:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP23]], i32 [[TMP9]])
	; CHECK-NEXT: [[TMP25:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP24]], i32 [[TMP10]])
	; CHECK-NEXT: [[TMP26:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP25]], i32 [[TMP11]])
	; CHECK-NEXT: [[TMP27:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP26]], i32 [[TMP12]])
	; CHECK-NEXT: [[TMP28:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP27]], i32 [[TMP13]])
	; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP28]], i32 [[TMP14]])
	; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP29]], i32 [[TMP15]])
	; CHECK-NEXT: [[TMP31:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP30]], i32 [[TMP16]])
	; CHECK-NEXT: [[TMP32:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP31]], i32 [[TMP17]])
	; CHECK-NEXT: ret i32 [[TMP32]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4			%5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
	%7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4			%7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
	%8 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			%8 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8

llvm/test/Transforms/SLPVectorizer/slp-umax-rdx-matcher-crash.ll


	declare i32 @llvm.smin.i32(i32, i32)			declare i32 @llvm.smin.i32(i32, i32)
	declare i32 @llvm.umin.i32(i32, i32)			declare i32 @llvm.umin.i32(i32, i32)

	; Given LLVM IR caused crash in SLP.			; Given LLVM IR caused crash in SLP.
	define void @test2() {			define void @test2() {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[SMIN0:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 0)			; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>)
	; CHECK-NEXT: [[SMIN1:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 1)			; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> undef, [[TMP0]]
	; CHECK-NEXT: [[SMIN2:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 2)			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[SMIN3:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 3)			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP2]], i32 77)
	; CHECK-NEXT: [[A:%.*]] = sub nsw i32 undef, [[SMIN0]]			; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[TMP3]], 1
	; CHECK-NEXT: [[B:%.*]] = sub nsw i32 undef, [[SMIN1]]
	; CHECK-NEXT: [[C:%.*]] = sub nsw i32 undef, [[SMIN2]]
	; CHECK-NEXT: [[D:%.*]] = sub nsw i32 undef, [[SMIN3]]
	; CHECK-NEXT: [[UMIN0:%.*]] = call i32 @llvm.umin.i32(i32 [[D]], i32 [[C]])
	; CHECK-NEXT: [[UMIN1:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN0]], i32 [[B]])
	; CHECK-NEXT: [[UMIN2:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN1]], i32 [[A]])
	; CHECK-NEXT: [[UMIN3:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN2]], i32 77)
	; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[UMIN3]], 1
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)			%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)
	%smin1 = call i32 @llvm.smin.i32(i32 undef, i32 1)			%smin1 = call i32 @llvm.smin.i32(i32 undef, i32 1)
	%smin2 = call i32 @llvm.smin.i32(i32 undef, i32 2)			%smin2 = call i32 @llvm.smin.i32(i32 undef, i32 2)
	%smin3 = call i32 @llvm.smin.i32(i32 undef, i32 3)			%smin3 = call i32 @llvm.smin.i32(i32 undef, i32 3)
	%a = sub nsw i32 undef, %smin0			%a = sub nsw i32 undef, %smin0