This is an archive of the discontinued LLVM Phabricator instance.

Differential D105730

[SLP] match logical and/or as reduction candidates
ClosedPublic

Authored by spatel on Jul 9 2021, 1:41 PM.

Details

Reviewers

ABataev
RKSimon
lebedev.ri

Commits

rG25ee55c0baff: [SLP] match logical and/or as reduction candidates

Summary

This has been a work in progress for a long time, but we finally have all of the pieces in place to vectorize compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases.
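
For readers less familiar with the poison rules, here is a minimal standalone C++ sketch (not LLVM code) of why the freeze is needed. It models an i1 as `std::optional<bool>`, with an empty value standing in for poison. The names `Bit`, `logicalAnd`, `bitwiseAnd`, `freeze`, `scalarChain`, `reduceAnd`, `frozenReduceAnd` and `kExample` are all made up for illustration, and the model ignores undef.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Toy model of an i1 value: std::nullopt stands in for poison.
using Bit = std::optional<bool>;

// Poison-safe form, "select x, y, false": y is ignored when x is false.
Bit logicalAnd(Bit x, Bit y) {
  if (!x.has_value())
    return std::nullopt;
  return *x ? y : Bit(false);
}

// Plain "and x, y": poison in either operand poisons the result.
Bit bitwiseAnd(Bit x, Bit y) {
  if (!x.has_value() || !y.has_value())
    return std::nullopt;
  return Bit(*x && *y);
}

// "freeze x": poison becomes some arbitrary but fixed value.
Bit freeze(Bit x, bool arbitrary) {
  return x.has_value() ? x : Bit(arbitrary);
}

// The original sequential chain of selects.
Bit scalarChain(const std::array<Bit, 4> &v) {
  Bit acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i)
    acc = logicalAnd(acc, v[i]);
  return acc;
}

// An and-reduction over the lanes without any freeze.
Bit reduceAnd(const std::array<Bit, 4> &v) {
  Bit acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i)
    acc = bitwiseAnd(acc, v[i]);
  return acc;
}

// The same reduction with every lane frozen first.
Bit frozenReduceAnd(const std::array<Bit, 4> &v, bool arbitrary) {
  std::array<Bit, 4> frozen;
  for (std::size_t i = 0; i < v.size(); ++i)
    frozen[i] = freeze(v[i], arbitrary);
  return reduceAnd(frozen);
}

// Lane 0 is false and lane 1 is poison: the select chain never looks at lane 1.
const std::array<Bit, 4> kExample = {Bit(false), std::nullopt, Bit(true),
                                     Bit(true)};
```

In this model, `scalarChain(kExample)` is a well-defined false, `reduceAnd(kExample)` is poison, and `frozenReduceAnd(kExample, ...)` is false whichever value the freeze picks, which is the behaviour the patch relies on.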

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Jul 9 2021, 1:41 PM

Herald added subscribers: hiraditya, mcrosier. Jul 9 2021, 1:41 PM

spatel requested review of this revision. Jul 9 2021, 1:41 PM

Herald added a project: Restricted Project. Jul 9 2021, 1:41 PM

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

Matt added a subscriber: Matt. Jul 9 2021, 2:00 PM

RKSimon added inline comments. Jul 9 2021, 2:08 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

It doesn't have to be part of this patch, but should we be trying to fold these patterns to a reduction intrinsic?

; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt <4 x float> [[T:%.*]], zeroinitializer
; CHECK-NEXT:    [[TMP1:%.*]] = freeze <4 x i1> [[TMP0]]
; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP1]])

RKSimon added inline comments. Jul 9 2021, 2:10 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448	Any idea why we only match one of the reduction chains?

Harbormaster completed remote builds in B113279: Diff 357615. Jul 9 2021, 3:35 PM

dtemirbulatov added a subscriber: dtemirbulatov. Jul 9 2021, 4:46 PM

In D105730#2868072, @spatel wrote:

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

+1, thanks. :)

spatel added inline comments. Jul 10 2021, 6:00 AM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
13	We are forming a reduction intrinsic in SLP as we can see in the SLP-only tests. In this case, we have -O2, so a subsequent InstCombine turns it into bitcast+cmp via: https://github.com/llvm/llvm-project/blob/d919bca87556548555af0a7aa1239ea64ba4f3e8/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1966 Still need to check what (if any) difference that makes for codegen.
448	I haven't stepped through yet. We did make some adjustments for sorting the reduction ops in previous patches, but I doubt that extended to creating multiple reductions and/or re-running analysis after forming a reduction.
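
As a side note on the bitcast+cmp form mentioned in the reply above: the updated CHECK lines in this revision show `bitcast <4 x i1> ... to i4` followed by `icmp eq i4 ..., -1` for the all-of case and `icmp ne i4 ..., 0` for the any-of case. The following standalone C++ sketch (not LLVM code; `Lanes`, `packMask` and the other names are hypothetical) checks that these comparisons agree with and/or reductions over four boolean lanes. The lane-to-bit order chosen in `packMask` is an assumption, and it does not matter for comparisons against all-ones or zero.

```cpp
#include <array>
#include <cstdint>

using Lanes = std::array<bool, 4>;

// Pack lane i into bit i of a 4-bit mask (stand-in for the bitcast to i4).
std::uint8_t packMask(const Lanes &lanes) {
  std::uint8_t mask = 0;
  for (unsigned i = 0; i < 4; ++i)
    if (lanes[i])
      mask |= static_cast<std::uint8_t>(1u << i);
  return mask;
}

// Reference semantics of an and/or reduction over the lanes.
bool reduceAnd(const Lanes &lanes) {
  return lanes[0] && lanes[1] && lanes[2] && lanes[3];
}
bool reduceOr(const Lanes &lanes) {
  return lanes[0] || lanes[1] || lanes[2] || lanes[3];
}

// "icmp eq i4 mask, -1": an i4 -1 is the all-ones pattern 0b1111.
bool allOfViaMask(const Lanes &lanes) { return packMask(lanes) == 0xF; }

// "icmp ne i4 mask, 0".
bool anyOfViaMask(const Lanes &lanes) { return packMask(lanes) != 0; }

// Exhaustively compare both forms over all 16 lane combinations.
bool foldsAgreeOnAllInputs() {
  for (unsigned bits = 0; bits < 16; ++bits) {
    Lanes lanes{};
    for (unsigned i = 0; i < 4; ++i)
      lanes[i] = ((bits >> i) & 1u) != 0;
    if (reduceAnd(lanes) != allOfViaMask(lanes) ||
        reduceOr(lanes) != anyOfViaMask(lanes))
      return false;
  }
  return true;
}
```

This only illustrates that the two forms compute the same value on well-defined inputs; it says nothing about which form gives better codegen.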

spatel marked 2 inline comments as done. Jul 10 2021, 6:15 AM

spatel added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448	To be more specific, this test should be adapted into an SLP-only test - the enhancement will need to happen within SLP to handle mapping reduction ops into multiple reductions in some way.

post.kadirselcuk added a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt". Jul 10 2021, 5:55 PM

post.kadirselcuk added a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. Jul 10 2021, 8:06 PM

craig.topper removed a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. Jul 10 2021, 9:47 PM

In D105730#2868072, @spatel wrote:

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

very nice, thank you! 🚀

ABataev added inline comments. Jul 12 2021, 5:51 AM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448	I investigated it; it looks like an inefficiency in the multi-node analysis. We're forming mixed-operand nodes, like `{extract 0, extract 1, extract 2, extract 3, 0.0, 0.0, 0.0, 0.0}` and `{1.0, 1.0, 1.0, 1.0, extract 0, extract 1, extract 2, extract 3}`, which are treated as gathers. I hope this can be fixed by D101109.

spatel marked an inline comment as done. Jul 12 2021, 7:09 AM

spatel added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448	Thanks for checking. I'll add a basic SLP test to this patch, so we have minimal test coverage, and we can add more as needed.

Patch updated:

Added SLP-only test ("clamp") to show enhancement opportunity.
Rebased after D105392 (freeze is now hoisted higher in the PhaseOrdering tests).

Harbormaster completed remote builds in B113503: Diff 357938. Jul 12 2021, 7:16 AM

spatel mentioned this in rG0d17b5d0af6f: [SLP] add test for multiple logical reductions; NFC. Jul 12 2021, 7:20 AM

LGTM - happy for logical_and_icmp_clamp to be handled in a follow up - any more comments?

LGTM - cheers

This revision is now accepted and ready to land. Jul 14 2021, 2:18 AM

This revision was landed with ongoing or failed builds. Jul 14 2021, 6:04 AM

Closed by commit rG25ee55c0baff: [SLP] match logical and/or as reduction candidates (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG25ee55c0baff: [SLP] match logical and/or as reduction candidates.

efriedma added a subscriber: efriedma. Jul 15 2021, 11:15 AM

efriedma added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
187	Alive2 apparently doesn't like this. See https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=047d8ce24c780675&test=Transforms%2FSLPVectorizer%2FX86%2Freduction-logical.ll

spatel added inline comments. Jul 15 2021, 11:37 AM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
187	Yes - @nlopes just noted this on the commit page/thread too ( https://reviews.llvm.org/rG25ee55c0baff ). I didn't notice that we had replaced select insts with logic, so we either need to prevent, restore, or freeze our way out of that.

spatel mentioned this in rG81ce3aa30cc2: [SLP] avoid leaking poison in reduction of safe boolean logic ops. Jul 15 2021, 2:34 PM

spatel mentioned this in rGd9abb15774c5: [SLP] add tests for poison-safe bool logic reductions; NFC. Jul 16 2021, 5:51 AM

The partial-vectorization poison leak is hopefully fixed with rG81ce3aa30cc2, and I added more test coverage in rGd9abb15774c5. That includes a test where the reduction differs based on costs, and both variants passed Alive2 when I checked.
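
To make the partial-vectorization leak concrete, here is a small standalone C++ sketch (not LLVM code, and not a description of what the fix commit does). It assumes a toy model where an i1 is a `std::optional<bool>` and an empty value stands in for poison; `Bit`, `logicalOr`, `bitwiseOr`, `freeze` and the `combine*` helpers are hypothetical names. It covers two of the three options mentioned in the thread (restore the select, or freeze the leftover operand).

```cpp
#include <optional>

// Toy model of an i1 value: std::nullopt stands in for poison.
using Bit = std::optional<bool>;

// Poison-safe form, "select x, true, y": y is ignored when x is true.
Bit logicalOr(Bit x, Bit y) {
  if (!x.has_value())
    return std::nullopt;
  return *x ? Bit(true) : y;
}

// Plain "or x, y": poison in either operand poisons the result.
Bit bitwiseOr(Bit x, Bit y) {
  if (!x.has_value() || !y.has_value())
    return std::nullopt;
  return Bit(*x || *y);
}

// "freeze x": poison becomes some arbitrary but fixed value.
Bit freeze(Bit x, bool arbitrary) {
  return x.has_value() ? x : Bit(arbitrary);
}

// `reduced` is the result of the already-frozen vector reduction;
// `leftover` is a scalar compare that was not vectorized.

// Leaky: the leftover op was rewritten from a select to a plain `or`.
Bit combineLeaky(Bit reduced, Bit leftover) {
  return bitwiseOr(reduced, leftover);
}

// Option "restore": keep the select form for the leftover op.
Bit combineRestored(Bit reduced, Bit leftover) {
  return logicalOr(reduced, leftover);
}

// Option "freeze": keep the plain `or` but freeze the leftover operand.
Bit combineFrozen(Bit reduced, Bit leftover, bool arbitrary) {
  return bitwiseOr(reduced, freeze(leftover, arbitrary));
}
```

With `reduced` true and `leftover` poison, the original select chain yields true; in this model `combineLeaky` yields poison, while `combineRestored` and `combineFrozen` both yield true.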

efriedma removed a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt". Jul 17 2021, 3:02 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	36 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll	281 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll	118 lines

Diff 358575

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[7,289 lines not shown]	class HorizontalReduction {

const unsigned INVALID_OPERAND_INDEX = std::numeric_limits<unsigned>::max();		const unsigned INVALID_OPERAND_INDEX = std::numeric_limits<unsigned>::max();

static bool isCmpSelMinMax(Instruction *I) {		static bool isCmpSelMinMax(Instruction *I) {
return match(I, m_Select(m_Cmp(), m_Value(), m_Value())) &&		return match(I, m_Select(m_Cmp(), m_Value(), m_Value())) &&
RecurrenceDescriptor::isMinMaxRecurrenceKind(getRdxKind(I));		RecurrenceDescriptor::isMinMaxRecurrenceKind(getRdxKind(I));
}		}

		// And/or are potentially poison-safe logical patterns like:
		// select x, y, false
		// select x, true, y
		static bool isBoolLogicOp(Instruction *I) {
		return match(I, m_LogicalAnd(m_Value(), m_Value())) ||
		match(I, m_LogicalOr(m_Value(), m_Value()));
		}

/// Checks if instruction is associative and can be vectorized.		/// Checks if instruction is associative and can be vectorized.
static bool isVectorizable(RecurKind Kind, Instruction *I) {		static bool isVectorizable(RecurKind Kind, Instruction *I) {
if (Kind == RecurKind::None)		if (Kind == RecurKind::None)
return false;		return false;
if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(Kind))
		// Integer ops that map to select instructions or intrinsics are fine.
		if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(Kind) ||
		isBoolLogicOp(I))
return true;		return true;

if (Kind == RecurKind::FMax || Kind == RecurKind::FMin) {		if (Kind == RecurKind::FMax || Kind == RecurKind::FMin) {
// FP min/max are associative except for NaN and -0.0. We do not		// FP min/max are associative except for NaN and -0.0. We do not
// have to rule out -0.0 here because the intrinsic semantics do not		// have to rule out -0.0 here because the intrinsic semantics do not
// specify a fixed result for it.		// specify a fixed result for it.
return I->getFastMathFlags().noNaNs();		return I->getFastMathFlags().noNaNs();
}		}

return I->isAssociative();		return I->isAssociative();
}		}

		static Value *getRdxOperand(Instruction *I, unsigned Index) {
		// Poison-safe 'or' takes the form: select X, true, Y
		// To make that work with the normal operand processing, we skip the
		// true value operand.
		// TODO: Change the code and data structures to handle this without a hack.
		if (getRdxKind(I) == RecurKind::Or && isa<SelectInst>(I) && Index == 1)
		return I->getOperand(2);
		return I->getOperand(Index);
		}

/// Checks if the ParentStackElem.first should be marked as a reduction		/// Checks if the ParentStackElem.first should be marked as a reduction
/// operation with an extra argument or as extra argument itself.		/// operation with an extra argument or as extra argument itself.
void markExtraArg(std::pair<Instruction *, unsigned> &ParentStackElem,		void markExtraArg(std::pair<Instruction *, unsigned> &ParentStackElem,
Value *ExtraArg) {		Value *ExtraArg) {
if (ExtraArgs.count(ParentStackElem.first)) {		if (ExtraArgs.count(ParentStackElem.first)) {
ExtraArgs[ParentStackElem.first] = nullptr;		ExtraArgs[ParentStackElem.first] = nullptr;
// We ran into something like:		// We ran into something like:
// ParentStackElem.first = ExtraArgs[ParentStackElem.first] + ExtraArg.		// ParentStackElem.first = ExtraArgs[ParentStackElem.first] + ExtraArg.
[92 lines not shown]	class HorizontalReduction {

static RecurKind getRdxKind(Instruction *I) {		static RecurKind getRdxKind(Instruction *I) {
assert(I && "Expected instruction for reduction matching");		assert(I && "Expected instruction for reduction matching");
TargetTransformInfo::ReductionFlags RdxFlags;		TargetTransformInfo::ReductionFlags RdxFlags;
if (match(I, m_Add(m_Value(), m_Value())))		if (match(I, m_Add(m_Value(), m_Value())))
return RecurKind::Add;		return RecurKind::Add;
if (match(I, m_Mul(m_Value(), m_Value())))		if (match(I, m_Mul(m_Value(), m_Value())))
return RecurKind::Mul;		return RecurKind::Mul;
if (match(I, m_And(m_Value(), m_Value())))		if (match(I, m_And(m_Value(), m_Value())) ||
		match(I, m_LogicalAnd(m_Value(), m_Value())))
return RecurKind::And;		return RecurKind::And;
if (match(I, m_Or(m_Value(), m_Value())))		if (match(I, m_Or(m_Value(), m_Value())) ||
		match(I, m_LogicalOr(m_Value(), m_Value())))
return RecurKind::Or;		return RecurKind::Or;
if (match(I, m_Xor(m_Value(), m_Value())))		if (match(I, m_Xor(m_Value(), m_Value())))
return RecurKind::Xor;		return RecurKind::Xor;
if (match(I, m_FAdd(m_Value(), m_Value())))		if (match(I, m_FAdd(m_Value(), m_Value())))
return RecurKind::FAdd;		return RecurKind::FAdd;
if (match(I, m_FMul(m_Value(), m_Value())))		if (match(I, m_FMul(m_Value(), m_Value())))
return RecurKind::FMul;		return RecurKind::FMul;

[225 lines not shown]	while (!Stack.empty()) {
addReductionOps(TreeN);		addReductionOps(TreeN);
}		}
// Retract.		// Retract.
Stack.pop_back();		Stack.pop_back();
continue;		continue;
}		}

// Visit operands.		// Visit operands.
Value *EdgeVal = TreeN->getOperand(EdgeToVisit);		Value *EdgeVal = getRdxOperand(TreeN, EdgeToVisit);
auto *EdgeInst = dyn_cast<Instruction>(EdgeVal);		auto *EdgeInst = dyn_cast<Instruction>(EdgeVal);
if (!EdgeInst) {		if (!EdgeInst) {
// Edge value is not a reduction instruction or a leaf instruction.		// Edge value is not a reduction instruction or a leaf instruction.
// (It may be a constant, function argument, or something else.)		// (It may be a constant, function argument, or something else.)
markExtraArg(Stack.back(), EdgeVal);		markExtraArg(Stack.back(), EdgeVal);
continue;		continue;
}		}
RecurKind EdgeRdxKind = getRdxKind(EdgeInst);		RecurKind EdgeRdxKind = getRdxKind(EdgeInst);
[168 lines not shown]	while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {
// Emit a reduction. If the root is a select (min/max idiom), the insert		// Emit a reduction. If the root is a select (min/max idiom), the insert
// point is the compare condition of that select.		// point is the compare condition of that select.
Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);		Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);
if (isCmpSelMinMax(RdxRootInst))		if (isCmpSelMinMax(RdxRootInst))
Builder.SetInsertPoint(getCmpForMinMaxReduction(RdxRootInst));		Builder.SetInsertPoint(getCmpForMinMaxReduction(RdxRootInst));
else		else
Builder.SetInsertPoint(RdxRootInst);		Builder.SetInsertPoint(RdxRootInst);

		// To prevent poison from leaking across what used to be sequential, safe,
		// scalar boolean logic operations, the reduction operand must be frozen.
		if (isa<SelectInst>(RdxRootInst) && isBoolLogicOp(RdxRootInst))
		VectorizedRoot = Builder.CreateFreeze(VectorizedRoot);

Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);

if (!VectorizedTree) {		if (!VectorizedTree) {
// Initialize the final value in the reduction.		// Initialize the final value in the reduction.
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
} else {		} else {
// Update the final value in the reduction.		// Update the final value in the reduction.
[942 lines not shown]
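
The operand remapping done by getRdxOperand in the hunk above can be illustrated outside LLVM with a small standalone C++ sketch. `Inst`, `Kind` and `rdxOperand` are hypothetical stand-ins, with operands represented as plain strings; this is not the LLVM data structure. The point is that a poison-safe 'or' is `select X, true, Y`, so a walker that treats reductions as binary ops and visits operand indices 0 and 1 must be redirected from the constant `true` at index 1 to `Y` at index 2.

```cpp
#include <string>
#include <vector>

enum class Kind { And, Or };

// Hypothetical stand-in for an instruction: a reduction kind, whether it is
// a select, and its operands by name.
struct Inst {
  Kind kind;
  bool isSelect;
  std::vector<std::string> operands;
};

// Mirrors the index remapping described above: for "select X, true, Y",
// index 1 is redirected to operand 2; everything else is returned unchanged.
const std::string &rdxOperand(const Inst &inst, unsigned index) {
  if (inst.kind == Kind::Or && inst.isSelect && index == 1)
    return inst.operands[2];
  return inst.operands[index];
}

// select X, true, Y   (logical or)
const Inst kLogicalOr{Kind::Or, true, {"X", "true", "Y"}};
// select X, Y, false  (logical and): operands 0 and 1 are already X and Y.
const Inst kLogicalAnd{Kind::And, true, {"X", "Y", "false"}};
// or A, B             (plain binary op)
const Inst kBitwiseOr{Kind::Or, false, {"A", "B"}};
```

For the logical 'and' form no remapping is needed, because the constant `false` sits at index 2, which a binary-op walker never visits.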

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O2 -S < %s | FileCheck %s			; RUN: opt -O2 -S < %s | FileCheck %s

	target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64--"			target triple = "x86_64--"

	define float @test_merge_allof_v4sf(<4 x float> %t) {			define float @test_merge_allof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_merge_allof_v4sf(			; CHECK-LABEL: @test_merge_allof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[T:%.*]], i32 0			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T:%.*]]
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt float [[VECEXT]], 0.000000e+00			; CHECK-NEXT: [[TMP0:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer
	; CHECK-NEXT: [[VECEXT2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt float [[VECEXT2]], 0.000000e+00			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], -1
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP4]], i1 false			; CHECK-NEXT: br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x float> [[T]], i32 2
	; CHECK-NEXT: [[CMP9:%.*]] = fcmp olt float [[VECEXT7]], 0.000000e+00
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 [[CMP9]], i1 false
	; CHECK-NEXT: [[VECEXT12:%.*]] = extractelement <4 x float> [[T]], i32 3
	; CHECK-NEXT: [[CMP14:%.*]] = fcmp olt float [[VECEXT12]], 0.000000e+00
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 [[CMP14]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND2]], label [[COMMON_RET:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: common.ret:			; CHECK: common.ret:
	; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi float [ [[SPEC_SELECT:%.*]], [[LOR_LHS_FALSE]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi float [ [[SPEC_SELECT:%.*]], [[LOR_LHS_FALSE]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret float [[COMMON_RET_OP]]			; CHECK-NEXT: ret float [[COMMON_RET_OP]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[CMP18:%.*]] = fcmp ogt float [[VECEXT]], 1.000000e+00			; CHECK-NEXT: [[T_FR6:%.*]] = freeze <4 x float> [[T]]
	; CHECK-NEXT: [[CMP23:%.*]] = fcmp ogt float [[VECEXT2]], 1.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = fcmp ogt <4 x float> [[T_FR6]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[CMP18]], i1 [[CMP23]], i1 false			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4
	; CHECK-NEXT: [[CMP28:%.*]] = fcmp ogt float [[VECEXT7]], 1.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], -1
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 [[CMP28]], i1 false			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[CMP33:%.*]] = fcmp ogt float [[VECEXT12]], 1.000000e+00			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 [[CMP33]], i1 false			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP6]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[VECEXT]], [[VECEXT2]]			; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[TMP5]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[OR_COND5]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: br label [[COMMON_RET]]			; CHECK-NEXT: br label [[COMMON_RET]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %land.lhs.true, label %lor.lhs.false			br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

	[51 lines not shown]
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4sf(<4 x float> %t) {			define float @test_merge_anyof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4sf(			; CHECK-LABEL: @test_merge_anyof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[T:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[T:%.*]], i32 3
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt float [[VECEXT]], 0.000000e+00			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[T]], i32 2
	; CHECK-NEXT: [[VECEXT2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[T]], i32 1
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt float [[VECEXT2]], 0.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[T]], i32 0
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[CMP4]]			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T]]
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x float> [[T]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer
	; CHECK-NEXT: [[CMP9:%.*]] = fcmp olt float [[VECEXT7]], 0.000000e+00			; CHECK-NEXT: [[CMP19:%.*]] = fcmp ogt float [[TMP3]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 true, i1 [[CMP9]]			; CHECK-NEXT: [[CMP24:%.*]] = fcmp ogt float [[TMP2]], 1.000000e+00
	; CHECK-NEXT: [[VECEXT12:%.*]] = extractelement <4 x float> [[T]], i32 3			; CHECK-NEXT: [[CMP29:%.*]] = fcmp ogt float [[TMP1]], 1.000000e+00
	; CHECK-NEXT: [[CMP14:%.*]] = fcmp olt float [[VECEXT12]], 0.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 true, i1 [[CMP14]]			; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0
	; CHECK-NEXT: [[CMP19:%.*]] = fcmp ogt float [[VECEXT]], 1.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = or i1 [[TMP6]], [[CMP19]]
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[OR_COND2]], i1 true, i1 [[CMP19]]			; CHECK-NEXT: [[TMP8:%.*]] = or i1 [[TMP7]], [[CMP24]]
	; CHECK-NEXT: [[CMP24:%.*]] = fcmp ogt float [[VECEXT2]], 1.000000e+00			; CHECK-NEXT: [[TMP9:%.*]] = or i1 [[TMP8]], [[CMP29]]
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP24]]			; CHECK-NEXT: [[CMP34:%.*]] = fcmp ogt float [[TMP0]], 1.000000e+00
	; CHECK-NEXT: [[CMP29:%.*]] = fcmp ogt float [[VECEXT7]], 1.000000e+00			; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[TMP9]], i1 true, i1 [[CMP34]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP29]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[CMP34:%.*]] = fcmp ogt float [[VECEXT12]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP34]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[VECEXT]], [[VECEXT2]]
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false
	[52 lines not shown]
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_separate_allof_v4sf(<4 x float> %t) {			define float @test_separate_allof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_separate_allof_v4sf(			; CHECK-LABEL: @test_separate_allof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[T:%.*]], i32 0			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T:%.*]]
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt float [[VECEXT]], 0.000000e+00			; CHECK-NEXT: [[TMP0:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer
	; CHECK-NEXT: [[VECEXT2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt float [[VECEXT2]], 0.000000e+00			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], -1
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP4]], i1 false			; CHECK-NEXT: br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[IF_END:%.*]]
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x float> [[T]], i32 2
	; CHECK-NEXT: [[CMP9:%.*]] = fcmp olt float [[VECEXT7]], 0.000000e+00
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 [[CMP9]], i1 false
	; CHECK-NEXT: [[VECEXT12:%.*]] = extractelement <4 x float> [[T]], i32 3
	; CHECK-NEXT: [[CMP14:%.*]] = fcmp olt float [[VECEXT12]], 0.000000e+00
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 [[CMP14]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND2]], label [[COMMON_RET:%.*]], label [[IF_END:%.*]]
	; CHECK: common.ret:			; CHECK: common.ret:
	; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi float [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi float [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret float [[COMMON_RET_OP]]			; CHECK-NEXT: ret float [[COMMON_RET_OP]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[CMP18:%.*]] = fcmp ogt float [[VECEXT]], 1.000000e+00			; CHECK-NEXT: [[T_FR6:%.*]] = freeze <4 x float> [[T]]
	; CHECK-NEXT: [[CMP23:%.*]] = fcmp ogt float [[VECEXT2]], 1.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = fcmp ogt <4 x float> [[T_FR6]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[CMP18]], i1 [[CMP23]], i1 false			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4
	; CHECK-NEXT: [[CMP28:%.*]] = fcmp ogt float [[VECEXT7]], 1.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], -1
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 [[CMP28]], i1 false			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[CMP33:%.*]] = fcmp ogt float [[VECEXT12]], 1.000000e+00			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 [[CMP33]], i1 false			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP6]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[VECEXT]], [[VECEXT2]]			; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[TMP5]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[OR_COND5]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: br label [[COMMON_RET]]			; CHECK-NEXT: br label [[COMMON_RET]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %land.lhs.true, label %if.end			br i1 %cmp, label %land.lhs.true, label %if.end

	[54 lines not shown]
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_separate_anyof_v4sf(<4 x float> %t) {			define float @test_separate_anyof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_separate_anyof_v4sf(			; CHECK-LABEL: @test_separate_anyof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[T:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[T:%.*]], i32 3
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt float [[VECEXT]], 0.000000e+00			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[T]], i32 2
	; CHECK-NEXT: [[VECEXT2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[T]], i32 1
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt float [[VECEXT2]], 0.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[T]], i32 0
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[CMP4]]			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T]]
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x float> [[T]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer
	; CHECK-NEXT: [[CMP9:%.*]] = fcmp olt float [[VECEXT7]], 0.000000e+00			; CHECK-NEXT: [[CMP18:%.*]] = fcmp ogt float [[TMP3]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 true, i1 [[CMP9]]			; CHECK-NEXT: [[CMP23:%.*]] = fcmp ogt float [[TMP2]], 1.000000e+00
	; CHECK-NEXT: [[VECEXT12:%.*]] = extractelement <4 x float> [[T]], i32 3			; CHECK-NEXT: [[CMP28:%.*]] = fcmp ogt float [[TMP1]], 1.000000e+00
	; CHECK-NEXT: [[CMP14:%.*]] = fcmp olt float [[VECEXT12]], 0.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 true, i1 [[CMP14]]			; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0
	; CHECK-NEXT: [[CMP18:%.*]] = fcmp ogt float [[VECEXT]], 1.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = or i1 [[TMP6]], [[CMP18]]
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[OR_COND2]], i1 true, i1 [[CMP18]]			; CHECK-NEXT: [[TMP8:%.*]] = or i1 [[TMP7]], [[CMP23]]
	; CHECK-NEXT: [[CMP23:%.*]] = fcmp ogt float [[VECEXT2]], 1.000000e+00			; CHECK-NEXT: [[TMP9:%.*]] = or i1 [[TMP8]], [[CMP28]]
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP23]]			; CHECK-NEXT: [[CMP33:%.*]] = fcmp ogt float [[TMP0]], 1.000000e+00
	; CHECK-NEXT: [[CMP28:%.*]] = fcmp ogt float [[VECEXT7]], 1.000000e+00			; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[TMP9]], i1 true, i1 [[CMP33]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP28]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[CMP33:%.*]] = fcmp ogt float [[VECEXT12]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP33]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[VECEXT]], [[VECEXT2]]
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false
	[55 lines not shown]
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_allof_v4si(<4 x i32> %t) {			define float @test_merge_allof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_merge_allof_v4si(			; CHECK-LABEL: @test_merge_allof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 0			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[VECEXT]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[VECEXT1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], -1
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP2]], i1 false			; CHECK-NEXT: br i1 [[TMP2]], label [[RETURN:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[T]], i32 2
	; CHECK-NEXT: [[CMP5:%.*]] = icmp slt i32 [[VECEXT4]], 1
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 [[CMP5]], i1 false
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[T]], i32 3
	; CHECK-NEXT: [[CMP8:%.*]] = icmp slt i32 [[VECEXT7]], 1
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 [[CMP8]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND2]], label [[RETURN:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[CMP10:%.*]] = icmp sgt i32 [[VECEXT]], 255			; CHECK-NEXT: [[T_FR6:%.*]] = freeze <4 x i32> [[T]]
	; CHECK-NEXT: [[CMP13:%.*]] = icmp sgt i32 [[VECEXT1]], 255			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[T_FR6]], <i32 255, i32 255, i32 255, i32 255>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[CMP10]], i1 [[CMP13]], i1 false			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4
	; CHECK-NEXT: [[CMP16:%.*]] = icmp sgt i32 [[VECEXT4]], 255			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], -1
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 [[CMP16]], i1 false			; CHECK-NEXT: br i1 [[TMP5]], label [[RETURN]], label [[IF_END:%.*]]
	; CHECK-NEXT: [[CMP19:%.*]] = icmp sgt i32 [[VECEXT7]], 255
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 [[CMP19]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND5]], label [[RETURN]], label [[IF_END:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[VECEXT]], [[VECEXT1]]			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T]]
				; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP6]], i32 0
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: [[RETVAL_0:%.*]] = phi float [ [[CONV]], [[IF_END]] ], [ 0.000000e+00, [[LOR_LHS_FALSE]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[RETVAL_0:%.*]] = phi float [ [[CONV]], [[IF_END]] ], [ 0.000000e+00, [[LOR_LHS_FALSE]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4si(<4 x i32> %t) {			define float @test_merge_anyof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4si(			; CHECK-LABEL: @test_merge_anyof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 3
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[VECEXT]], 1			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[T]], i32 2
	; CHECK-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[T]], i32 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[VECEXT1]], 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[T]], i32 0
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[CMP2]]			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T]]
	; CHECK-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[T]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[CMP5:%.*]] = icmp slt i32 [[VECEXT4]], 1			; CHECK-NEXT: [[CMP11:%.*]] = icmp sgt i32 [[TMP3]], 255
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 true, i1 [[CMP5]]			; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i32 [[TMP2]], 255
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[T]], i32 3			; CHECK-NEXT: [[CMP17:%.*]] = icmp sgt i32 [[TMP1]], 255
	; CHECK-NEXT: [[CMP8:%.*]] = icmp slt i32 [[VECEXT7]], 1			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 true, i1 [[CMP8]]			; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0
	; CHECK-NEXT: [[CMP11:%.*]] = icmp sgt i32 [[VECEXT]], 255			; CHECK-NEXT: [[TMP7:%.*]] = or i1 [[TMP6]], [[CMP11]]
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[OR_COND2]], i1 true, i1 [[CMP11]]			; CHECK-NEXT: [[TMP8:%.*]] = or i1 [[TMP7]], [[CMP14]]
	; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i32 [[VECEXT1]], 255			; CHECK-NEXT: [[TMP9:%.*]] = or i1 [[TMP8]], [[CMP17]]
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP14]]			; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[TMP0]], 255
	; CHECK-NEXT: [[CMP17:%.*]] = icmp sgt i32 [[VECEXT4]], 255			; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[TMP9]], i1 true, i1 [[CMP20]]
				RKSimon: Any idea why we only match one of the reduction chains?
				spatel (author): I haven't stepped through yet. We did make some adjustments for sorting the reduction ops in previous patches, but I doubt that extended to creating multiple reductions and/or re-running analysis after forming a reduction.
				spatel (author): To be more specific, this test should be adapted into an SLP-only test. The enhancement will need to happen within SLP to handle mapping reduction ops into multiple reductions in some way.
				ABataev: I investigated it. It looks like an inefficiency in multi-node analysis: we're forming mixed operand nodes, like `{extract 0, extract 1, extract 2, extract 3, 0.0, 0.0, 0.0, 0.0}` and `{1.0, 1.0, 1.0, 1.0, extract 0, extract 1, extract 2, extract 3}`, which are considered as gathers. I hope this can be fixed by D101109.
				spatel (author): Thanks for checking. I'll add a basic SLP test to this patch, so we have minimal test coverage, and we can add more as needed.
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP17]]			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[VECEXT7]], 255
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP20]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[VECEXT]], [[VECEXT1]]
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[CONV]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[CONV]]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false
	▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define i32 @test_separate_allof_v4si(<4 x i32> %t) {			define i32 @test_separate_allof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_separate_allof_v4si(			; CHECK-LABEL: @test_separate_allof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 0			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[VECEXT]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[VECEXT1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], -1
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP2]], i1 false			; CHECK-NEXT: br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[IF_END:%.*]]
	; CHECK-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[T]], i32 2
	; CHECK-NEXT: [[CMP5:%.*]] = icmp slt i32 [[VECEXT4]], 1
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 [[CMP5]], i1 false
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[T]], i32 3
	; CHECK-NEXT: [[CMP8:%.*]] = icmp slt i32 [[VECEXT7]], 1
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 [[CMP8]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND2]], label [[COMMON_RET:%.*]], label [[IF_END:%.*]]
	; CHECK: common.ret:			; CHECK: common.ret:
	; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]			; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[CMP10:%.*]] = icmp sgt i32 [[VECEXT]], 255			; CHECK-NEXT: [[T_FR6:%.*]] = freeze <4 x i32> [[T]]
	; CHECK-NEXT: [[CMP13:%.*]] = icmp sgt i32 [[VECEXT1]], 255			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[T_FR6]], <i32 255, i32 255, i32 255, i32 255>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[CMP10]], i1 [[CMP13]], i1 false			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4
	; CHECK-NEXT: [[CMP16:%.*]] = icmp sgt i32 [[VECEXT4]], 255			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], -1
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 [[CMP16]], i1 false			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[CMP19:%.*]] = icmp sgt i32 [[VECEXT7]], 255			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 [[CMP19]], i1 false			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP6]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[VECEXT]], [[VECEXT1]]			; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[TMP5]], i32 0, i32 [[ADD]]
	; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[OR_COND5]], i32 0, i32 [[ADD]]
	; CHECK-NEXT: br label [[COMMON_RET]]			; CHECK-NEXT: br label [[COMMON_RET]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %land.lhs.true, label %if.end			br i1 %cmp, label %land.lhs.true, label %if.end

	land.lhs.true:			land.lhs.true:
	▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
	return:			return:
	%retval.0 = phi i32 [ 0, %if.then ], [ 0, %if.then20 ], [ %add, %if.end21 ]			%retval.0 = phi i32 [ 0, %if.then ], [ 0, %if.then20 ], [ %add, %if.end21 ]
	ret i32 %retval.0			ret i32 %retval.0
	}			}

	define i32 @test_separate_anyof_v4si(<4 x i32> %t) {			define i32 @test_separate_anyof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_separate_anyof_v4si(			; CHECK-LABEL: @test_separate_anyof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 0			; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[VECEXT]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[VECEXT1:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[VECEXT1]], 1			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP1]], 0
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[CMP2]]			; CHECK-NEXT: br i1 [[DOTNOT]], label [[IF_END:%.*]], label [[COMMON_RET:%.*]]
	; CHECK-NEXT: [[VECEXT4:%.*]] = extractelement <4 x i32> [[T]], i32 2
	; CHECK-NEXT: [[CMP5:%.*]] = icmp slt i32 [[VECEXT4]], 1
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 true, i1 [[CMP5]]
	; CHECK-NEXT: [[VECEXT7:%.*]] = extractelement <4 x i32> [[T]], i32 3
	; CHECK-NEXT: [[CMP8:%.*]] = icmp slt i32 [[VECEXT7]], 1
	; CHECK-NEXT: [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 true, i1 [[CMP8]]
	; CHECK-NEXT: br i1 [[OR_COND2]], label [[COMMON_RET:%.*]], label [[IF_END:%.*]]
	; CHECK: common.ret:			; CHECK: common.ret:
	; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ [[SPEC_SELECT:%.*]], [[IF_END]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]			; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[CMP10:%.*]] = icmp sgt i32 [[VECEXT]], 255			; CHECK-NEXT: [[T_FR6:%.*]] = freeze <4 x i32> [[T]]
	; CHECK-NEXT: [[CMP13:%.*]] = icmp sgt i32 [[VECEXT1]], 255			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[T_FR6]], <i32 255, i32 255, i32 255, i32 255>
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[CMP10]], i1 true, i1 [[CMP13]]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[CMP16:%.*]] = icmp sgt i32 [[VECEXT4]], 255			; CHECK-NEXT: [[DOTNOT7:%.*]] = icmp eq i4 [[TMP3]], 0
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP16]]			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[CMP19:%.*]] = icmp sgt i32 [[VECEXT7]], 255			; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw <4 x i32> [[SHIFT]], [[T]]
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP19]]			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[VECEXT]], [[VECEXT1]]			; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[DOTNOT7]], i32 [[ADD]], i32 0
	; CHECK-NEXT: [[SPEC_SELECT]] = select i1 [[OR_COND5]], i32 0, i32 [[ADD]]
	; CHECK-NEXT: br label [[COMMON_RET]]			; CHECK-NEXT: br label [[COMMON_RET]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false

	lor.lhs.false:			lor.lhs.false:
	▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s

define i1 @logical_and_icmp(<4 x i32> %x) {		define i1 @logical_and_icmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp(		; CHECK-LABEL: @logical_and_icmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: ret i1 [[TMP3]]
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 0
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 0
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 0
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 0
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 0		%c0 = icmp slt i32 %x0, 0
%c1 = icmp slt i32 %x1, 0		%c1 = icmp slt i32 %x1, 0
%c2 = icmp slt i32 %x2, 0		%c2 = icmp slt i32 %x2, 0
%c3 = icmp slt i32 %x3, 0		%c3 = icmp slt i32 %x3, 0
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_or_icmp(<4 x i32> %x, <4 x i32> %y) {		define i1 @logical_or_icmp(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @logical_or_icmp(		; CHECK-LABEL: @logical_or_icmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: ret i1 [[TMP3]]
; CHECK-NEXT: [[Y0:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 0
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i32> [[Y]], i32 1
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <4 x i32> [[Y]], i32 3
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], [[Y0]]
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], [[Y1]]
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], [[Y2]]
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], [[Y3]]
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 true, i1 [[C1]]
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 true, i1 [[C2]]
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 true, i1 [[C3]]
; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%y0 = extractelement <4 x i32> %y, i32 0		%y0 = extractelement <4 x i32> %y, i32 0
%y1 = extractelement <4 x i32> %y, i32 1		%y1 = extractelement <4 x i32> %y, i32 1
%y2 = extractelement <4 x i32> %y, i32 2		%y2 = extractelement <4 x i32> %y, i32 2
%y3 = extractelement <4 x i32> %y, i32 3		%y3 = extractelement <4 x i32> %y, i32 3
%c0 = icmp slt i32 %x0, %y0		%c0 = icmp slt i32 %x0, %y0
%c1 = icmp slt i32 %x1, %y1		%c1 = icmp slt i32 %x1, %y1
%c2 = icmp slt i32 %x2, %y2		%c2 = icmp slt i32 %x2, %y2
%c3 = icmp slt i32 %x3, %y3		%c3 = icmp slt i32 %x3, %y3
%s1 = select i1 %c0, i1 true, i1 %c1		%s1 = select i1 %c0, i1 true, i1 %c1
%s2 = select i1 %s1, i1 true, i1 %c2		%s2 = select i1 %s1, i1 true, i1 %c2
%s3 = select i1 %s2, i1 true, i1 %c3		%s3 = select i1 %s2, i1 true, i1 %c3
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_and_fcmp(<4 x float> %x) {		define i1 @logical_and_fcmp(<4 x float> %x) {
; CHECK-LABEL: @logical_and_fcmp(		; CHECK-LABEL: @logical_and_fcmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = fcmp olt <4 x float> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x float> [[X]], i32 3		; CHECK-NEXT: ret i1 [[TMP3]]
; CHECK-NEXT: [[C0:%.*]] = fcmp olt float [[X0]], 0.000000e+00
; CHECK-NEXT: [[C1:%.*]] = fcmp olt float [[X1]], 0.000000e+00
; CHECK-NEXT: [[C2:%.*]] = fcmp olt float [[X2]], 0.000000e+00
; CHECK-NEXT: [[C3:%.*]] = fcmp olt float [[X3]], 0.000000e+00
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x float> %x, i32 0		%x0 = extractelement <4 x float> %x, i32 0
%x1 = extractelement <4 x float> %x, i32 1		%x1 = extractelement <4 x float> %x, i32 1
%x2 = extractelement <4 x float> %x, i32 2		%x2 = extractelement <4 x float> %x, i32 2
%x3 = extractelement <4 x float> %x, i32 3		%x3 = extractelement <4 x float> %x, i32 3
%c0 = fcmp olt float %x0, 0.0		%c0 = fcmp olt float %x0, 0.0
%c1 = fcmp olt float %x1, 0.0		%c1 = fcmp olt float %x1, 0.0
%c2 = fcmp olt float %x2, 0.0		%c2 = fcmp olt float %x2, 0.0
%c3 = fcmp olt float %x3, 0.0		%c3 = fcmp olt float %x3, 0.0
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_or_fcmp(<4 x float> %x) {		define i1 @logical_or_fcmp(<4 x float> %x) {
; CHECK-LABEL: @logical_or_fcmp(		; CHECK-LABEL: @logical_or_fcmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = fcmp olt <4 x float> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x float> [[X]], i32 3		; CHECK-NEXT: ret i1 [[TMP3]]
; CHECK-NEXT: [[C0:%.*]] = fcmp olt float [[X0]], 0.000000e+00
; CHECK-NEXT: [[C1:%.*]] = fcmp olt float [[X1]], 0.000000e+00
; CHECK-NEXT: [[C2:%.*]] = fcmp olt float [[X2]], 0.000000e+00
; CHECK-NEXT: [[C3:%.*]] = fcmp olt float [[X3]], 0.000000e+00
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 true, i1 [[C1]]
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 true, i1 [[C2]]
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 true, i1 [[C3]]
; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x float> %x, i32 0		%x0 = extractelement <4 x float> %x, i32 0
%x1 = extractelement <4 x float> %x, i32 1		%x1 = extractelement <4 x float> %x, i32 1
%x2 = extractelement <4 x float> %x, i32 2		%x2 = extractelement <4 x float> %x, i32 2
%x3 = extractelement <4 x float> %x, i32 3		%x3 = extractelement <4 x float> %x, i32 3
%c0 = fcmp olt float %x0, 0.0		%c0 = fcmp olt float %x0, 0.0
%c1 = fcmp olt float %x1, 0.0		%c1 = fcmp olt float %x1, 0.0
%c2 = fcmp olt float %x2, 0.0		%c2 = fcmp olt float %x2, 0.0
Show All 30 Lines	;
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_and_icmp_diff_const(<4 x i32> %x) {		define i1 @logical_and_icmp_diff_const(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_diff_const(		; CHECK-LABEL: @logical_and_icmp_diff_const(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[X:%.*]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: ret i1 [[TMP3]]
; CHECK-NEXT: [[C0:%.*]] = icmp sgt i32 [[X0]], 0
; CHECK-NEXT: [[C1:%.*]] = icmp sgt i32 [[X1]], 1
; CHECK-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 2
; CHECK-NEXT: [[C3:%.*]] = icmp sgt i32 [[X3]], 3
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp sgt i32 %x0, 0		%c0 = icmp sgt i32 %x0, 0
%c1 = icmp sgt i32 %x1, 1		%c1 = icmp sgt i32 %x1, 1
%c2 = icmp sgt i32 %x2, 2		%c2 = icmp sgt i32 %x2, 2
Show All 30 Lines	;
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 true, i1 %c2		%s2 = select i1 %s1, i1 true, i1 %c2
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; CHECK-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[TMP4]], 17
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[TMP3]], 17
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42		; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[TMP2]], 17
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[TMP1]], 17
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17		; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17		; CHECK-NEXT: [[TMP8:%.*]] = and i1 [[TMP7]], [[D0]]
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[TMP9:%.*]] = and i1 [[TMP8]], [[D1]]
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: [[TMP10:%.*]] = and i1 [[TMP9]], [[D2]]
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false		; CHECK-NEXT: [[S7:%.*]] = select i1 [[TMP10]], i1 [[D3]], i1 false
		efriedma: Alive2 apparently doesn't like this. See https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=047d8ce24c780675&test=Transforms%2FSLPVectorizer%2FX86%2Freduction-logical.ll
		spatel (author): Yes, @nlopes just noted this on the commit page/thread too ( https://reviews.llvm.org/rG25ee55c0baff ). I didn't notice that we had replaced select insts with logic, so we either need to prevent, restore, or freeze our way out of that.
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]		; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
Show All 15 Lines
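
Note on the Alive2 report for @logical_and_icmp_clamp: the following is a minimal sketch, not code from the patch, of why turning `select i1 a, i1 b, i1 false` into `and i1 a, b` is not poison-safe unless the second operand is frozen. It models an i1 as false, true, or poison; the names `V`, `logicalAnd`, `bitwiseAnd`, `refines`, `plainAndIsSafe`, and `frozenAndIsSafe` are hypothetical and exist only for this illustration. Freezing is one of the three options mentioned in the thread; the sketch does not claim it is the fix that was eventually applied.

```cpp
// Three-valued model of an LLVM i1: false, true, or poison (illustration only).
enum class V { F, T, P };

constexpr V kVals[] = {V::F, V::T, V::P};

// Poison-safe logical and: select i1 a, i1 b, i1 false.
// A poison condition gives poison; b only matters when a is true.
constexpr V logicalAnd(V a, V b) {
  if (a == V::P) return V::P;
  return a == V::T ? b : V::F;
}

// Bitwise and: and i1 a, b. Poison in either operand poisons the result.
constexpr V bitwiseAnd(V a, V b) {
  if (a == V::P || b == V::P) return V::P;
  return (a == V::T && b == V::T) ? V::T : V::F;
}

// The target refines the source if the source is poison (any result is
// allowed) or both produce the same defined value.
constexpr bool refines(V src, V tgt) { return src == V::P || src == tgt; }

// Is select -> plain `and` a valid rewrite for every input?
constexpr bool plainAndIsSafe() {
  for (V a : kVals)
    for (V b : kVals)
      if (!refines(logicalAnd(a, b), bitwiseAnd(a, b))) return false;
  return true;
}

// Is select -> `and a, freeze(b)` valid? freeze(poison) may be either
// false or true, so every possible choice has to refine the source.
constexpr bool frozenAndIsSafe() {
  for (V a : kVals)
    for (V b : kVals) {
      if (b == V::P) {
        if (!refines(logicalAnd(a, b), bitwiseAnd(a, V::F))) return false;
        if (!refines(logicalAnd(a, b), bitwiseAnd(a, V::T))) return false;
      } else if (!refines(logicalAnd(a, b), bitwiseAnd(a, b))) {
        return false;
      }
    }
  return true;
}

// Counterexample for the plain rewrite: a = false, b = poison.
static_assert(logicalAnd(V::F, V::P) == V::F, "select yields false");
static_assert(bitwiseAnd(V::F, V::P) == V::P, "and yields poison");
static_assert(!plainAndIsSafe(), "plain and is not a refinement");
static_assert(frozenAndIsSafe(), "and with a frozen rhs is a refinement");
```

In the right-hand CHECK lines of that test, the vector compare feeding the reduction is frozen, but the scalar compares `D0`..`D2` feeding the trailing `and i1` ops are not, which matches the unsafe case in this model.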

Revision Contents

Diff 358575

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
