This is an archive of the discontinued LLVM Phabricator instance.

Differential D114799

[SLP]Improve vectorization of cmp instructions sequences.
Closed · Public

Authored by ABataev on Nov 30 2021, 6:38 AM.

Details

Reviewers

RKSimon
dtemirbulatov
vporpo
anton-afanasyev

Commits

rGddce6e05612d: [SLP]Improve vectorization of cmp instructions sequences.

Summary

As a final step, after all other instructions have been processed, try to vectorize bundles of compatible cmp instructions.

Metric: SLP.NumVectorInstructions

Program results (baseline) results0 (with patch) diff

   test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    1.00    5.00  400.0%
                         test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    8.00   11.00   37.5%
               test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test   20.00   26.00   30.0%
           test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00   22.6%
          test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00   22.6%
                         test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test  102.00  124.00   21.6%
           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  118.00  133.00   12.7%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00    9.9%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00    9.9%
                   test-suite :: MultiSource/Benchmarks/Olden/power/power.test   64.00   70.00    9.4%
      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00    9.2%
      test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test   50.00   54.00    8.0%
                   test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   27.00   29.00    7.4%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00    7.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  694.00  738.00    6.3%
                   test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  361.00  382.00    5.8%
                 test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  409.00  430.00    5.1%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  140.00  147.00    5.0%
 test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  140.00  147.00    5.0%
        test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00    4.8%
                  test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  966.00 1011.00    4.7%
                      test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   65.00   68.00    4.6%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00    3.8%
               test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00    3.2%
 test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   62.00   64.00    3.2%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   62.00   64.00    3.2%
            test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  852.00  877.00    2.9%
             test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  852.00  877.00    2.9%
                  test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00    2.7%
                    test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test   39.00   40.00    2.6%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test  613.00  624.00    1.8%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  378.00  383.00    1.3%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  293.00  295.00    0.7%
       test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  297.00  299.00    0.7%
 test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00    0.2%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00    0.2%
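The diff column is consistent with the relative change from results to results0, i.e. (results0 - results) / results × 100, with results as the baseline. Below is a minimal C++ sketch of that arithmetic, checked against a few rows of the table. `percentDiff` and `round1` are hypothetical helpers, not part of the test-suite's comparison script.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent from a baseline value to a new value, which is
// how the "diff" column appears to be computed. Hypothetical helper.
double percentDiff(double baseline, double patched) {
  return (patched - baseline) / baseline * 100.0;
}

// Rounds to one decimal place, matching the precision shown in the table.
double round1(double x) { return std::round(x * 10.0) / 10.0; }
```

For example, encode.test goes from 1.00 to 5.00 vector instructions, a relative change of (5 - 1) / 1 × 100 = 400%. With the run without the patch as the baseline, the same formula also reproduces the -6.094% running-time difference reported for 508.namd_r.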

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Nov 30 2021, 6:38 AM

Herald added subscribers: dmgreen, hiraditya. Nov 30 2021, 6:38 AM

ABataev requested review of this revision. Nov 30 2021, 6:38 AM

Herald added a project: Restricted Project. Nov 30 2021, 6:38 AM

Harbormaster completed remote builds in B136687: Diff 390692. Nov 30 2021, 7:14 AM

RKSimon requested changes to this revision. Nov 30 2021, 8:32 AM

RKSimon added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9219–9220	Please can you do the NFC move of this down to its new location and then rebase, so the diffs are more evident?

This revision now requires changes to proceed. Nov 30 2021, 8:32 AM

ABataev added inline comments. Nov 30 2021, 8:33 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9219–9220	Ok

Rebase

Harbormaster completed remote builds in B136725: Diff 390750. Nov 30 2021, 10:37 AM

RKSimon added inline comments. Nov 30 2021, 10:42 AM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
111	Any idea what happened here?

ABataev added inline comments. Nov 30 2021, 11:00 AM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
111	The pair of `icmp slt` gets vectorized because the cost model decided that it is profitable. We can't handle these `select`s as a reduction because the `icmp` instructions have different predicates.

RKSimon added inline comments. Nov 30 2021, 2:46 PM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
111	But the sgt didn't get swapped, and from the TODO at line 9298 I take it we can't do an altopcode/predicate to handle the ult?

ABataev added inline comments. Nov 30 2021, 3:03 PM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
111	Sorry, I don't quite understand your question. Why is sgt not vectorized with ult? They are not compatible. Why are sgt, ult and the 2 slts not vectorized together? There are 3 different predicates, and they are not profitable to vectorize (it ends up with 2 gather nodes). The TODO is for extra analysis of the cmp operands, to select only cmps with the same/alternate operands only (!) or constants only (!).

ABataev added inline comments. Nov 30 2021, 3:06 PM

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
111	P.S. Not compatible: they also end up with 2 gather nodes.
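The predicate rule behind this exchange can be modeled on its own. Below is a minimal sketch that assumes a hypothetical `Pred` enum standing in for the integer subset of LLVM's `CmpInst::Predicate`. It mirrors only the predicate part of `AreCompatibleCompares` from this patch (the predicates are equal, or one is the other's swapped form) and ignores the operand-type and deleted-instruction checks.

```cpp
#include <cassert>

// Hypothetical stand-in for the integer subset of CmpInst::Predicate.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// Predicate obtained by swapping the compare operands: (a > b) == (b < a).
Pred swapped(Pred P) {
  switch (P) {
  case Pred::EQ:  return Pred::EQ;
  case Pred::NE:  return Pred::NE;
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  }
  return P; // not reached for valid enumerators
}

// Compatible if the predicates match directly or after swapping operands.
bool areCompatible(Pred A, Pred B) { return A == B || A == swapped(B); }
```

Under this rule, `sgt` can pair with `slt` (its swapped form) but never with `ult`, because swapping operands does not turn a signed comparison into an unsigned one.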

Thanks for the clarification, LGTM.

This revision is now accepted and ready to land. Dec 1 2021, 2:41 AM

This revision was landed with ongoing or failed builds. Dec 1 2021, 7:27 AM

Closed by commit rGddce6e05612d: [SLP]Improve vectorization of cmp instructions sequences. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGddce6e05612d: [SLP]Improve vectorization of cmp instructions sequences..

I think this change gives a 6% performance win on SPEC CPU 2017 508.namd_r. I ran it with --size train; the numbers below are running times.

With this change	Without this change	Diff
52.439	55.842	-6.094%

Hi, we observed a ~9% increase in SLP.NumVectorInstructions on SPEC's 508.namd_r with this change, using llvm-test-suite. I noticed that this number is not reported here. Did you also see the same result? We tested on an Intel Skylake.

In D114799#3226154, @zhuhan0 wrote:

Hi, we observed a ~9% increase in SLP.NumVectorInstructions on SPEC's 508.namd_r with this change, using llvm-test-suite. I noticed that this number is not reported here. Did you also see the same result? We tested on an Intel Skylake.

I tried with -march=native on Skylake, but did not see such results.

In D114799#3226181, @ABataev wrote:

I tried with -march=native on Skylake, but did not see such results.

I see. -march=native is the difference. With that flag, I can reproduce no change on namd. However, the number of vector instructions drops significantly compared with building without that flag, which I found counter-intuitive.

With -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	4446	4446

Without -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	5939	5473

In D114799#3227965, @zhuhan0 wrote:

I see. -march=native is the difference. With that flag, I can reproduce no change on namd. However, the number of vector instructions drops significantly compared with building without that flag, which I found counter-intuitive.

With -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	4446	4446

Without -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	5939	5473

Sorry, the number of vector instructions decreased or increased?

What I meant is that the number of vector instructions decreased when adding -march=native, compared with not adding it, both with and without this patch. As for this patch, the number increased significantly in our test, which didn't use -march=native. See the table above for the numbers.

In D114799#3227965, @zhuhan0 wrote:

I see. -march=native is the difference. With that flag, I can reproduce no change on namd. However, the number of vector instructions drops significantly compared with building without that flag, which I found counter-intuitive.

With -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	4446	4446

Without -march=native

	With this patch	Without this patch
SLP.NumVectorInstructions	5939	5473

Ah, OK, got it. Actually, this is normal. Without -march=native you compile for the generic CPU, while -march=native makes the compiler target Skylake with AVX support. That can lead to different vectorization (differences in the cost model, different vector register sizes, etc.).
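One concrete place where the register size enters is the `Limit` lambda in this patch, which caps a compare bundle at `max(2, MaxVecRegSize / EltSize)` lanes. Below is a small sketch of that arithmetic. The register widths used in the checks are illustrative assumptions: 128 bits for a generic x86-64 target and 256 bits for an AVX2-capable Skylake. The real value comes from `R.getMaxVecRegSize()`, and the target's cost model differs as well.

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the arithmetic of the patch's Limit lambda: the maximum number of
// elements packed into one bundle, given the vector register width and the
// element width (both in bits). Never returns fewer than 2 lanes.
unsigned bundleLimit(unsigned MaxVecRegSizeBits, unsigned EltSizeBits) {
  return std::max(2U, MaxVecRegSizeBits / EltSizeBits);
}
```

With 32-bit elements, the limit doubles from 4 to 8 lanes when the assumed register width goes from 128 to 256 bits. So the same list of compares can be split into different bundles, and costed differently, for the two targets.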

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	57 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll	33 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll	25 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll	93 lines

Diff 391019

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

```diff
@@ 9,210 lines not shown @@ if (!findBuildAggregate(IEI, TTI, BuildVectorOpds, BuildVectorInsts) ||
            BuildVectorOpds,
            [](Value *V) { return isa<ExtractElementInst, UndefValue>(V); }) &&
        isFixedVectorShuffle(BuildVectorOpds, Mask)))
     return false;

   LLVM_DEBUG(dbgs() << "SLP: array mappable to vector: " << *IEI << "\n");
   return tryToVectorizeList(BuildVectorInsts, R);
 }

 template <typename T>
 static bool
 tryToVectorizeSequence(SmallVectorImpl<T *> &Incoming,
                        function_ref<unsigned(T *)> Limit,
                        function_ref<bool(T *, T *)> Comparator,
                        function_ref<bool(T *, T *)> AreCompatible,
                        function_ref<bool(ArrayRef<T *>, bool)> TryToVectorize,
                        bool LimitForRegisterSize) {
   bool Changed = false;
@@ 84 lines not shown @@ for (Instruction *I : PostponedCmps) {
         OpsChanged |= vectorizeRootInstruction(nullptr, Op, BB, R, TTI);
     }
     // Try to vectorize operands as vector bundles.
     for (Instruction *I : PostponedCmps) {
       if (R.isDeleted(I))
         continue;
       OpsChanged |= tryToVectorize(I, R);
     }
+    // Try to vectorize list of compares.
+    // Sort by type, compare predicate, etc.
+    // TODO: Add analysis on the operand opcodes (profitable to vectorize
+    // instructions with same/alternate opcodes/const values).
+    auto &&CompareSorter = [&R](Value *V, Value *V2) {
+      auto *CI1 = cast<CmpInst>(V);
+      auto *CI2 = cast<CmpInst>(V2);
+      if (R.isDeleted(CI2) || !isValidElementType(CI2->getType()))
+        return false;
+      if (CI1->getOperand(0)->getType()->getTypeID() <
+          CI2->getOperand(0)->getType()->getTypeID())
+        return true;
+      if (CI1->getOperand(0)->getType()->getTypeID() >
+          CI2->getOperand(0)->getType()->getTypeID())
+        return false;
+      return CI1->getPredicate() < CI2->getPredicate() ||
+             (CI1->getPredicate() > CI2->getPredicate() &&
+              CI1->getPredicate() <
+                  CmpInst::getSwappedPredicate(CI2->getPredicate()));
+    };
+
+    auto &&AreCompatibleCompares = [&R](Value *V1, Value *V2) {
+      if (V1 == V2)
+        return true;
+      auto *CI1 = cast<CmpInst>(V1);
+      auto *CI2 = cast<CmpInst>(V2);
+      if (R.isDeleted(CI2) || !isValidElementType(CI2->getType()))
+        return false;
+      if (CI1->getOperand(0)->getType() != CI2->getOperand(0)->getType())
+        return false;
+      return CI1->getPredicate() == CI2->getPredicate() ||
+             CI1->getPredicate() ==
+                 CmpInst::getSwappedPredicate(CI2->getPredicate());
+    };
+    auto Limit = [&R](Value *V) {
+      unsigned EltSize = R.getVectorElementSize(V);
+      return std::max(2U, R.getMaxVecRegSize() / EltSize);
+    };
+
+    SmallVector<Value *> Vals(PostponedCmps.begin(), PostponedCmps.end());
+    OpsChanged |= tryToVectorizeSequence<Value>(
+        Vals, Limit, CompareSorter, AreCompatibleCompares,
+        [this, &R](ArrayRef<Value *> Candidates, bool LimitForRegisterSize) {
+          // Exclude possible reductions from other blocks.
+          bool ArePossiblyReducedInOtherBlock =
+              any_of(Candidates, [](Value *V) {
+                return any_of(V->users(), [V](User *U) {
+                  return isa<SelectInst>(U) &&
+                         cast<SelectInst>(U)->getParent() !=
+                             cast<Instruction>(V)->getParent();
+                });
+              });
+          if (ArePossiblyReducedInOtherBlock)
+            return false;
+          return tryToVectorizeList(Candidates, R, LimitForRegisterSize);
+        },
+        /*LimitForRegisterSize=*/true);
     Instructions.clear();
   } else {
     // Insert in reverse order since the PostponedCmps vector was filled in
     // reverse order.
     Instructions.assign(PostponedCmps.rbegin(), PostponedCmps.rend());
   }
   return OpsChanged;
 }
@@ 432 lines not shown @@
```
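The new code collects the postponed compares, sorts them so that compatible ones become neighbors, and passes runs of compatible compares to `tryToVectorizeList`. Below is a simplified, self-contained model of that sort-then-group step. It assumes hypothetical `Pred`/`Cmp` types in place of `CmpInst`. It also uses a different ordering than the patch's `CompareSorter`: it sorts on a canonical key (the smaller of a predicate and its swapped form), which is one simple way to make equal and swapped predicates adjacent. Most of the body of `tryToVectorizeSequence` is hidden in this diff, so the grouping loop is an assumption about its general shape, not a copy of it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the integer CmpInst predicates and a compare.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

struct Cmp {
  int TypeId; // stand-in for getOperand(0)->getType()->getTypeID()
  Pred P;
};

Pred swapped(Pred P) {
  switch (P) {
  case Pred::EQ:  return Pred::EQ;
  case Pred::NE:  return Pred::NE;
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  }
  return P; // not reached for valid enumerators
}

// A predicate and its swapped form map to the same key.
Pred canonical(Pred P) { return std::min(P, swapped(P)); }

// Sort so compatible compares become adjacent, then split the sorted list
// into runs of compatible compares; runs of 2+ are candidate bundles.
std::vector<std::vector<Cmp>> candidateBundles(std::vector<Cmp> Cmps) {
  std::stable_sort(Cmps.begin(), Cmps.end(), [](const Cmp &A, const Cmp &B) {
    if (A.TypeId != B.TypeId)
      return A.TypeId < B.TypeId;
    return canonical(A.P) < canonical(B.P);
  });
  std::vector<std::vector<Cmp>> Bundles;
  std::size_t Start = 0;
  while (Start < Cmps.size()) {
    std::size_t End = Start + 1;
    while (End < Cmps.size() && Cmps[End].TypeId == Cmps[Start].TypeId &&
           canonical(Cmps[End].P) == canonical(Cmps[Start].P))
      ++End;
    if (End - Start >= 2)
      Bundles.emplace_back(Cmps.begin() + Start, Cmps.begin() + End);
    Start = End;
  }
  return Bundles;
}
```

Unlike the patch, this model does not cap a run at `Limit` lanes. It also does not skip compares whose users are `select`s in another block (the `ArePossiblyReducedInOtherBlock` check).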

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

```diff
@@ 414 lines not shown @@
 return:
   %retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
   ret float %retval.0
 }

 define float @test_merge_anyof_v4si(<4 x i32> %t) {
 ; CHECK-LABEL: @test_merge_anyof_v4si(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 3
-; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[T]], i32 2
-; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[T]], i32 1
-; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[T]], i32 0
-; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T]]
-; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4
-; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0
-; CHECK-NEXT: [[CMP11:%.*]] = icmp sgt i32 [[TMP3]], 255
-; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP11]]
-; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i32 [[TMP2]], 255
-; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP14]]
-; CHECK-NEXT: [[CMP17:%.*]] = icmp sgt i32 [[TMP1]], 255
-; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP17]]
-; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[TMP0]], 255
-; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP20]]
-; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
+; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i4 [[TMP1]], 0
+; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[T_FR]], <i32 255, i32 255, i32 255, i32 255>
+; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP3]], i32 0
+; CHECK-NEXT: [[OR_COND3:%.*]] = or i1 [[TMP2]], [[TMP4]]
+; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP3]], i32 1
+; CHECK-NEXT: [[OR_COND4:%.*]] = or i1 [[OR_COND3]], [[TMP5]]
+; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i1> [[TMP3]], i32 2
+; CHECK-NEXT: [[OR_COND5:%.*]] = or i1 [[OR_COND4]], [[TMP6]]
+; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP3]], i32 3
+; CHECK-NEXT: [[OR_COND6:%.*]] = or i1 [[OR_COND5]], [[TMP7]]
+; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T_FR]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
+; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T_FR]]
+; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP8]], i32 0
 ; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
 ; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[CONV]]
 ; CHECK-NEXT: ret float [[RETVAL_0]]
 ;
 entry:
   %vecext = extractelement <4 x i32> %t, i32 0
   %cmp = icmp slt i32 %vecext, 1
   br i1 %cmp, label %if.then, label %lor.lhs.false
@@ 206 lines not shown @@
```
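Read back into scalar terms, the new CHECK lines say that `@test_merge_anyof_v4si` returns 0.0 if any lane of `%t` is below 1 or above 255, and otherwise converts `t[0] + t[1]` to float. Below is a C++ illustration (not the source this test was generated from) that computes the result two ways: in that plain scalar form, and in the shape of the new output, where the four `sgt 255` compares form one lane-wise bundle whose lanes are OR-ed in one at a time.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reading of @test_merge_anyof_v4si as implied by its CHECK lines.
float scalarVersion(const std::array<int32_t, 4> &T) {
  bool AnyOut = false;
  for (int32_t V : T)
    AnyOut = AnyOut || V < 1 || V > 255;
  return AnyOut ? 0.0f : static_cast<float>(T[0] + T[1]);
}

// Same computation shaped like the new output: one lane-wise "slt 1" bundle
// reduced with an any-of, plus one lane-wise "sgt 255" bundle whose lanes
// are extracted and OR-ed in sequentially.
float bundledVersion(const std::array<int32_t, 4> &T) {
  std::array<bool, 4> Lt{}, Gt{};
  for (int I = 0; I < 4; ++I) {
    Lt[I] = T[I] < 1;   // icmp slt <4 x i32> %t, splat(1)
    Gt[I] = T[I] > 255; // icmp sgt <4 x i32> %t, splat(255)
  }
  bool OrCond = Lt[0] || Lt[1] || Lt[2] || Lt[3]; // bitcast to i4, icmp ne 0
  for (int I = 0; I < 4; ++I)
    OrCond = OrCond || Gt[I]; // extractelement + or i1
  return OrCond ? 0.0f : static_cast<float>(T[0] + T[1]);
}
```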

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

```diff
@@ 273 lines not shown @@

 ; PR43745 - https://bugs.llvm.org/show_bug.cgi?id=43745

 ; FIXME: this should be vectorized
 define i1 @cmp_lt_gt(double %a, double %b, double %c) {
 ; CHECK-LABEL: @cmp_lt_gt(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
-; CHECK-NEXT: [[ADD:%.*]] = fsub double [[C:%.*]], [[B]]
 ; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
-; CHECK-NEXT: [[DIV:%.*]] = fdiv double [[ADD]], [[MUL]]
-; CHECK-NEXT: [[SUB:%.*]] = fsub double [[FNEG]], [[C]]
-; CHECK-NEXT: [[DIV3:%.*]] = fdiv double [[SUB]], [[MUL]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[DIV]], 0x3EB0C6F7A0B5ED8D
-; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[DIV3]], 0x3EB0C6F7A0B5ED8D
+; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
+; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
+; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
+; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
+; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
+; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
+; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
+; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[TMP9]], 0x3EB0C6F7A0B5ED8D
 ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP4]], i1 false
 ; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]
 ; CHECK: lor.lhs.false:
-; CHECK-NEXT: [[CMP5:%.*]] = fcmp ule double [[DIV]], 1.000000e+00
-; CHECK-NEXT: [[CMP7:%.*]] = fcmp ule double [[DIV3]], 1.000000e+00
-; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[CMP5]], i1 true, i1 [[CMP7]]
+; CHECK-NEXT: [[TMP10:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>
+; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP10]], i32 0
+; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP10]], i32 1
+; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[TMP12]], i1 true, i1 [[TMP11]]
 ; CHECK-NEXT: br label [[CLEANUP]]
 ; CHECK: cleanup:
 ; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[OR_COND1]], [[LOR_LHS_FALSE]] ]
 ; CHECK-NEXT: ret i1 [[RETVAL_0]]
 ;
 entry:
   %fneg = fneg double %b
   %add = fadd double %fneg, %c
@@ 31 lines not shown @@
```
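In the new CHECK lines for `@cmp_lt_gt`, the two quotients `(c - b) / (2a)` and `(-b - c) / (2a)` are computed with one 2-wide `fsub` and one 2-wide `fdiv`. Lane 0 computes `fneg b - c`, lane 1 computes `c - b`, and the divisor `2a` is splatted to both lanes. Below is a C++ sketch, assuming plain `double` arithmetic, that computes both the scalar and the packed forms so they can be compared lane by lane.

```cpp
#include <array>
#include <cassert>

// Scalar form from the old CHECK lines: {DIV, DIV3}.
std::array<double, 2> scalarQuotients(double A, double B, double C) {
  double Mul = A * 2.0;
  double Div = (C - B) / Mul;   // DIV  = (c - b) / (2a)
  double Div3 = (-B - C) / Mul; // DIV3 = (fneg b - c) / (2a)
  return {Div, Div3};
}

// Packed form mirroring the new CHECK lines: one lane-wise subtract of
// <fneg b, c> - <c, b>, then one lane-wise divide by splat(2a).
std::array<double, 2> packedQuotients(double A, double B, double C) {
  std::array<double, 2> Lhs = {-B, C}; // TMP1
  std::array<double, 2> Rhs = {C, B};  // TMP3
  double Mul = A * 2.0;
  std::array<double, 2> Q{};
  for (int I = 0; I < 2; ++I)
    Q[I] = (Lhs[I] - Rhs[I]) / Mul; // TMP4 (fsub), then TMP7 (fdiv)
  return Q;
}
```

Lane 1 of the packed result corresponds to the old `DIV` and lane 0 to the old `DIV3`, which matches the `extractelement` indices used for `CMP` and `CMP4`.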

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

```diff
@@ 126 lines not shown @@
 ; MINTREESIZE-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0
 ; MINTREESIZE-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1
 ; MINTREESIZE-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2
 ; MINTREESIZE-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3
 ; MINTREESIZE-NEXT: [[CMP0:%.*]] = icmp ne i32 [[C0]], 0
 ; MINTREESIZE-NEXT: [[CMP1:%.*]] = icmp ne i32 [[C1]], 0
 ; MINTREESIZE-NEXT: [[CMP2:%.*]] = icmp ne i32 [[C2]], 0
 ; MINTREESIZE-NEXT: [[CMP3:%.*]] = icmp ne i32 [[C3]], 0
+; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <4 x i1> poison, i1 [[CMP3]], i32 0
+; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <4 x i1> [[TMP1]], i1 [[CMP2]], i32 1
+; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> [[TMP2]], i1 [[CMP1]], i32 2
+; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[CMP0]], i32 3
 ; MINTREESIZE-NEXT: [[S0:%.*]] = select i1 [[CMP0]], float [[A0]], float [[B0]]
 ; MINTREESIZE-NEXT: [[S1:%.*]] = select i1 [[CMP1]], float [[A1]], float [[B1]]
 ; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
 ; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
 ; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[S0]], i32 0
 ; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
 ; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
 ; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
 ; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
 ; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
-; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0
-; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Q1]], i32 1
+; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0
+; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q1]], i32 1
 ; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
 ; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
-; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0
-; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q3]], i32 1
+; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0
+; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
 ; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
 ; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
-; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
-; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1
+; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
+; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1
 ; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
-; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
-; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1
+; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
+; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[Q5]], i32 1
 ; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
 ; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
 ; MINTREESIZE-NEXT: ret <4 x float> undef
 ;
   %c0 = extractelement <4 x i32> %c, i32 0
   %c1 = extractelement <4 x i32> %c, i32 1
   %c2 = extractelement <4 x i32> %c, i32 2
   %c3 = extractelement <4 x i32> %c, i32 3
@@ 469 lines not shown @@
```

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

```diff
@@ 161 lines not shown @@
 ; MINTREESIZE-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0
 ; MINTREESIZE-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1
 ; MINTREESIZE-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2
 ; MINTREESIZE-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3
 ; MINTREESIZE-NEXT: [[CMP0:%.*]] = icmp ne i32 [[C0]], 0
 ; MINTREESIZE-NEXT: [[CMP1:%.*]] = icmp ne i32 [[C1]], 0
 ; MINTREESIZE-NEXT: [[CMP2:%.*]] = icmp ne i32 [[C2]], 0
 ; MINTREESIZE-NEXT: [[CMP3:%.*]] = icmp ne i32 [[C3]], 0
+; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <4 x i1> poison, i1 [[CMP3]], i32 0
+; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <4 x i1> [[TMP1]], i1 [[CMP2]], i32 1
+; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> [[TMP2]], i1 [[CMP1]], i32 2
+; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[CMP0]], i32 3
 ; MINTREESIZE-NEXT: [[S0:%.*]] = select i1 [[CMP0]], float [[A0]], float [[B0]]
 ; MINTREESIZE-NEXT: [[S1:%.*]] = select i1 [[CMP1]], float [[A1]], float [[B1]]
 ; MINTREESIZE-NEXT: [[S2:%.*]] = select i1 [[CMP2]], float [[A2]], float [[B2]]
 ; MINTREESIZE-NEXT: [[S3:%.*]] = select i1 [[CMP3]], float [[A3]], float [[B3]]
 ; MINTREESIZE-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0
 ; MINTREESIZE-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1
 ; MINTREESIZE-NEXT: [[RC:%.*]] = insertelement <4 x float> [[RB]], float [[S2]], i32 2
 ; MINTREESIZE-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[S3]], i32 3
 ; MINTREESIZE-NEXT: [[Q0:%.*]] = extractelement <4 x float> [[RD]], i32 0
 ; MINTREESIZE-NEXT: [[Q1:%.*]] = extractelement <4 x float> [[RD]], i32 1
-; MINTREESIZE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0
+; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q0]], i32 0
```
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Q1]], i32 1			; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q1]], i32 1
	; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2			; MINTREESIZE-NEXT: [[Q2:%.*]] = extractelement <4 x float> [[RD]], i32 2
	; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3			; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0			; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Q3]], i32 1			; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
	; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]			; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
	; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]			; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0			; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[Q5]], i32 1			; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]			; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0			; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q5]], i32 1			; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]			; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
	; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])			; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
	; MINTREESIZE-NEXT: ret <4 x float> undef			; MINTREESIZE-NEXT: ret <4 x float> undef
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -mattr=avx512vl -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -mattr=avx512vl -S | FileCheck %s --check-prefixes=CHECK,AVX

declare void @use1(i1)		declare void @use1(i1)

define i1 @logical_and_icmp(<4 x i32> %x) {		define i1 @logical_and_icmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp(		; CHECK-LABEL: @logical_and_icmp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
		;
%c3 = fcmp olt float %x3, 0.0		%c3 = fcmp olt float %x3, 0.0
%s1 = select i1 %c0, i1 true, i1 %c1		%s1 = select i1 %c0, i1 true, i1 %c1
%s2 = select i1 %s1, i1 true, i1 %c2		%s2 = select i1 %s1, i1 true, i1 %c2
%s3 = select i1 %s2, i1 true, i1 %c3		%s3 = select i1 %s2, i1 true, i1 %c3
ret i1 %s3		ret i1 %s3
}		}
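In the `@logical_and_icmp` checks above, the vectorized compares go through `freeze` before `llvm.vector.reduce.and`. The freeze guards against poison: a logical and written as `select i1 %c, i1 %x, i1 false` (or a logical or written as `select i1 %c, i1 true, i1 %x`) never looks at `%x` once `%c` decides the result, while a vector reduction is poisoned by any poison lane. The sketch below is a hypothetical toy model, not LLVM code: `std::nullopt` stands for poison, and `selectI1`, `reduceAnd`, `reduceOr` and `freezeLanes` are illustrative helpers named after the IR operations.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of an LLVM i1 value: std::nullopt stands for poison.
using I1 = std::optional<bool>;

// select i1 %c, i1 %t, i1 %f: a poison condition gives poison; otherwise
// only the chosen operand matters, so poison in the other one is ignored.
I1 selectI1(I1 c, I1 t, I1 f) {
  if (!c) return std::nullopt;
  return *c ? t : f;
}

// Model of llvm.vector.reduce.and: any poison lane poisons the result.
I1 reduceAnd(const std::vector<I1>& lanes) {
  bool acc = true;
  for (const I1& l : lanes) {
    if (!l) return std::nullopt;
    acc = acc && *l;
  }
  return acc;
}

// Model of llvm.vector.reduce.or, with the same poison rule.
I1 reduceOr(const std::vector<I1>& lanes) {
  bool acc = false;
  for (const I1& l : lanes) {
    if (!l) return std::nullopt;
    acc = acc || *l;
  }
  return acc;
}

// Model of freeze: poison becomes some fixed value. LLVM leaves that value
// arbitrary; here the caller supplies it so results can be checked for
// independence from the choice.
std::vector<I1> freezeLanes(const std::vector<I1>& lanes, bool pick) {
  std::vector<I1> out;
  for (const I1& l : lanes) out.push_back(l ? l : I1(pick));
  return out;
}
```

For example, with `c0 = false` and `c1` poison, the scalar logical and `selectI1(c0, c1, false)` is `false`, the unfrozen `reduceAnd({c0, c1})` is poison, and `reduceAnd(freezeLanes({c0, c1}, pick))` is `false` for either `pick`. The logical-or case behaves symmetrically with `c0 = true`.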

define i1 @logical_and_icmp_diff_preds(<4 x i32> %x) {		define i1 @logical_and_icmp_diff_preds(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_diff_preds(		; SSE-LABEL: @logical_and_icmp_diff_preds(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; SSE-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; SSE-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; SSE-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; SSE-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; CHECK-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0		; SSE-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 0		; SSE-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0
; CHECK-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0		; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[X3]], i32 0
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 0		; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[X1]], i32 1
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; SSE-NEXT: [[TMP3:%.*]] = icmp slt <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false		; SSE-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[TMP4]], i1 false
; CHECK-NEXT: ret i1 [[S3]]		; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
		; SSE-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[TMP5]], i1 false
		; SSE-NEXT: ret i1 [[S3]]
		RKSimon: Any idea what happened here?
		ABataev: The pair of `icmp slt` gets vectorized because the cost model decided that it is profitable. We can't handle these `select`s as a reduction because the `icmp` instructions have different predicates.
		RKSimon: But the sgt didn't get swapped, and from the TODO at line 9298 I take it we can't do an altopcode/predicate to handle the ult?
		ABataev: Sorry, I don't quite understand your question. Why is sgt not vectorized with ult? They are not compatible. Why are sgt, ult and the 2 slts not vectorized together? There are 3 different predicates, and vectorizing them is not profitable (it ends up with 2 gather nodes). The TODO is for extra analysis of the cmp operands, to select only cmps with the same/alternate operands (!) or only constants (!).
		ABataev: PS. Not compatible: they also end up with 2 gather nodes.
		;
		; AVX-LABEL: @logical_and_icmp_diff_preds(
		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
		; AVX-NEXT: [[C0:%.*]] = icmp ult i32 [[X0]], 0
		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 0
		; AVX-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0
		; AVX-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 0
		; AVX-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
		; AVX-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; AVX-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
		; AVX-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp ult i32 %x0, 0		%c0 = icmp ult i32 %x0, 0
%c1 = icmp slt i32 %x1, 0		%c1 = icmp slt i32 %x1, 0
%c2 = icmp sgt i32 %x2, 0		%c2 = icmp sgt i32 %x2, 0
		;
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
ret i1 %s3		ret i1 %s3
}		}
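In the review exchange above, `sgt` could in principle be commuted to match the `slt` pair (`x > 0` is `0 < x`), but then the constant moves to the other operand position. The sketch below is a simplified, hypothetical model of that compatibility question, not the SLPVectorizer implementation: each operand is reduced to a `Var`/`Const` shape, and two compares count as compatible only if they match directly or after commuting one of them, so that the per-lane operands line up instead of forming the gather nodes mentioned in the thread. `Pred`, `Opnd`, `Cmp`, `swapped` and `compatible` are illustrative names.

```cpp
#include <cassert>

// Hypothetical model: integer compare predicates and operand "shapes".
enum class Pred { EQ, NE, SGT, SGE, SLT, SLE, UGT, UGE, ULT, ULE };
enum class Opnd { Var, Const };

struct Cmp {
  Pred pred;
  Opnd lhs, rhs;
};

// Predicate obtained by commuting the operands: a > b  <=>  b < a.
Pred swapped(Pred p) {
  switch (p) {
    case Pred::SGT: return Pred::SLT;
    case Pred::SLT: return Pred::SGT;
    case Pred::SGE: return Pred::SLE;
    case Pred::SLE: return Pred::SGE;
    case Pred::UGT: return Pred::ULT;
    case Pred::ULT: return Pred::UGT;
    case Pred::UGE: return Pred::ULE;
    case Pred::ULE: return Pred::UGE;
    default: return p;  // EQ and NE are symmetric.
  }
}

// Compatible if the compares already match, or if commuting `a` makes both
// the predicate and the operand shapes match.
bool compatible(const Cmp& a, const Cmp& b) {
  if (a.pred == b.pred && a.lhs == b.lhs && a.rhs == b.rhs) return true;
  return swapped(a.pred) == b.pred && a.lhs == b.rhs && a.rhs == b.lhs;
}
```

Applied to `@logical_and_icmp_diff_preds`, only `c1` and `c3` (both `slt %x, 0`) are compatible, which agrees with the SSE output above vectorizing just that pair. `c2` (`sgt %x2, 0`) matches `slt` only in the commuted form `slt 0, %x2`, whose operand shape differs, and `ult` matches neither.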

define i1 @mixed_logical_icmp(<4 x i32> %x) {		define i1 @mixed_logical_icmp(<4 x i32> %x) {
; CHECK-LABEL: @mixed_logical_icmp(		; CHECK-LABEL: @mixed_logical_icmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 0
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i1> [[TMP1]], i32 1
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[S1:%.*]] = select i1 [[TMP2]], i1 [[TMP3]], i1 false
; CHECK-NEXT: [[C0:%.*]] = icmp sgt i32 [[X0]], 0		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2
; CHECK-NEXT: [[C1:%.*]] = icmp sgt i32 [[X1]], 0		; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 true, i1 [[TMP4]]
; CHECK-NEXT: [[C2:%.*]] = icmp sgt i32 [[X2]], 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP1]], i32 3
; CHECK-NEXT: [[C3:%.*]] = icmp sgt i32 [[X3]], 0		; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[TMP5]], i1 false
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 true, i1 [[C2]]
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
; CHECK-NEXT: ret i1 [[S3]]		; CHECK-NEXT: ret i1 [[S3]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp sgt i32 %x0, 0		%c0 = icmp sgt i32 %x0, 0
%c1 = icmp sgt i32 %x1, 0		%c1 = icmp sgt i32 %x1, 0
}		}
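The patch summary describes a final attempt to vectorize bundles of compatible cmp instructions after all other instructions are processed; in `@mixed_logical_icmp`, the four `icmp sgt` now become one `<4 x i32>` compare even though the mixed and/or `select` chain is not treated as a reduction. A minimal sketch of such a bundling step follows, under the assumption that compatibility is just equality of a predicate key (the real pass also has to consider operands, types and profitability): sort the leftover compares, split them into runs, and keep runs of two or more. `ScalarCmp` and `formBundles` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scalar compare still left after the other
// seeds were processed: a predicate key and its position in the block.
struct ScalarCmp {
  int predKey;
  std::size_t pos;
};

// Group compares with equal keys; each group of at least two is a candidate
// bundle for one vector compare (the cost model would still have the final
// say). The stable sort keeps every bundle in program order.
std::vector<std::vector<std::size_t>> formBundles(std::vector<ScalarCmp> cmps) {
  std::stable_sort(cmps.begin(), cmps.end(),
                   [](const ScalarCmp& a, const ScalarCmp& b) {
                     return a.predKey < b.predKey;
                   });
  std::vector<std::vector<std::size_t>> bundles;
  for (std::size_t i = 0; i < cmps.size();) {
    std::size_t j = i;
    while (j < cmps.size() && cmps[j].predKey == cmps[i].predKey) ++j;
    if (j - i >= 2) {
      std::vector<std::size_t> bundle;
      for (std::size_t k = i; k < j; ++k) bundle.push_back(cmps[k].pos);
      bundles.push_back(bundle);
    }
    i = j;  // Continue with the next run of equal keys.
  }
  return bundles;
}
```

With illustrative keys `ULT = 0`, `SLT = 1` and `SGT = 2`, the four compares of `@mixed_logical_icmp` form one bundle `{0, 1, 2, 3}`, while those of `@logical_and_icmp_diff_preds` yield only `{1, 3}`, the `slt` pair.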

; TODO: This is better than all-scalar and still safe,		; TODO: This is better than all-scalar and still safe,
; but we want this to be 2 reductions with glue		; but we want this to be 2 reductions with glue
; logic...or a wide reduction?		; logic...or a wide reduction?

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; CHECK-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP2]], i32 0
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[TMP4]], 17		; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP4]], i1 [[TMP5]], i1 false
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[TMP3]], 17		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i1> [[TMP2]], i32 1
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[TMP2]], 17		; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[TMP6]], i1 false
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[TMP1]], 17		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP2]], i32 2
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]		; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[TMP7]], i1 false
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i1> [[TMP2]], i32 3
; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP7]], i1 [[D0]], i1 false		; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[TMP8]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]		; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
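The TODO above `@logical_and_icmp_clamp` asks for two reductions joined by glue logic, or a single wide reduction, instead of the current output, which reduces only the `slt 42` compares and feeds the `sgt 17` lanes through a scalar `select` chain. The sketch below checks, on plain integers, that the scalar chain, the two-reduction form and an 8-lane reduction compute the same predicate. It is a hypothetical illustration (`clampChain`, `clampTwoReductions` and `clampWideReduction` are made-up names): poison is ignored, so an IR version would still need the `freeze` seen in the current output.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Vec4 = std::array<int, 4>;

// Scalar shape of the test: one logical-and chain over c0..c3 (x < 42)
// followed by d0..d3 (x > 17), using signed compares.
bool clampChain(const Vec4& x) {
  bool s = true;
  for (int v : x) s = s && (v < 42);
  for (int v : x) s = s && (v > 17);
  return s;
}

// Shape the TODO asks for: two 4-lane and-reductions glued by one and.
bool clampTwoReductions(const Vec4& x) {
  bool lt = std::all_of(x.begin(), x.end(), [](int v) { return v < 42; });
  bool gt = std::all_of(x.begin(), x.end(), [](int v) { return v > 17; });
  return lt && gt;
}

// Alternative: one 8-lane reduction over both compare masks concatenated.
bool clampWideReduction(const Vec4& x) {
  std::array<bool, 8> mask{};
  for (int i = 0; i < 4; ++i) {
    mask[i] = x[i] < 42;
    mask[i + 4] = x[i] > 17;
  }
  return std::all_of(mask.begin(), mask.end(), [](bool b) { return b; });
}
```

For instance, `{20, 30, 40, 41}` satisfies all three forms, and changing any lane to `17` or `42` makes all three return `false`.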