This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv.
ClosedPublic

Authored by huihuiz on Nov 8 2021, 4:27 PM.

Details

Summary

For FAdd, FMul, FSub and FDiv, fold the select into one of the operands to enable
further optimizations, e.g., floating-point reduction detection.

Turn code:

%C = fadd %A, %B
%D = select %cond, %C, %A

into:

%C = select %cond, %B, -0.000000e+00
%D = fadd %A, %C
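As a quick numerical sanity check (mine, not part of the patch; the helper names are hypothetical), the fadd case can be mimicked in Python. The key point is that the identity constant must be -0.0, since x + (-0.0) == x for every x including -0.0, whereas folding with +0.0 would turn -0.0 into +0.0:

```python
import math

def select_after_fadd(cond, a, b):
    # Original: %C = fadd %A, %B ; %D = select %cond, %C, %A
    c = a + b
    return c if cond else a

def fadd_after_select(cond, a, b):
    # Folded: %C = select %cond, %B, -0.0 ; %D = fadd %A, %C
    c = b if cond else -0.0
    return a + c

# NaN is left out only because == cannot compare it; the fold itself
# propagates NaN identically on both sides.
for cond in (True, False):
    for a in (1.5, -0.0, 0.0, float("inf")):
        for b in (2.5, -0.0, 0.0):
            x = select_after_fadd(cond, a, b)
            y = fadd_after_select(cond, a, b)
            assert x == y and math.copysign(1.0, x) == math.copysign(1.0, y)
```

With +0.0 as the constant, the sign check would fail for cond == False, a == -0.0, because (-0.0) + 0.0 is +0.0.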

Alive2 verification (with --disable-undef-input; it timed out otherwise):
FAdd - https://alive2.llvm.org/ce/z/eUxN4Y
FMul - https://alive2.llvm.org/ce/z/5SWZz4
FSub - https://alive2.llvm.org/ce/z/Dhj8dU
FDiv - https://alive2.llvm.org/ce/z/Yj_NA2

Diff Detail

Event Timeline

huihuiz created this revision.Nov 8 2021, 4:27 PM
huihuiz requested review of this revision.Nov 8 2021, 4:27 PM

Take the attached test.ll.

Run: opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-codegen-verify -analyze -polly-scops test.ll
Then you will see that the reduction is not detected:
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]

    { Stmt_for_body[i0] -> MemRef_sum_014_reg2mem[0] };

After enabling the fold of select into an operand for FAdd, the reduction is detected.
Run (with this patch): opt -S -instcombine test.ll -o test2.ll
Then: opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-codegen-verify -analyze -polly-scops test2.ll

MustWriteAccess := [Reduction Type: +] [Scalar: 0]

    { Stmt_for_body[i0] -> MemRef_sum_014_reg2mem[0] };
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-none-linux-gnu"

define float @test(i32 %n, float* noalias nocapture readonly %a) {
entry:
  %sum.014.reg2mem = alloca float, align 4
  %sum.0.lcssa.reg2mem = alloca float, align 4
  br label %entry.split15

entry.split15:                                    ; preds = %entry
  br label %entry.split

entry.split:                                      ; preds = %entry.split15
  %cmp12 = icmp sgt i32 %n, 0
  store float 0.000000e+00, float* %sum.0.lcssa.reg2mem, align 4
  br i1 %cmp12, label %for.body.preheader, label %for.end

for.body.preheader:                               ; preds = %entry.split
  %wide.trip.count = zext i32 %n to i64
  store float 0.000000e+00, float* %sum.014.reg2mem, align 4
  br label %for.body

for.body:                                         ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %sum.014.reload = load float, float* %sum.014.reg2mem, align 4
  %arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
  %0 = load float, float* %arrayidx, align 4
  %cmp1 = fcmp fast ogt float %0, 0.000000e+00
  %add = fadd fast float %0, %sum.014.reload
  %sum.1 = select i1 %cmp1, float %add, float %sum.014.reload
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
  store float %sum.1, float* %sum.014.reg2mem, align 4
  br i1 %exitcond.not, label %for.end.loopexit, label %for.body

for.end.loopexit:                                 ; preds = %for.body
  %1 = load float, float* %sum.014.reg2mem, align 4
  store float %1, float* %sum.0.lcssa.reg2mem, align 4
  br label %for.end

for.end:                                          ; preds = %for.end.loopexit, %entry.split
  %sum.0.lcssa.reload = load float, float* %sum.0.lcssa.reg2mem, align 4
  ret float %sum.0.lcssa.reload
}
spatel added inline comments.Nov 9 2021, 5:44 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
257

Why exclude fdiv?

llvm/test/Transforms/InstCombine/select-binop-cmp.ll
307 ↗(On Diff #385653)

Please pre-commit the baseline tests. Some of these tests should include fast-math-flags on the FP binop, so we can verify that FMF propagates as expected.

330 ↗(On Diff #385653)

Why is the compare constant (0.0 or -0.0) relevant for this fold?

The true/false operands should be swapped so we have coverage for the pattern that replaces the true value with a constant. Similarly for fmul, there should be two tests.

spatel added inline comments.Nov 9 2021, 5:50 AM
llvm/test/Transforms/InstCombine/select-binop-cmp.ll
330 ↗(On Diff #385653)

A better question might be - why is there an fcmp in any of these tests? That isn't part of the minimal pattern is it?

Thanks Sanjay for the comments; I will update the unit tests as suggested.

I have a concern from checking with alive2: for fadd, https://alive2.llvm.org/ce/z/UjAMM_ alive2 complains about mismatched outputs.
For integer add this folding seems correct: https://alive2.llvm.org/ce/z/oE4UQJ

Let me know if such a transformation is illegal for floating-point types, as alive2 suggests, or if there is anything I missed, before I go fix the unit tests.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
257

I checked with alive2; it looks like adding fdiv, sdiv and udiv can trigger undefined behavior:
https://alive2.llvm.org/ce/z/KvFYev

I have a concern from checking with alive2: for fadd, https://alive2.llvm.org/ce/z/UjAMM_ alive2 complains about mismatched outputs.

For fadd, we need to use -0.0 as the binop identity constant. I'm not getting the online instance of Alive2 to verify this without timing out, but I think this should work given enough time:
https://alive2.llvm.org/ce/z/TtGDNR

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
257

I think the integer ops are not safe because we can trigger immediate UB that might have been avoided in the original code, but fdiv should be fine - it doesn't have any different UB characteristics vs. fadd/fmul/fsub in the default FP environment.
https://alive2.llvm.org/ce/z/Yj_NA2
(This is timing out on the online version when allowing undef/poison, so please double-check on a local machine.)
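As an aside (my reading of the fold, not quoted from the patch), the identity constants needed when the select feeds the second operand of each binop can be spot-checked in Python, sign of zero included:

```python
import math

# Candidate identities: x op ident == x for all x, preserving the sign of zero.
#   fadd: -0.0  (with +0.0, (-0.0) + 0.0 would give +0.0)
#   fsub: +0.0  (with -0.0, (-0.0) - (-0.0) would give +0.0)
#   fmul:  1.0
#   fdiv:  1.0
def check_identity(op, ident):
    for x in (3.5, -2.0, -0.0, 0.0, float("inf"), float("-inf")):
        y = op(x, ident)
        assert y == x and math.copysign(1.0, y) == math.copysign(1.0, x), (x, y)

check_identity(lambda a, b: a + b, -0.0)
check_identity(lambda a, b: a - b, 0.0)
check_identity(lambda a, b: a * b, 1.0)
check_identity(lambda a, b: a / b, 1.0)
```

This is exactly why fsub and fadd use opposite signs of zero as their identity.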

huihuiz updated this revision to Diff 386409.Nov 10 2021, 7:49 PM
huihuiz marked 2 inline comments as done.

Addressed review comments.
Pre-committed the baseline test test/Transforms/InstCombine/select-binop-foldable-floating-point.ll.

huihuiz retitled this revision from [InstCombine] Enable fold select into operand for FAdd, FMul, and FSub. to [InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv..Nov 10 2021, 7:50 PM
huihuiz edited the summary of this revision. (Show Details)
huihuiz added inline comments.Nov 10 2021, 7:54 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
257

Thanks Sanjay for the explanation. Included 'fdiv' to allow folding on the divisor operand.

I did an alive-tv run on my local machine with the timeout increased:
./alive-tv src.ll tgt.ll -smt-to 100000000

But alive did not converge after 1.5 hours. Doing an overnight run now; I will update with the result tomorrow morning.

llvm/test/Transforms/InstCombine/select-binop-cmp.ll
330 ↗(On Diff #385653)

Removed the fcmp instruction. The minimal pattern to fold select into a binop operand does not require an fcmp, so I am moving this into a separate test. The original test select-binop-cmp.ll checks that fcmp ignores the sign of 0.0 (for example, test @select_fadd_fcmp), so I am keeping the original test unchanged for now.

spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
257

Hmm...I think it's a good sign if it did not find a failure after 1.5 hours, but that is a long time to spend on 2 instructions.
cc @nlopes @regehr @aqjune in case they see something wrong or room for improvement here:
https://alive2.llvm.org/ce/z/TtGDNR

Update on the overnight run for fdiv:

My run below did not return anything or capture any failure after 12 hours. 1000000000 is probably the maximum timeout we can set for alive.
But I agree that for two instructions, alive is taking too long to converge.

Please point out if there is anything wrong.

./alive-tv src.ll -smt-to 1000000000

----------------------------------------
define half @select_fdiv(i1 %cond, half %A, half %B) {
%0:
  %C = fdiv half %A, %B
  %D = select i1 %cond, half %C, half %A
  ret half %D
}
=>
define half @select_fdiv(i1 %cond, half %A, half %B) {
%0:
  %C = select i1 %cond, half %B, half 15360
  %D = fdiv half %A, %C
  ret half %D
}
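Side note on reading the printout above: Alive2 shows half constants as raw bit patterns, so `half 15360` is 0x3C00, the IEEE-754 binary16 encoding of 1.0 — i.e., the fdiv identity. This can be decoded with Python's struct 'e' (half-precision) format code:

```python
import struct

# 15360 == 0x3C00: sign 0, exponent 0b01111 (bias 15, so 2**0), mantissa 0 -> 1.0
assert 15360 == 0x3C00
value = struct.unpack('<e', (15360).to_bytes(2, 'little'))[0]
print(value)  # 1.0
```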
huihuiz updated this revision to Diff 388002.Nov 17 2021, 11:25 AM
huihuiz edited the summary of this revision. (Show Details)

Rebased; a gentle ping.

Let me know if there are concerns with the current implementation.

spatel accepted this revision.Nov 22 2021, 6:56 AM

LGTM.

You wrote that fdiv ran overnight without completing. I wonder if that opcode is slower than the others. For example, did fadd/fsub also time out?

This revision is now accepted and ready to land.Nov 22 2021, 6:56 AM

Thanks Sanjay for the review!
I did another local run for fadd; with "--disable-undef-input" it finished within a minute.
When removing "--disable-undef-input", it has been running for about an hour now and still has not finished.

The problem is probably the floating-point type; I assume we are hitting some limitations of alive2.


Reasoning about floats is already quite expensive, and mixing that with undefs is really hard.
The fact that it doesn't find any bug within 1 hour is a good sign, though. Right now there's nothing better we can do. We have plans to improve this in the longer term.

This revision was landed with ongoing or failed builds.Nov 22 2021, 3:10 PM
This revision was automatically updated to reflect the committed changes.

Just FYI: I found that this change might affect the Ptrdist-ks benchmark from the MultiSource suite on AMD Rome cores by up to 30%. I don't have many details, but I am leaving this here in case anyone else is affected.

LuoYuanke added a subscriber: LuoYuanke.EditedJan 29 2022, 6:37 AM

This patch causes a regression (https://godbolt.org/z/7WYboTe16) on the x86 AVX512 target, because AVX512 has mask instructions which can match the select instruction during ISel pattern matching. Moving the select before the fadd causes the pattern match to fail. I don't quite understand the optimization opportunity for this transformation. Could you elaborate with an example?

This patch has a side effect on AVX512 ISel. Another example is https://godbolt.org/z/xdr8xs4cb.

This needs the backend undo-transformation to be implemented first.

Please revert, @huihuiz.

Yeah, this patch was committed without checking how this transformation affects x86 codegen. :/ It needs to be reverted before LLVM 14.

cc @spatel


Note that this patch makes IR more consistent (FP ops are treated the same as integer ops). So is the integer codegen also not optimal?
https://godbolt.org/z/jd3q6Yq55

If there's still time for clang 14, can we fix x86 codegen rather than revert? There was already a similar bug filed for compares:
https://github.com/llvm/llvm-project/issues/51842

LuoYuanke added a comment.EditedJan 29 2022, 7:29 PM


I think the integer codegen is also NOT optimal with an AVX512 target. See https://godbolt.org/z/Wahef3b3v. For _mm_mask_add_epi64() and _mm_mask_sub_epi64(), it eventually generates extra instructions. But for integer mul, or, and, xor and shl in the test case, the InstCombine pass doesn't transform the code, so it still generates good code.

I still don't understand the benefit of moving the select instruction before the binary operation (add, sub, mul, ...). Can someone indicate why it is profitable? If it is profitable in the general case, I think we can call a TTI interface to ask the backend whether the transform is profitable and then make the decision to transform. We could add interfaces like the ones below to TTI.

+
+  bool isLegalMaskedFAdd(Type *DataType);
+  bool isLegalMaskedFSub(Type *DataType);
+  bool isLegalMaskedFMul(Type *DataType);
+  bool isLegalMaskedFDiv(Type *DataType);
+

I believe at least one reason for the integer combine was that it reduces the number of uses of the input.

ARM and RISCV both reverse it in the backend for scalars. Though they should probably be using FREEZE to do it.

X86 could probably reverse it for vectors.

For vectors - Arm MVE has to reverse these transforms (as in https://reviews.llvm.org/rGd9af9c2c5a53c9ba6aa0255240a2a40e8bea27aa). It is simpler in general to match (vselect cc, (add x, y), x) as a predicated add than a folded select with an identity element, but we do manage to reverse that at the moment, and I've not seen any cases of it folding to something we couldn't convert back. It can be quite important for performance in places, for example where the vectorizer produces a predicated reduction (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L4158, which is only enabled for Arm MVE at the moment).

SVE doesn't seem to attempt to convert add+select into a predicated add yet: https://godbolt.org/z/TPbE95h5x

Maybe worth mentioning:
https://reviews.llvm.org/D90113

Thanks for the link. I posted a much more limited x86-only patch -- D118644 -- to deal with the regression noted here.

Thank you guys for looking into this!

One of the optimizations enabled by this patch is reduction detection. The same applies to its integer equivalent.

Take the test.ll example attached in the very first comment and run:
opt -S -instcombine test.ll -o test2.ll
opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-codegen-verify -analyze -polly-scops test2.ll

Then you will see the reduction being detected, eventually allowing the loop to be vectorized:
MustWriteAccess := [Reduction Type: +] [Scalar: 0]

This folding in particular reduces %sum.014.reload to a single user, so we don't need to over-complicate the data-flow analysis algorithm used by the vectorizer, and we don't need to combine multiple reduction operators.

before

%cmp1 = fcmp fast ogt float %0, 0.000000e+00
%add = fadd fast float %0, %sum.014.reload
%sum.1 = select i1 %cmp1, float %add, float %sum.014.reload

after

%.inv = fcmp fast ole float %0, -0.000000e+00
%1 = select fast i1 %.inv, float -0.000000e+00, float %0
%sum.1 = fadd fast float %sum.014.reload, %1
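A small equivalence check on the before/after sequences above (mine, not from the review); NaN inputs are left out, which matches the nnan implied by the fast flags:

```python
def sum_before(s, x):
    # %cmp1 = fcmp fast ogt %0, 0.0 ; %add = fadd fast %0, %sum ; select
    return (x + s) if x > 0.0 else s

def sum_after(s, x):
    # %.inv = fcmp fast ole %0, -0.0 ; select -0.0 or %0 ; fadd fast
    t = -0.0 if x <= -0.0 else x
    return s + t

# x == 0.0 exercises the inverted compare: ogt(0.0, 0.0) is false and
# ole(0.0, -0.0) is true, so both sides leave the sum unchanged.
for s in (0.0, 1.25, -3.5):
    for x in (2.0, -2.0, 0.0, -0.0):
        assert sum_before(s, x) == sum_after(s, x)
```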
xbolva00 added a comment.EditedFeb 1 2022, 8:49 AM

Just FYI: I found that this change might affect the Ptrdist-ks benchmark from the MultiSource suite on AMD Rome cores by up to 30%. I don't have many details, but I am leaving this here in case anyone else is affected.

This is a huge regression. Does @spatel's patch fix it?

Any performance measurements of this patch alone? Or will Phoronix surprise us (good or bad?) again...

D118644 only handles vectors and AVX512 targets, so this regression wouldn't be fixed by it.

Following up:
https://github.com/llvm/llvm-project/issues/53866 lists the commits that are in main/branch to avoid the AVX512 FP regressions.

Further work to invert this transform in the backend is ongoing. For example:
D119654

This has to be done in relatively small steps to avoid regressions on x86 - it's not clear without looking at asm if the transformed code will be better (may depend on operation, data type, and subtarget features).