This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Do not reduce repeated values, use scalar red ops instead.
ClosedPublic

Authored by ABataev on Aug 19 2022, 1:58 PM.

Details

Summary

Metric: size..text

size..text                 results     results0    diff

SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0%

For all tests some extra code was optimized; GCC-C-execute has some more inlining after.

Diff Detail

Event Timeline

ABataev created this revision.Aug 19 2022, 1:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 1:58 PM
ABataev requested review of this revision.Aug 19 2022, 1:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 1:58 PM
ABataev updated this revision to Diff 461984.Sep 21 2022, 12:33 PM

Rebase, ping

The patch seems to be trying to move the SLP vectorizer into InstCombine territory. I'm not sure why we should do that. Did you try to analyze which specific patterns helped in, for example, the GCC-C-execute test?
Can these represent any cases where instcombine could be improved instead? It is difficult to assess this patch based on test changes, as a lot of the SLP vectorizer tests were not designed to be run through instcombine.
If you run "-instcombine -slp-vectorizer" instead of just -slp-vectorizer, then how many of the affected LIT tests would still benefit from the patch?

The patch seems to be trying to move the SLP vectorizer into InstCombine territory. I'm not sure why we should do that. Did you try to analyze which specific patterns helped in, for example, the GCC-C-execute test?
Can these represent any cases where instcombine could be improved instead? It is difficult to assess this patch based on test changes, as a lot of the SLP vectorizer tests were not designed to be run through instcombine.
If you run "-instcombine -slp-vectorizer" instead of just -slp-vectorizer, then how many of the affected LIT tests would still benefit from the patch?

Most of these tests will certainly be optimized, since they are pretty simple. But it looks like there are some other places where the reduction analysis in SLP is better than the similar analysis in the instcombiner. Also, if we can do some optimization here, it should reduce compile time, since the instcombiner consumes a lot of time.

Now the difference is even bigger:

            test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   431225.00   431288.00  0.0%
       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2198241.00  2198465.00  0.0%
                      test-suite :: MultiSource/Applications/SPASS/SPASS.test   530608.00   530640.00  0.0%
        test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1135953.00  1135937.00 -0.0%
               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   651440.00   651360.00 -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test    48631.00    48311.00 -0.7%

InstCombine can simplify most of the currently unoptimized stuff, but it requires extra passes (like ReassociatePass) and extra compile time, while we can do it easily in the SLP vectorizer, since we already have all the required data. The PhaseOrdering tests are the proof.

A couple of general thoughts. Can you please add a knob that allows turning off the optimization? And can some sort of debug tracing be added, such as the values that have been optimized away?

ABataev updated this revision to Diff 489611.Jan 16 2023, 11:25 AM

Address comments

ABataev updated this revision to Diff 490845.Jan 20 2023, 7:25 AM

Rebase, ping.

I'll try to look at this closely next week (but not earlier than Monday). Just a quick note about the terminology used: the option name "slp-same-scalars-reduction" does not actually say much about what it controls.
Since you are basically trying to optimize away identity operations in reduction sequences, I'd suggest using a name for the option, and across the code, that better reflects that.
The option name could be "-slp-optimize-identity-hor-reduction-ops=true|false", for example.

ABataev updated this revision to Diff 490963.Jan 20 2023, 1:35 PM

Rebase, renamed option, added a test run for false option value

ABataev updated this revision to Diff 492126.Jan 25 2023, 8:24 AM

Rebase, ping!

ABataev updated this revision to Diff 493402.Jan 30 2023, 1:38 PM

Rebase, ping!

RKSimon added inline comments.Feb 4 2023, 1:08 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
12430

Use allConstant() ?

12550

allConstant(Candidates)

ABataev updated this revision to Diff 495111.Feb 6 2023, 6:54 AM

Address comments

I'm sorry for the delay. I was a bit overloaded with internal stuff.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
131

Would the following description
"Allow optimization of original scalar identity operations on matched horizontal reductions."
sound better?
Same question wrt the AllowSameScalarReduction variable name. Would AllowHorRdxIdenityOptimization fit better?

It probably makes sense to add a note (as a comment above, or in the commit message): will the optimization run if we match a reduction but do not vectorize in the end? Based on the test updates it looks like it will. IMO it should be clearly stated that the pass outcome might not be vectorized code.

12244

nit: noticed by chance while refreshing on the code. This call to getRdxKind can be safely replaced with RdxKind.

12421

nit: maybe make it unsigned too (+ deduction rule)?

12557–12566

This code pattern appears multiple times. Any chance to outline it?
Also (for better readability): swap the then/else branches and remove the '!' from the condition?

12569

We don't need to collect repeating counters if IsSupportedReusedRdxOp is false.

12573–12574

This code should form a member of the HorizontalReduction class, and emitReusedOps() needs to assert that it can handle a case by calling the former before doing any processing.

12575

This needs a comment. Are you trying to catch a case for special processing when each value is used the same number of times? Please describe the benefit of catching such cases.
Why are you trying to calculate it this early?

12581

drop_front() ?

12698

Are you trying to repurpose the data?
To be honest, I'd refrain from doing that.

12699–12708

Fuse these loops?

for (unsigned Cnt = 0; Cnt < NumReducedVals; ++Cnt) {
  if (Cnt >= Pos && Cnt < Pos + ReduxWidth)
    continue;
  ...
}
12710–12731

Fuse these loops?

13084

Please add a description.

vdmitrie added inline comments.Feb 13 2023, 3:25 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
12581

drop_front() ?

This was a "nit" actually. drop_front() needs an ArrayRef, so it would look like the following:
ArrayRef<std::pair<Value *, unsigned>>(SameValuesCounter.takeVector()).drop_front()

Probably not worth the effort to save just one comparison.

vdmitrie added inline comments.Feb 13 2023, 6:29 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
12577

"IsSupportedReusedRdxOp && HasReusedScalars" is the most common use pattern.
What about defining it as below

bool OptReusedScalars = IsSupportedReusedRdxOp &&
                        SameValuesCounter.size() != NumReducedVals;

and just checking OptReusedScalars everywhere across the code?
It is only used differently at line 12683, but the check for HasReusedScalars can be safely dropped there.

12581

Uhh, I did not realize takeVector clears the map.
Finally... this one should do the trick:

all_of(drop_begin(SameValuesCounter),
       [&SameValuesCounter](const std::pair<Value *, unsigned> &P) {
         return P.second == SameValuesCounter.begin()->second;
       })
13188

Maybe reorganize it a bit to add a return statement? It can produce the warning "control reaches end of non-void function [-Wreturn-type]".

ABataev added inline comments.Feb 15 2023, 8:02 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
12569

No, it's still needed; it is used to produce the vector of unique scalars, see line 12577.

12575

It may help to produce better vector ops. Say we have 8 x aabbccdd. It will require a reduction of 8 elements, and the built tree will still operate on 4 x elements, but the last node will have a reshuffle from a 4 x vector to an 8 x vector, plus red(8 x). Instead we may have just red(4 x abcd)*2.

ABataev updated this revision to Diff 497694.Feb 15 2023, 8:42 AM

Address comments

vdmitrie added inline comments.Feb 16 2023, 10:04 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
12567

This comment line seems to be misplaced.

13095–13113

What I basically see here is that you packed two methods into one and differentiate them by the VL argument.
Maybe just split this method in two? One to handle the identity optimization and another for the same-scale-factor optimization. They do not seem to share any essential code, only the switch. But sharing the switch does not look right to me, as these optimizations do not fully share reduction kinds.

13175

Are you relying on the instcombiner to optimize this further? I mean, for example, mul X, 2 -> add X, X.

ABataev added inline comments.Feb 16 2023, 10:34 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13095–13113

I thought about it; it's just that in many cases we reuse the same code, so I decided to put it into a single switch (just (f)add and xor have special processing).

13175

Yes, because there might be some other numbers, say 3, 4, etc.

ABataev updated this revision to Diff 498088.Feb 16 2023, 11:06 AM

Fix comment

vdmitrie added inline comments.Feb 16 2023, 11:39 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13095–13113

I don't buy this kind of savings. Ideally we want to assert that we have one of the add/fadd/xor reduction kinds when we fall into the same-scale-factor optimization. This is why I advocate for splitting it in two. We do not save a lot on a switch which will be handling just 3 cases plus the default one for the second method. The code inside the switch cases is already separate, so it is easy to do from the very beginning.

ABataev updated this revision to Diff 498125.Feb 16 2023, 12:47 PM

Address comments

Thanks. This revision basically looks good to me. So I'm going to accept it.
@RKSimon , do you have any remarks/comments/concerns?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13093–13096

okay :-) that works too

just a nit remark: you could reduce the arguments to just these if the code were a dedicated method:
(Value *VectorizedValue, IRBuilderBase &Builder, ArrayRef<Value *> VL, unsigned Cnt)

ABataev added inline comments.Feb 16 2023, 4:32 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13093–13096

It is not possible; both SameValuesCounter and TrackedToOrig are used for each element of VL in the second switch to build the correct constant vector.

vdmitrie added inline comments.Feb 16 2023, 4:51 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13093–13096

I don't see that.
Isn't the following the only use of them in this part (under the condition VL.empty())?

unsigned Cnt =
    SameValuesCounter.lookup(TrackedToOrig.find(VectorizedValue)->second);

Aha, I even overlooked that you don't actually use the VL elements in this part either.
So here are the only essential arguments for it:
(Value *VectorizedValue, IRBuilderBase &Builder, unsigned Cnt)

vdmitrie added inline comments.Feb 16 2023, 4:56 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13093–13096

Ah, I missed that you said "second switch"; I was thinking about the first part of the method.
Maybe I wasn't clear enough.
I meant to have a separate method for the part which is currently under VL.empty():

Value *emitScaleForReusedOps(Value *VectorizedValue, IRBuilderBase &Builder, unsigned Scale) {
  ... here goes the code which is now under VL.empty() ...
}

ABataev updated this revision to Diff 498204.Feb 16 2023, 5:23 PM

Address comments

vdmitrie accepted this revision.Feb 16 2023, 5:43 PM

Perfect! Looks good. Thanks, Alexey!

This revision is now accepted and ready to land.Feb 16 2023, 5:43 PM
dyung added a subscriber: dyung.Mar 6 2023, 10:46 AM

@ABataev, one of our internal tests seems to hit an infinite loop in the compiler after this change. I have filed issue 61224 for the issue, can you take a look?

@ABataev, one of our internal tests seems to hit an infinite loop in the compiler after this change. I have filed issue 61224 for the issue, can you take a look?

Will check/fix ASAP.

Hi Alexey,
something isn't quite right with the transformation:
$ opt -passes=slp-vectorizer -slp-optimize-identity-hor-reduction-ops=false -S -o j.ll t.ll && llc -O0 j.ll && clang j.s -o a.out && ./a.out
209
$ opt -passes=slp-vectorizer -slp-optimize-identity-hor-reduction-ops=true -S -o j.ll t.ll && llc -O0 j.ll && clang j.s -o a.out && ./a.out
202

Hi Alexey,

It turned out that in some cases the transformation introduced in this change doesn't work properly. Let's consider this code:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @main() {
entry:
  %sq = alloca [64 x i32], i32 0, align 16
  
  %0 = getelementptr inbounds [64 x i32], ptr %sq, i64 0, i64 1
  %elt_1 = load i32, ptr %0, align 4
  %1 = getelementptr [64 x i32], ptr %sq, i64 0, i64 2
  %elt_2 = load i32, ptr %1, align 8
  %2 = getelementptr [64 x i32], ptr %sq, i64 0, i64 3
  %elt_3 = load i32, ptr %2, align 4
  %3 = getelementptr [64 x i32], ptr %sq, i64 0, i64 4
  %elt_4 = load i32, ptr %3, align 16  
  
  %4 = add i32 %elt_2, %elt_3
  %5 = add i32 %4, %elt_2
  %6 = add i32 %5, %elt_1  
  %7 = add i32 %6, %elt_4
  %8 = add i32 %7, %elt_3
  %9 = add i32 %8, %elt_2
  %10 = add i32 %9, %elt_1
  
  %call = tail call i32 (ptr, ...) null(ptr null, i32 %10)
  ret i32 0
}

It's easy to see that the final value of %10 is 2*%elt_1 + 3*%elt_2 + 2*%elt_3 + %elt_4. But actually we get this code:

$ opt -S test.ll -passes=slp-vectorizer
...
define i32 @main() {
entry:
  %sq = alloca [64 x i32], i32 0, align 16
  %0 = getelementptr inbounds [64 x i32], ptr %sq, i64 0, i64 1
  %1 = load <4 x i32>, ptr %0, align 4
  %2 = mul <4 x i32> %1, <i32 3, i32 2, i32 2, i32 1>
  %3 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %2)
  %call = tail call i32 (ptr, ...) null(ptr null, i32 %3)
  ret i32 0
}

The <i32 3, i32 2, i32 2, i32 1> value doesn't look correct; it should be <i32 2, i32 3, i32 2, i32 1>. Running the test with -slp-optimize-identity-hor-reduction-ops=false gives us this code:

$ opt -S test.ll -passes=slp-vectorizer -slp-optimize-identity-hor-reduction-ops=false
...
define i32 @main() {
entry:
  %sq = alloca [64 x i32], i32 0, align 16
  %0 = getelementptr inbounds [64 x i32], ptr %sq, i64 0, i64 1
  %1 = load <4 x i32>, ptr %0, align 4
  %2 = shufflevector <4 x i32> %1, <4 x i32> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 2, i32 2, i32 0, i32 0, i32 3>
  %3 = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %2)
  %call = tail call i32 (ptr, ...) null(ptr null, i32 %3)
  ret i32 0
}

<i32 1, i32 1, i32 1, i32 2, i32 2, i32 0, i32 0, i32 3> consists of two zeros and three ones, which is correct, meaning the incorrectness was introduced by this change.

I'll investigate all reports, thanks!

To add on to the above: the order of the values in the scale vector is determined by the default order of Candidates, and in these test cases there's a mismatch between that order and the lanes the scalar values occupy in the vectorized value.

Ok, I have a fix for the issue; it is very simple. I forgot to take the ordering into account, which can be dropped for reductions.

Matt added a subscriber: Matt.May 3 2023, 3:08 PM

@ABataev May I ask whether the change to constant folding semantics is intended with -slp-optimize-identity-hor-reduction-ops?

Cf. https://godbolt.org/z/G44xhKcTc

I wonder whether this may be connected to the constant folding similar to createOp related to this patch, https://github.com/llvm/llvm-project/issues/61224, https://github.com/llvm/llvm-project/commit/c411965820eb803dd7eac39f80357cad663b7ba0

  • LLVM version 16.0.0

opt -passes=slp-vectorizer

%mul = fmul reassoc nsz float 0x39B4484C00000000, 0x39B4484C00000000
ret float %mul
  • LLVM version 17.0.0git

opt -passes=slp-vectorizer -slp-optimize-identity-hor-reduction-ops=0

%mul = fmul reassoc nsz float 0x39B4484C00000000, 0x39B4484C00000000
ret float %mul
  • LLVM version 17.0.0git

opt -passes=slp-vectorizer

ret float 0.000000e+00

@ABataev May I ask whether the change to constant folding semantics is intended with -slp-optimize-identity-hor-reduction-ops?

Cf. https://godbolt.org/z/G44xhKcTc

I wonder whether this may be connected to the constant folding similar to createOp related to this patch, https://github.com/llvm/llvm-project/issues/61224, https://github.com/llvm/llvm-project/commit/c411965820eb803dd7eac39f80357cad663b7ba0

  • LLVM version 16.0.0

opt -passes=slp-vectorizer

%mul = fmul reassoc nsz float 0x39B4484C00000000, 0x39B4484C00000000
ret float %mul
  • LLVM version 17.0.0git

opt -passes=slp-vectorizer -slp-optimize-identity-hor-reduction-ops=0

%mul = fmul reassoc nsz float 0x39B4484C00000000, 0x39B4484C00000000
ret float %mul
  • LLVM version 17.0.0git

opt -passes=slp-vectorizer

ret float 0.000000e+00

Yes, the slp-vectorizer may fold constants as a side effect of the reduction optimization.

Matt added a comment.May 3 2023, 3:28 PM

Thanks!

This explains some of the behavior I've observed.
Context: a downstream implementation with custom floating-point trapping behavior which generally disables constant folding for would-be trapping operations (such as multiplying 1e-30f * 1e-30f, which would raise FE_UNDERFLOW due to underflow to zero).

The issue I've observed is exactly like in https://github.com/llvm/llvm-project/issues/61224, i.e., an infinite loop.
I suppose this is due to the fact that FMul (as in the example above) is not constant folded.

I think I may be able to deal with this downstream if I can prevent the constant folding alone while not regressing back to the infinite loop from issue 61224 (I suppose it ran into the infinite loop because the pass at that point didn't expect non-constant-folded instructions?).
As a workaround I may be able to disable AllowHorRdxIdenityOptimization whenever the custom trapping is enabled.
For my context, I'm wondering, would you happen to know whether disabling the constant-folding alone without running into an infinite loop issue is even possible?

I've noticed there are 4 occurrences of AllowHorRdxIdenityOptimization, and I can try to experiment with selective disablement (I presume the allConstant branches are most relevant here), but I thought I'd ask just in case this is a general issue.

Thanks!

This explains some of the behavior I've observed.
Context: a downstream implementation with custom floating-point trapping behavior which generally disables constant folding for would-be trapping operations (such as multiplying 1e-30f * 1e-30f, which would raise FE_UNDERFLOW due to underflow to zero).

The issue I've observed is exactly like in https://github.com/llvm/llvm-project/issues/61224, i.e., an infinite loop.
I suppose this is due to the fact that FMul (as in the example above) is not constant folded.

I think I may be able to deal with this downstream if I can prevent the constant folding alone while not regressing back to the infinite loop from issue 61224 (I suppose it ran into the infinite loop because the pass at that point didn't expect non-constant-folded instructions?).
As a workaround I may be able to disable AllowHorRdxIdenityOptimization whenever the custom trapping is enabled.
For my context, I'm wondering, would you happen to know whether disabling the constant-folding alone without running into an infinite loop issue is even possible?

I've noticed there are 4 occurrences of AllowHorRdxIdenityOptimization, and I can try to experiment with selective disablement (I presume the allConstant branches are most relevant here), but I thought I'd ask just in case this is a general issue.

Yes, use -slp-optimize-identity-hor-reduction-ops=false to disable the constant optimizations. Meanwhile, I will try to provide a more general fix for the problem.

Matt added a comment.May 3 2023, 3:36 PM

Thank you!

Matt added a comment.May 5 2023, 4:49 PM

Just tested with the commit and it appears to be working--thank you once again!