This is an archive of the discontinued LLVM Phabricator instance.

Differential D57059

[SLP] Initial support for the vectorization of the non-power-of-2 vectors.
Needs Review · Public

Authored by ABataev on Jan 22 2019, 8:30 AM.

Details

Reviewers

RKSimon
spatel
hfinkel
craig.topper
dtemirbulatov
anton-afanasyev

Summary

Operations that may be vectorized are padded with UndefValues up to a power-of-2 number of elements, so that regular vector operations can be used.
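
The padding idea in the summary can be sketched outside of LLVM as follows. This is only an illustration: `Lane`, `nextPowerOf2` and `padToPowerOf2` are hypothetical names, undef is modelled as an empty `std::optional`, and the sketch assumes the bundle is padded up to the smallest power of two that fits it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A lane is either a real scalar value or "undef" (modelled here as nullopt).
using Lane = std::optional<int>;

// Smallest power of two that is >= N (N must be non-zero).
std::size_t nextPowerOf2(std::size_t N) {
  assert(N != 0 && "expected at least one lane");
  std::size_t P = 1;
  while (P < N)
    P *= 2;
  return P;
}

// Pad a bundle of scalars with undef lanes up to a power-of-2 width.
std::vector<Lane> padToPowerOf2(const std::vector<int> &Scalars) {
  std::vector<Lane> Padded(Scalars.begin(), Scalars.end());
  Padded.resize(nextPowerOf2(Scalars.size()), std::nullopt);
  return Padded;
}
```

With this model, a bundle of 3 scalars becomes a 4-lane bundle whose last lane is undef, while a bundle of 4 scalars is left as is.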

For SPEC CPU2017 it gives a ~7% perf gain for 526.blender_r (AVX512,
O3+LTO, -march=native), ~2% gain for 538.imagick_r and 638.imagick_s,
~2% gain for 525.x264_r and 625.x264_s, ~2% gain for 526.blender_r (AVX2,
O3+LTO, -march=native), ~11% gain for 526.blender_r, ~3% gain for
544.nab_r and 644.nab_s (AVX512, O3+LTO), ~3% gain
for 526.blender_r, ~2% gain for 544.nab_r and 644.nab_s (AVX2, O3+LTO).

Compile and link times are roughly the same:

AVX512, O3+LTO, -march=native
Metric: compile_time
Geomean difference -0.1% (-1.85 sec)

Metric: link_time
Geomean difference +1.2% (+14.46 sec)

AVX512, O3+LTO
Metric: compile_time
Geomean difference -0.2% (-4.71 sec)

Metric: link_time
Geomean difference -3.6% (-54.53 sec)

AVX2, O3+LTO, -march=native
Metric: compile_time
Geomean difference +0.3% (+10.56 sec)

Metric: link_time
Geomean difference -0.1% (-2.18 sec)

AVX2, O3+LTO
Metric: compile_time
Geomean difference +0.2% (+5.73 sec)

Metric: link_time
Geomean difference -3.4% (-67.45 sec)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Rebase

Harbormaster completed remote builds in B72530: Diff 293480. Sep 22 2020, 9:38 AM

some very minor style comments - a general comment would be to try and pre-commit the style/NFC refactor/cleanup changes so the size of this patch is smaller

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3106	Do these trivial style refactors separately now to reduce the size of the patch?
3121	Do these trivial style refactors separately now to reduce the size of the patch?
4110	duplicate cast
4144	duplicate cast
4158	duplicate cast
4215	duplicate cast
4921–4922	trivial style refactor - pull out of patch?

Rebase

Harbormaster completed remote builds in B72586: Diff 293576. Sep 22 2020, 4:09 PM

RKSimon added inline comments. Sep 23 2020, 4:24 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
217–218	Are we going to have a problem if VL[0] is UndefValue?

ABataev added inline comments. Sep 23 2020, 4:30 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
217–218	Yeah, will fix it.

Rebase + fix

Harbormaster completed remote builds in B72644: Diff 293701. Sep 23 2020, 5:42 AM

RKSimon added inline comments. Sep 23 2020, 5:52 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
15	These "feel" like regressions to me - any idea what's going on?

ABataev added inline comments. Sep 23 2020, 5:55 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
15	It is a cost model problem, if I recall correctly. I investigated it before and found that the AArch64 cost model is not defined for long vectors in some cases, so we fall back to the generic cost model evaluation, which is not quite correct in many cases. The AArch64 cost model needs to be tweaked.

RKSimon added inline comments. Sep 23 2020, 6:15 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
15	Any instruction cost type (extract/shuffle/store?) in particular that needs better costs? It'd be good to at least raise a specific bug report to the aarch64 team

ABataev added inline comments. Sep 23 2020, 6:31 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
15	I don't remember anymore; I need some time to investigate it again and hope to do so by the end of this week. P.S. There was already a question about this test.

spatel added inline comments. Sep 23 2020, 7:18 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8302	Is it necessary to copy these? If so, it would be better to name this function something like "getCopyOfExtraArgValues" to make that explicit. If not, we can just make this a standard 'get' method: `const MapVector<Instruction *, Value *> &getExtraArgs() const { return ExtraArgs; }` and then access the 'second' data in the user code?

ABataev added inline comments. Sep 24 2020, 6:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8302	We don't need to expose the `first` element of the `MapVector` here, it is not good from the general design point of view. I'll rename the member function.
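
The two accessor shapes weighed in this exchange can be sketched as follows. This is a stand-alone illustration, not the patch's code: `ReductionSketch` and `addExtraArg` are hypothetical, and a `std::vector` of string pairs stands in for `MapVector<Instruction *, Value *>` so the example compiles without LLVM headers.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for MapVector<Instruction *, Value *>: an insertion-ordered list of
// key/value pairs. Strings replace the LLVM pointer types (assumption made
// only to keep the sketch self-contained).
using ExtraArgMap = std::vector<std::pair<std::string, std::string>>;

class ReductionSketch {
  ExtraArgMap ExtraArgs;

public:
  void addExtraArg(std::string Inst, std::string Val) {
    ExtraArgs.emplace_back(std::move(Inst), std::move(Val));
  }

  // Option 1: plain getter. No copy, but callers see keys as well as values.
  const ExtraArgMap &getExtraArgs() const { return ExtraArgs; }

  // Option 2: hides the keys and returns only the mapped values; the name
  // makes the copy explicit.
  std::vector<std::string> getCopyOfExtraArgValues() const {
    std::vector<std::string> Values;
    Values.reserve(ExtraArgs.size());
    for (const auto &KV : ExtraArgs)
      Values.push_back(KV.second);
    return Values;
  }
};
```

Option 1 avoids the copy at the price of exposing the `first` element to callers; option 2 keeps the interface narrower and pays for it with one vector copy per call.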

Rebase + rename

Harbormaster completed remote builds in B72809: Diff 294040. Sep 24 2020, 6:59 AM

ABataev added inline comments. Sep 24 2020, 7:19 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
15	Found the reason. It is the cost of a shuffle of the `TTI::SK_PermuteSingleSrc` kind. Before this patch, the test operated on the vector `<2 x i16>`, which is transformed to `llvm::MVT::v2i32` by the type legalization function, and the cost of this shuffle is tweaked to be `1` per the table (see llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp, `AArch64TTIImpl::getShuffleCost`). With this patch, the original vector type is `<4 x i16>`, which is transformed to `llvm::MVT::v4i16`; there is no optimized value for `TTI::SK_PermuteSingleSrc` in the table for this type, so the function falls back to the pessimistic cost model and returns `18`. There are several TODOs in the file already about fixing the cost model for different shuffle operations.
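
The table-then-fallback behaviour described in this comment can be modelled roughly as below. This is a simplified sketch, not the AArch64 implementation: `getPermuteSingleSrcCost` is a hypothetical function, types are plain strings, and the pessimistic fallback is reduced to the constant `18` reported in the comment, whereas the real fallback computes its value.

```cpp
#include <map>
#include <string>

// Hypothetical per-type cost table for a single shuffle kind. Only the v2i32
// entry and the two cost values (1 and 18) are taken from the review comment.
int getPermuteSingleSrcCost(const std::string &LegalizedType) {
  static const std::map<std::string, int> Table = {{"v2i32", 1}};
  // Stand-in for the generic, pessimistic cost computation (assumption: a
  // constant; the value is the one observed for v4i16 in the thread).
  const int PessimisticFallback = 18;

  auto It = Table.find(LegalizedType);
  return It != Table.end() ? It->second : PessimisticFallback;
}
```

Under this model `v2i32` costs 1 while `v4i16` costs 18, which is enough to flip a profitability decision even though the wider shuffle is not inherently that much more expensive.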

Does anyone have any more comments?

spatel mentioned this in D88505: [InstCombine] ease alignment restriction for converting masked load to normal load. Sep 30 2020, 6:10 AM

spatel added inline comments. Sep 30 2020, 6:23 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3593–3595	Use isValidElementType() or check for undef directly? I still can't tell from the debug statement exactly what we are guarding against. Should the type check already be here even without this patch?
4411	Are we always creating a masked load for a vector with 2 elements? This logic needs a code comment to explain the cases.
5563	Please add code comment/example to explain what the difference is between these 2 clauses.
5594	Is Passthrough a full vector of undef elements? If so, it should be created/named that way (or directly in the call to CreateMaskedLoad()) rather than in the loop.
5654–5655	Similar to above (so can we add a helper function to avoid duplicating the code?): Please add code comment/example to explain what the difference is between these 2 clauses.
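
For readers unfamiliar with the masked load semantics behind these questions, here is a scalar model. It is a sketch under stated assumptions: `maskedLoad` is a hypothetical helper, not the `CreateMaskedLoad()` builder call, and the undef passthrough lanes are represented by the placeholder value 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a masked load: lane I reads Memory[I] only if Mask[I] is
// set; otherwise it takes the PassThrough lane and memory is not accessed.
std::vector<int> maskedLoad(const std::vector<int> &Memory,
                            const std::vector<bool> &Mask,
                            const std::vector<int> &PassThrough) {
  assert(Mask.size() == PassThrough.size() && "one mask bit per result lane");
  std::vector<int> Result(PassThrough);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I]) {
      assert(I < Memory.size() && "enabled lane must be in bounds");
      Result[I] = Memory[I];
    }
  }
  return Result;
}
```

Loading 3 elements into a 4-lane vector with the mask `{1, 1, 1, 0}` never touches the fourth memory location, which is what makes widening a non-power-of-2 load safe in this model.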

reverse ping

Herald added a subscriber: pengfei. Oct 14 2020, 5:56 AM

In D57059#2329971, @RKSimon wrote:

reverse ping

Will update the patch as soon as I'm back to work, in 2-3 weeks.

reverse ping?

In D57059#2379030, @RKSimon wrote:

reverse ping?

Need some time to set up my dev environment, will update ASAP

ABataev added inline comments. Nov 9 2020, 10:37 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3593–3595	I was just trying to protect the code and support it only for simple types at first. There are some doubts that the cost model for masked loads/stores is complete, so I restricted it to simple types. I can remove this check if the cost model for masked ops is good enough.
4411	No, no need to do it for 2 elements, removed it.
5563	Fixed it, thanks.
5594	Fixed
5654–5655	Fixed, thanks!

RKSimon added inline comments. Nov 9 2020, 11:14 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3593–3595	masked load/store costs for constant masks should be good enough now (getScalarizationOverhead should now provide us with a reasonable fallback)

Rebase, updates and fixes

Harbormaster completed remote builds in B78299: Diff 304196. Nov 10 2020, 8:24 AM

All of my comments were addressed, so LGTM. But please wait for an official 'accept' from at least 1 other reviewer.

RKSimon added inline comments. Nov 12 2020, 9:35 AM

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
272	There isn't an ANY check-prefix atm (it was cleaned out in rG119e4550ddedc75e4 as part of the unused prefix cleanup) - please can you review?

ABataev added inline comments. Nov 12 2020, 9:39 AM

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
272	Yes, need to remove it, I think. Most probably caused by a not quite clean merge.

Rebase, test cleanup + small code improvements.

Harbormaster completed remote builds in B78688: Diff 304968. Nov 12 2020, 2:10 PM

Rebase

Harbormaster completed remote builds in B79634: Diff 306735. Nov 20 2020, 10:42 AM

xbolva00 added a subscriber: xbolva00. Nov 20 2020, 10:53 AM

xbolva00 added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Regression on avx?

ABataev added inline comments. Nov 20 2020, 11:01 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Yes, looks like the issue with the cost of `@llvm.masked.gather` for masked gather with some undefs in the mask

craig.topper added inline comments. Nov 20 2020, 11:07 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Gather is slow on CPUs prior to AVX512. And its cost is proportional to the number of elements. I don't think the value of the mask should be a factor.

ABataev added inline comments. Nov 20 2020, 11:18 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	True, but in some cases it can be optimized into `gather + shuffle` instead of wide `gather`, if there are undefs in mask.
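
One possible reading of the `gather + shuffle` idea is sketched below: the defined lanes are compacted into a narrower gather, and a shuffle mask spreads the result back over the wide vector. This is an illustration only, not the patch's implementation; `WideLanes` and `splitIntoGatherAndShuffle` are hypothetical names, and `-1` marks an undef lane in the mask.

```cpp
#include <optional>
#include <utility>
#include <vector>

// For each wide lane: the index to gather from, or nullopt if the lane is undef.
using WideLanes = std::vector<std::optional<int>>;

// Returns {indices for a narrower gather, shuffle mask for the wide result}.
// Mask[I] is the position in the narrow gather that feeds wide lane I, or -1
// when wide lane I is undef.
std::pair<std::vector<int>, std::vector<int>>
splitIntoGatherAndShuffle(const WideLanes &Lanes) {
  std::vector<int> NarrowIndices;
  std::vector<int> ShuffleMask;
  for (const auto &Lane : Lanes) {
    if (Lane) {
      ShuffleMask.push_back(static_cast<int>(NarrowIndices.size()));
      NarrowIndices.push_back(*Lane);
    } else {
      ShuffleMask.push_back(-1);
    }
  }
  return {NarrowIndices, ShuffleMask};
}
```

If the gather cost is proportional to the number of elements, as noted in the previous comment, gathering only the defined lanes and paying for one shuffle may come out cheaper than a full-width gather.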

Rebase + improve handling of masked gathers.

Harbormaster completed remote builds in B79808: Diff 307098. Nov 23 2020, 9:22 AM

anton-afanasyev added inline comments. Nov 23 2020, 1:01 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2893	Could there actually be no `Instruction` in `UseEntry`, or should this be an assert?
2897	"Lane 0" seems outdated here, but not sure about better description.
3077	`assert(NumberOfInstructions != 0 && "...")` and `if (NumberOfInstructions == 1)`?
3896	Comment typo: `aggrgate`.
5586–5587	`emplace_back()`
7014	typo: "indeces"

Fixed according to comments

Harbormaster completed remote builds in B79873: Diff 307204. Nov 23 2020, 2:59 PM

Fixed function name.

Harbormaster completed remote builds in B79874: Diff 307205. Nov 23 2020, 3:13 PM

Rebase

Harbormaster completed remote builds in B80547: Diff 308408. Nov 30 2020, 10:09 AM

Btw, I've observed a significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for the awesome service). This could be justified by comparable performance improvements, but have you done any benchmarking?

In D57059#2426996, @anton-afanasyev wrote:

Btw, I've observed a significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for the awesome service). This could be justified by comparable performance improvements, but have you done any benchmarking?

I did some a while back with SPECINT 2006 and, as I remember, the results were good, but I am not sure I could find them now. For me, having this new functionality with the presented compile-time regression looks OK.

I don't think a (geomean) 0.20% is a significant compile-time problem. TBH, I expected bigger CT regressions - up to 0.5% is fine IMHO.

Rebase

Harbormaster completed remote builds in B81116: Diff 309568. Dec 4 2020, 10:40 AM

Rebase

Harbormaster completed remote builds in B81501: Diff 310293. Dec 8 2020, 11:41 AM

AFAICT the only outstanding question is whether the compile time increase is acceptable?

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3599	Well, the term "unsortable" or "unprocessable" would be more precise. But why did we change `if (sortPtrAccesses)...` to the opposite condition? This change just duplicates the debug output, since we don't differentiate it. Also I'd prefer to see the same `if-else` structure as for the load case.

In D57059#2442527, @anton-afanasyev wrote:

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

Could the summary of the revision be updated with the performance data? The thread is very long and I didn't spot where the measurements are, so it's hard to say what we're trading off here...

Generally, though, this level of geomean regression is fine; the issue is usually in the outliers. For example, I spot this 4-5% regression:

CMakeFiles/lencod.dir/transform8x8.c.o 	3053M 	3188M (+4.41%)

It may be worthwhile to briefly check it in case something can be improved.
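
The point about outliers can be made concrete with a small sketch. The helper names `percentChange` and `geomeanChange` are hypothetical, and only the 3053M and 3188M figures come from the comment above; any other inputs are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative change from Before to After, in percent.
double percentChange(double Before, double After) {
  assert(Before > 0 && "baseline must be positive");
  return (After / Before - 1.0) * 100.0;
}

// Geometric mean of the After/Before ratios, expressed as a percent change.
double geomeanChange(const std::vector<double> &Before,
                     const std::vector<double> &After) {
  assert(Before.size() == After.size() && !Before.empty());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Before.size(); ++I)
    LogSum += std::log(After[I] / Before[I]);
  return (std::exp(LogSum / Before.size()) - 1.0) * 100.0;
}
```

For instance, if one file out of ten goes from 3053M to 3188M instructions and the other nine do not change, `geomeanChange` stays well under 1% although that single file regresses by more than 4%. A small geomean therefore says little about the worst-hit translation unit.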

In D57059#2442542, @nikic wrote:

In D57059#2442527, @anton-afanasyev wrote:

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

Could the summary of the revision be updated with the performance data? The thread is very long and I didn't spot where the measurements are, so it's hard to say what we're trading off here...

Will try to run the benchmarks and get fresh data.

Generally, though, this level of geomean regression is fine; the issue is usually in the outliers. For example, I spot this 4-5% regression:
CMakeFiles/lencod.dir/transform8x8.c.o 	3053M 	3188M (+4.41%)
It may be worthwhile to briefly check it in case something can be improved.

I'll check what can be improved in terms of compile time. I'm not sure I will be able to improve it significantly, since the patch itself does not add extensive analysis/transformations; it just adds one extra iteration for wider vector analysis. But I'll check it anyway and will try to improve things where possible.

While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c; here is the reduced testcase to reproduce it:
source_filename = "/home/dtemirbulatov/llvm/test-suite/SingleSource/Benchmarks/Misc/oourafft.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local fastcc void @cft1st(double* %a) unnamed_addr #0 {
entry:

%0 = or i64 16, 2
%arrayidx107 = getelementptr inbounds double, double* %a, i64 %0
%1 = or i64 16, 3
%arrayidx114 = getelementptr inbounds double, double* %a, i64 %1
%2 = or i64 16, 4
%arrayidx131 = getelementptr inbounds double, double* %a, i64 %2
%3 = or i64 16, 6
%arrayidx134 = getelementptr inbounds double, double* %a, i64 %3
%4 = load double, double* %arrayidx134, align 8
%5 = or i64 16, 5
%arrayidx138 = getelementptr inbounds double, double* %a, i64 %5
%6 = or i64 16, 7
%arrayidx141 = getelementptr inbounds double, double* %a, i64 %6
%7 = load double, double* %arrayidx141, align 8
%sub149 = fsub double undef, %4
%sub156 = fsub double undef, %7
store double undef, double* %arrayidx131, align 8
store double undef, double* %arrayidx138, align 8
%sub178 = fsub double undef, %sub156
%add179 = fadd double undef, %sub149
%mul180 = fmul double undef, %sub178
%sub182 = fsub double %mul180, undef
store double %sub182, double* %arrayidx107, align 8
%mul186 = fmul double undef, %add179
%add188 = fadd double %mul186, undef
store double %add188, double* %arrayidx114, align 8
unreachable

}

attributes #0 = { "target-features"="+avx,+avx2,+bmi,+bmi2,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" }

!llvm.ident = !{!0}

!0 = !{!"clang version 12.0.0 (https://github.com/llvm/llvm-project.git aaa925795f93c389a96ee01bab73bc2b6b771cbb)"}

In D57059#2443350, @dtemirbulatov wrote:
While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c; here is the reduced testcase to reproduce it:

Do you mean compile time increasing? With this patch?

Do you mean compile time increasing? With this patch?

no, just compile-time error.

In D57059#2443492, @dtemirbulatov wrote:

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Crash or incorrect code?

In D57059#2443496, @ABataev wrote:

In D57059#2443492, @dtemirbulatov wrote:

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Crash or incorrect code?

Crash.

wxiao3 added a subscriber: wxiao3. Dec 14 2020, 7:09 AM

ABataev edited the summary of this revision. Feb 10 2021, 6:01 AM

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   146.00   148.00   1.4%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   146.00   148.00   1.4%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    34.00    34.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    34.00    34.00   0.0%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  5587.00  5560.00  -0.5%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  5587.00  5560.00  -0.5%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  7384.00  7341.00  -0.6%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9607.00  9359.00  -2.6%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5340.00  5178.00  -3.0%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1053.00  1006.00  -4.5%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1053.00  1006.00  -4.5%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   141.00   134.00  -5.0%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   141.00   134.00  -5.0%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   524.00   463.00 -11.6%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   426.00   370.00 -13.1%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   426.00   370.00 -13.1%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 15945.00 12573.00 -21.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test      NaN    16.00   nan%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test      NaN    16.00   nan%
                                                             Geomean difference                     nan%
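
The `nan%` geomean in the table above is presumably a consequence of the two 519.lbm_r / 619.lbm_s rows whose `lhs` value is NaN: a single missing measurement propagates through the logarithm into the mean. A minimal sketch of that effect follows; `geomeanDiffPercent` and its `SkipMissing` flag are hypothetical and not part of any LLVM test-suite tooling.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geomean of the Rhs/Lhs ratios as a percent change. When SkipMissing is set,
// pairs where either side is NaN are ignored instead of poisoning the result.
double geomeanDiffPercent(const std::vector<double> &Lhs,
                          const std::vector<double> &Rhs, bool SkipMissing) {
  double LogSum = 0.0;
  std::size_t Count = 0;
  for (std::size_t I = 0; I < Lhs.size() && I < Rhs.size(); ++I) {
    if (SkipMissing && (std::isnan(Lhs[I]) || std::isnan(Rhs[I])))
      continue;
    LogSum += std::log(Rhs[I] / Lhs[I]);
    ++Count;
  }
  if (Count == 0)
    return std::nan("");
  return (std::exp(LogSum / Count) - 1.0) * 100.0;
}
```

Feeding it the 500.perlbench_r, 505.mcf_r and 519.lbm_r rows from this table gives NaN without `SkipMissing` and a finite value with it, so the geomean line for this configuration should not be read as a measurement.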

AVX512, O3+LTO
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    22.00    60.00 172.7%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    22.00    60.00 172.7%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test    68.00    72.00   5.9%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    68.00    72.00   5.9%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    10.00    10.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  3396.00  3396.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    10.00    10.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  3396.00  3396.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   499.00   497.00  -0.4%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   499.00   497.00  -0.4%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   838.00   826.00  -1.4%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   838.00   826.00  -1.4%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  6090.00  5906.00  -3.0%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   131.00   127.00  -3.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   131.00   127.00  -3.1%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  8815.00  8452.00  -4.1%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  2864.00  2712.00  -5.3%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  2864.00  2712.00  -5.3%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 16049.00 14753.00  -8.1%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   686.00   621.00  -9.5%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   686.00   621.00  -9.5%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   551.00   473.00 -14.2%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   551.00   473.00 -14.2%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 16240.00 13941.00 -14.2%
                                                             Geomean difference                     4.3%

AVX2, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  7309.00  7341.00   0.4%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    34.00    34.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  5490.00  5490.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    34.00    34.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  5490.00  5490.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   462.00   455.00  -1.5%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   462.00   455.00  -1.5%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9508.00  9347.00  -1.7%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5393.00  5190.00  -3.8%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1066.00   968.00  -9.2%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1066.00   968.00  -9.2%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   151.00   134.00 -11.3%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   151.00   134.00 -11.3%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   160.00   141.00 -11.9%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   160.00   141.00 -11.9%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   820.00   722.00 -12.0%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   820.00   722.00 -12.0%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3605.00  3173.00 -12.0%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3605.00  3173.00 -12.0%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   438.00   370.00 -15.5%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   438.00   370.00 -15.5%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14842.00 12463.00 -16.0%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   106.00    79.00 -25.5%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   106.00    79.00 -25.5%
                                                             Geomean difference                    -8.7%

AVX2, O3+LTO
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    22.00    60.00 172.7%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    22.00    60.00 172.7%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test    68.00    72.00   5.9%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    68.00    72.00   5.9%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    10.00    10.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  3396.00  3396.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    10.00    10.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  3396.00  3396.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   499.00   497.00  -0.4%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   499.00   497.00  -0.4%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   838.00   826.00  -1.4%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   838.00   826.00  -1.4%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   131.00   127.00  -3.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   131.00   127.00  -3.1%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  6094.00  5906.00  -3.1%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  8734.00  8452.00  -3.2%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  2867.00  2712.00  -5.4%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  2867.00  2712.00  -5.4%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 16026.00 14753.00  -7.9%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   686.00   621.00  -9.5%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   686.00   621.00  -9.5%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 16241.00 13941.00 -14.2%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   559.00   473.00 -15.4%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   559.00   473.00 -15.4%
                                                             Geomean difference                     4.2%

Will update the patch soon.

Rework, bug fixes, rebase

This is an integral patch, going to split it into several smaller patches.

Harbormaster completed remote builds in B88714: Diff 322816. Feb 10 2021, 8:03 PM

Btw, how can the drop in the NumVectorInstructions stat after this patch be explained?

In D57059#2553914, @ABataev wrote:

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

...
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
...

In D57059#2555233, @ABataev wrote:

This is an integral patch, going to split it into several smaller patches.

Are you planning to send all of these patches for review, or just to commit them after this integral review? I.e., should we go on with the review here?

In D57059#2578657, @anton-afanasyev wrote:

Btw, how can the drop in the NumVectorInstructions stat after this patch be explained?

In D57059#2553914, @ABataev wrote:

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

...
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
...

Actually, it is not reducing. This is how the test-suite Python script works. Here lhs is the number of instructions after this patch and rhs is the number before, so the lower the relative number, the more vector instructions we actually generate.

In D57059#2578661, @anton-afanasyev wrote:

In D57059#2555233, @ABataev wrote:

This is an integral patch, going to split it into several smaller patches.

Are you planning to send all of these patches for review, or just to commit them after this integral review? I.e., should we go on with the review here?

You can publish your comments here, no need to wait for small patches.

Actually, it is not reducing. This is how the test-suite Python script works. Here lhs is the number of instructions after this patch and rhs is the number before, so the lower the relative number, the more vector instructions we actually generate.

Oh, I see. There are still several reducing cases though.

In D57059#2578770, @anton-afanasyev wrote:

Actually, it is not reducing. This is how the test-suite Python script works. Here lhs is the number of instructions after this patch and rhs is the number before, so the lower the relative number, the more vector instructions we actually generate.

Oh, I see. There are still several reducing cases though.

Actually, no. I compared the resulting IR files for these cases and they are exactly the same as before. We just generate fewer ExtractElement/InsertElement instructions and emit Shuffle instructions directly in some cases. That's why it seems to generate fewer vector instructions, though it does not.
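
The percentages quoted in this exchange can be reproduced from the lhs and rhs columns. The following is a minimal sketch, assuming the diff column is computed as (rhs - lhs) / lhs; that formula is inferred from the quoted numbers, not taken from the test-suite script itself, and `relativeDiffPercent` and `roundToTenth` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change of rhs with respect to lhs, in percent.
// Assumption: this mirrors the "diff" column of the tables in this thread.
double relativeDiffPercent(double lhs, double rhs) {
  assert(lhs != 0.0 && "lhs must be non-zero");
  return (rhs - lhs) / lhs * 100.0;
}

// Rounds to one decimal place, the precision used in the tables.
double roundToTenth(double v) { return std::round(v * 10.0) / 10.0; }
```

For the imagick row (lhs 3996, rhs 3563) this gives about -10.8, matching the table. With lhs taken as the post-patch count, a negative value therefore means the patched compiler reported more vector instructions than the baseline.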

ABataev added a child revision: D97406: [Instcombiner]Improve emission of logical or/and reductions. Feb 24 2021, 1:06 PM

ABataev mentioned this in rG04ba80ca4dee: [Instcombiner]Improve emission of logical or/and reductions. Mar 4 2021, 8:02 AM

Rebase

Harbormaster completed remote builds in B94525: Diff 331650. Mar 18 2021, 1:08 PM

Removed logical reductions conversion code.

Harbormaster completed remote builds in B94702: Diff 331873. Mar 19 2021, 8:30 AM

ABataev mentioned this in D98967: [Analysis]Add getPointersDiff function to improve compile time. Mar 19 2021, 10:33 AM

ABataev mentioned this in rG065a14a12d26: [Analysis]Add getPointersDiff function to improve compile time. Mar 23 2021, 12:59 PM

ABataev mentioned this in rG99203f2004d0: [Analysis]Add getPointersDiff function to improve compile time. Mar 23 2021, 2:26 PM

Rebase

Harbormaster completed remote builds in B95477: Diff 332970. Mar 24 2021, 11:00 AM

Rebase

Harbormaster completed remote builds in B96712: Diff 334682. Apr 1 2021, 7:42 AM

What is the status of this patch? any blockers? or just lack of reviewers?

In D57059#2666065, @xbolva00 wrote:

What is the status of this patch? any blockers? or just lack of reviewers?

The patch is just too big. I'm splitting it into smaller chunks and committing them step by step. The final chunk will be the largest one, but before that I need to commit other improvements that can be separated from the patch.

ABataev mentioned this in D99796: [SLP]Improve vectorization of the CmpInst instructions. Apr 2 2021, 8:25 AM

ABataev mentioned this in rG00a84f9a7f89: [SLP]Improve vectorization of the CmpInst instructions. Apr 5 2021, 6:50 AM

Rebase

Harbormaster completed remote builds in B97157: Diff 335310. Apr 5 2021, 12:59 PM

ABataev mentioned this in D99980: [SLP]Improve cost model for the vectorized extractelements. Apr 6 2021, 11:09 AM

ABataev mentioned this in rGe99b98cb1bca: [SLP]Improve cost model for the vectorized extractelements. Apr 22 2021, 7:41 AM

ABataev mentioned this in D101109: [SLP]Improve multinode analysis. Apr 22 2021, 2:01 PM

ABataev mentioned this in D101297: [SLP]Allow masked gathers only if allowed by target. Apr 26 2021, 7:41 AM

ABataev mentioned this in rGb5f64768cfee: [SLP]Allow masked gathers only if allowed by target. May 3 2021, 7:05 AM

ABataev mentioned this in rGfd18547e0721: [SLP]Allow masked gathers only if allowed by target. May 3 2021, 8:07 AM

Hi Alexey! Is this patch ready for review, or will other patches be split from this one?

In D57059#2766508, @anton-afanasyev wrote:

Hi Alexey! Is this patch ready for review, or will other patches be split from this one?

Just like I said before, it must be split into several smaller patches. I'm doing this step by step, there is D101109, which is part of this big patch. I want to commit it, then rebase this patch and split it again, there are some other parts that can be committed independently.

ABataev mentioned this in D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops. May 20 2021, 6:31 AM

SjoerdMeijer added a subscriber: SjoerdMeijer. May 20 2021, 6:35 AM

ABataev mentioned this in D103247: [SLP]Allow to reorder nodes with >2 scalar values. May 27 2021, 6:48 AM

Is it worth rebasing this to show the remaining diffs that still need to go in?

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

ABataev mentioned this in D103458: [SLP]Improve gathering of scalar elements. Jun 1 2021, 6:53 AM

ABataev mentioned this in rG89f3bc7698c5: [SLP]Allow to reorder nodes with >2 scalar values. Jun 3 2021, 10:03 AM

ABataev mentioned this in D103638: [SLP]Improve vectorization of PHI instructions. Jun 3 2021, 11:18 AM

ABataev mentioned this in rGa0086add2e52: [SLP]Improve gathering of scalar elements. Jun 9 2021, 5:24 AM

ABataev mentioned this in D104122: [SLP]Improve vectorization of stores. Jun 11 2021, 8:07 AM

ABataev mentioned this in rG908b7536615e: [SLP]Improve vectorization of PHI instructions. Jun 21 2021, 12:27 PM

anton-afanasyev mentioned this in D105042: [SLP][COST][X86]Improve cost model for masked gather. Jul 6 2021, 3:33 PM

ABataev mentioned this in rGc574d2fbaca4: [SLP]Improve vectorization of stores. Jul 8 2021, 12:49 PM

ABataev mentioned this in D105986: [SLP]Improve vectorization of gathered loads. Jul 14 2021, 7:41 AM

In D57059#2785076, @ABataev wrote:

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

Any chance that you could refresh this patch with your rebase please? I'm investigating a lot of 'float3' style performance issues at the moment (PR50920, PR51075, PR51091) and I'd like to get a better idea of how close the non-pow2 slp support will get us. Thanks.

In D57059#2877539, @RKSimon wrote:

In D57059#2785076, @ABataev wrote:

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

Any chance that you could refresh this patch with your rebase please? I'm investigating a lot of 'float3' style performance issues at the moment (PR50920, PR51075, PR51091) and I'd like to get a better idea of how close the non-pow2 slp support will get us. Thanks.

I will try to do it ASAP.

That's awesome thanks, it'll definitely help improve the feedback I can give on the patch series.

RKSimon mentioned this in D106399: [VectorCombine] Widening of partial vector loads. Jul 21 2021, 4:59 AM

Rebase. Did not test it thoroughly, just rebased and fixed test cases.

Harbormaster completed remote builds in B115286: Diff 360420. Jul 21 2021, 6:59 AM

Thank you!

Thank you, checked this patch after rebase, trying to fix PR49933. It works well for it, reported to https://bugs.llvm.org/show_bug.cgi?id=49933.

anton-afanasyev mentioned this in rGdd028c359e09: [SLP][Test] Add tests for PR47624 and PR49933. Sep 4 2021, 3:18 PM

nick added a subscriber: nick. Sep 4 2021, 4:30 PM

vporpo added a subscriber: vporpo. Nov 11 2021, 8:01 PM

ABataev mentioned this in rGbd053769867f: [SLP]Improve multinode analysis. Dec 14 2021, 6:18 AM

ABataev mentioned this in D123516: Fix SLP score for out of order contiguous loads. Apr 12 2022, 11:38 AM

liaolucy added a subscriber: liaolucy. May 17 2022, 7:18 PM

Herald added a project: Restricted Project. May 17 2022, 7:18 PM

Herald added subscribers: kosarev, StephenFan.

Current status?

Herald added subscribers: pcwang-thead, nlopes. Sep 7 2022, 12:53 PM

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

In D57059#3775511, @ABataev wrote:

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

Are those patches linked somewhere?

Herald added a subscriber: wangpc. Aug 8 2023, 6:40 AM

In D57059#4569337, @danilaml wrote:

In D57059#3775511, @ABataev wrote:

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

Are those patches linked somewhere?

Almost all my current SLP patches are related to non-power-of-2 support. But even these patches are not the full list. Need to add several others after.

sunshaoce added a subscriber: sunshaoce. Aug 17 2023, 2:19 AM

Revision Contents

Path                                            Size
llvm/include/llvm/Analysis/
  LoopAccessAnalysis.h                          6 lines
llvm/include/llvm/Transforms/Vectorize/
  SLPVectorizer.h                               3 lines
llvm/lib/Analysis/
  LoopAccessAnalysis.cpp                        85 lines
llvm/lib/Transforms/Utils/
  LoopUtils.cpp                                 20 lines
llvm/lib/Transforms/Vectorize/
  SLPVectorizer.cpp                             2772 lines
llvm/test/Transforms/SLPVectorizer/AArch64/
  PR38339.ll                                    37 lines
  accelerate-vector-functions-inseltpoison.ll   90 lines
  accelerate-vector-functions.ll                98 lines
  ext-trunc.ll                                  46 lines
  gather-root.ll                                35 lines
  horizontal.ll                                 2 lines
  insertelement-inseltpoison.ll                 2 lines
  insertelement.ll                              2 lines
  transpose-inseltpoison.ll                     127 lines
  transpose.ll                                  127 lines
  trunc-insertion.ll                            10 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/
  add_sub_sat-inseltpoison.ll                   32 lines
  add_sub_sat.ll                                32 lines
llvm/test/Transforms/SLPVectorizer/SystemZ/
  pr34619.ll                                    23 lines
llvm/test/Transforms/SLPVectorizer/X86/
  PR39774.ll                                    55 lines
  alternate-calls-inseltpoison.ll               60 lines
  alternate-calls.ll                            60 lines
  alternate-cast-inseltpoison.ll                92 lines
  alternate-cast.ll                             92 lines
  alternate-fp-inseltpoison.ll                  11 lines
  alternate-fp.ll                               11 lines
  alternate-int-inseltpoison.ll                 164 lines
  alternate-int.ll                              164 lines
  blending-shuffle-inseltpoison.ll              81 lines
  blending-shuffle.ll                           81 lines
  cmp_commute-inseltpoison.ll                   17 lines
                                                17 lines
                                                42 lines
                                                51 lines
                                                45 lines
  crash_exceed_scheduling.ll                    17 lines
  crash_lencod.ll                               7 lines
  crash_mandeltext.ll                           6 lines
  crash_reordering_undefs.ll                    26 lines
  crash_smallpt.ll                              31 lines
  crash_vectorizeTree.ll                        27 lines
  cse.ll                                        18 lines
  extract.ll                                    11 lines
  extractelement.ll                             16 lines
  fptosi-inseltpoison.ll                        159 lines
                                                159 lines
                                                145 lines
                                                43 lines
                                                6 lines
                                                90 lines
  insert-element-build-vector-inseltpoison.ll   57 lines
  insert-element-build-vector.ll                57 lines
  jumbled-load-multiuse.ll                      9 lines
  jumbled-load-used-in-phi.ll                   2 lines
  load-merge-inseltpoison.ll                    32 lines
  load-merge.ll                                 32 lines
  lookahead.ll                                  104 lines
  minimum-sizes.ll                              64 lines
  no_alternate_divrem.ll                        56 lines
                                                78 lines
                                                41 lines
                                                68 lines
                                                12 lines
                                                4 lines
                                                50 lines
  pr42022-inseltpoison.ll                       5 lines
  pr42022.ll                                    45 lines
  pr47623.ll                                    19 lines
  pr47629-inseltpoison.ll                       639 lines
  pr47629.ll                                    639 lines
  pr49081.ll                                    15 lines
  reduction2.ll                                 8 lines
  reorder_repeated_ops.ll                       48 lines
  resched.ll                                    87 lines
  rgb_phi.ll                                    50 lines
  schedule-bundle.ll                            16 lines
  shrink_after_reorder.ll                       13 lines
  supernode.ll                                  2 lines
  used-reduced-op.ll                            406 lines
  value-bug-inseltpoison.ll                     20 lines
  value-bug.ll                                  20 lines
  vec_list_bias-inseltpoison.ll                 38 lines
  vec_list_bias.ll                              38 lines
  vectorize-reorder-reuse.ll                    52 lines
llvm/test/Transforms/SLPVectorizer/
  slp-max-phi-size.ll                           445 lines

Diff 331650

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

@@ ... @@
 /// If necessary this method will version the stride of the pointer according
 /// to \p PtrToStride and therefore add further predicates to \p PSE.
 /// The \p Assume parameter indicates if we are allowed to make additional
 /// run-time assumptions.
 int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr, const Loop *Lp,
                      const ValueToValueMap &StridesMap = ValueToValueMap(),
                      bool Assume = false, bool ShouldCheckWrap = true);

+/// Returns the distance between the pointers \p PtrA and \p PtrB iff they are
+/// compatible and it is possible to calculate the distance between them. This
+/// is a simple API that does not depend on the analysis pass.
+Optional<int> getPointersDiff(Value *PtrA, Value *PtrB, const DataLayout &DL,
+                              ScalarEvolution &SE);
+
 /// Attempt to sort the pointers in \p VL and return the sorted indices
 /// in \p SortedIndices, if reordering is required.
 ///
 /// Returns 'true' if sorting is legal, otherwise returns 'false'.
 ///
 /// For example, for a given \p VL of memory accesses in program order, a[i+4],
 /// a[i+0], a[i+1] and a[i+7], this function will sort the \p VL and save the
 /// sorted indices in \p SortedIndices as a[i+0], a[i+1], a[i+4], a[i+7] and
@@ ... @@
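
The sorting example in the doc comment above (a[i+4], a[i+0], a[i+1], a[i+7]) can be sketched without LLVM types. This is a simplified model, not the LLVM implementation: `sortByOffset` is a hypothetical name, offsets are plain integers rather than values derived through ScalarEvolution, and the sketch always fills `SortedIndices` even when the input is already in order.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical stand-in for sortPtrAccesses. Offsets[i] is the element offset
// of access i. Returns false when two accesses share an offset; otherwise
// fills SortedIndices with the permutation that visits the accesses in
// increasing offset order.
bool sortByOffset(const std::vector<int64_t> &Offsets,
                  std::vector<unsigned> &SortedIndices) {
  std::set<int64_t> Seen;
  for (int64_t Off : Offsets)
    if (!Seen.insert(Off).second)
      return false; // Duplicate offset: bail out, as the diffed code does.

  SortedIndices.resize(Offsets.size());
  std::iota(SortedIndices.begin(), SortedIndices.end(), 0u);
  std::stable_sort(
      SortedIndices.begin(), SortedIndices.end(),
      [&](unsigned L, unsigned R) { return Offsets[L] < Offsets[R]; });
  return true;
}
```

For offsets {4, 0, 1, 7} the resulting permutation is {1, 2, 0, 3}, which is the a[i+0], a[i+1], a[i+4], a[i+7] order the comment describes.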

llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h

@@ ... @@ bool vectorizeInsertElementInst(InsertElementInst *IEI, BasicBlock *BB,
                                   slpvectorizer::BoUpSLP &R);

   /// Try to vectorize trees that start at compare instructions.
   bool vectorizeCmpInst(CmpInst *CI, BasicBlock *BB, slpvectorizer::BoUpSLP &R);

   /// Tries to vectorize constructs started from CmpInst, InsertValueInst or
   /// InsertElementInst instructions.
   bool vectorizeSimpleInstructions(SmallVectorImpl<Instruction *> &Instructions,
-                                   BasicBlock *BB, slpvectorizer::BoUpSLP &R);
+                                   BasicBlock *BB, slpvectorizer::BoUpSLP &R,
+                                   bool AtTerminator);

   /// Scan the basic block and look for patterns that are likely to start
   /// a vectorization chain.
   bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

   bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,
                            unsigned Idx);

@@ ... @@

llvm/lib/Analysis/LoopAccessAnalysis.cpp

@@ ... @@ if (Assume) {
       PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
     } else
       return 0;
   }

   return Stride;
 }

+Optional<int> llvm::getPointersDiff(Value *PtrA, Value *PtrB,
+                                    const DataLayout &DL, ScalarEvolution &SE) {
+  unsigned ASA = PtrA->getType()->getPointerAddressSpace();
+  unsigned ASB = PtrB->getType()->getPointerAddressSpace();
+
+  // Check that the address spaces match and that the pointers are valid.
+  if (!PtrA || !PtrB || (ASA != ASB) || PtrA->getType() != PtrB->getType())
+    return None;
+
+  // Make sure that A and B are different pointers.
+  if (PtrA == PtrB)
+    return 0;
+
+  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
+  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
+
+  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
+  Value *PtrA1 = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
+  Value *PtrB1 = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
+
+  IdxWidth = DL.getIndexSizeInBits(ASA);
+  APInt Size(IdxWidth, DL.getTypeStoreSize(Ty));
+  if (PtrA1 == PtrB1) {
+    // Retrieve the address space again as pointer stripping now tracks through
+    // `addrspacecast`.
+    ASA = cast<PointerType>(PtrA1->getType())->getAddressSpace();
+    ASB = cast<PointerType>(PtrB1->getType())->getAddressSpace();
+    // Check that the address spaces match and that the pointers are valid.
+    if (ASA != ASB)
+      return None;
+
+    OffsetA = OffsetA.sextOrTrunc(IdxWidth);
+    OffsetB = OffsetB.sextOrTrunc(IdxWidth);
+
+    // OffsetDelta = OffsetB - OffsetA;
+    const SCEV *OffsetSCEVA = SE.getConstant(OffsetA);
+    const SCEV *OffsetSCEVB = SE.getConstant(OffsetB);
+    const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB, OffsetSCEVA);
+    const APInt &OffsetDelta = cast<SCEVConstant>(OffsetDeltaSCEV)->getAPInt();
+
+    // Check if they are based on the same pointer. That makes the offsets
+    // sufficient.
+    int Val = OffsetDelta.getSExtValue() / Size.getSExtValue();
+    if (Val * Size == OffsetDelta)
+      return Val;
+    return None;
+  }
+
+  // Otherwise compute the distance with SCEV between the base pointers.
+  const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
+  const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
+  const auto *Diff =
+      dyn_cast<SCEVConstant>(SE.getMinusSCEV(PtrSCEVB, PtrSCEVA));
+  if (Diff) {
+    int Val = Diff->getAPInt().getSExtValue() / Size.getSExtValue();
+    if (Val * Size == Diff->getAPInt())
+      return Val;
+  }
+  return None;
+}
+
 bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
                            ScalarEvolution &SE,
                            SmallVectorImpl<unsigned> &SortedIndices) {
   assert(llvm::all_of(
              VL, [](const Value *V) { return V->getType()->isPointerTy(); }) &&
          "Expected list of pointer operands.");
   SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
   OffValPairs.reserve(VL.size());

   // Walk over the pointers, and map each of them to an offset relative to
   // first pointer in the array.
   Value *Ptr0 = VL[0];
-  const SCEV *Scev0 = SE.getSCEV(Ptr0);
-  Value *Obj0 = getUnderlyingObject(Ptr0);
+  OffValPairs.emplace_back(0, Ptr0);

   llvm::SmallSet<int64_t, 4> Offsets;
-  for (auto *Ptr : VL) {
-    // TODO: Outline this code as a special, more time consuming, version of
-    // computeConstantDifference() function.
-    if (Ptr->getType()->getPointerAddressSpace() !=
-        Ptr0->getType()->getPointerAddressSpace())
-      return false;
-    // If a pointer refers to a different underlying object, bail - the
-    // pointers are by definition incomparable.
-    Value *CurrObj = getUnderlyingObject(Ptr);
-    if (CurrObj != Obj0)
-      return false;
-
-    const SCEV *Scev = SE.getSCEV(Ptr);
-    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev, Scev0));
-    // The pointers may not have a constant offset from each other, or SCEV
-    // may just not be smart enough to figure out they do. Regardless,
-    // there's nothing we can do.
+  for (auto *Ptr : VL.drop_front()) {
+    Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE);
     if (!Diff)
       return false;

     // Check if the pointer with the same offset is found.
-    int64_t Offset = Diff->getAPInt().getSExtValue();
+    int Offset = *Diff;
     if (!Offsets.insert(Offset).second)
       return false;
     OffValPairs.emplace_back(Offset, Ptr);
   }
   SortedIndices.clear();
   SortedIndices.resize(VL.size());
   std::iota(SortedIndices.begin(), SortedIndices.end(), 0);

@@ ... @@
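
The final step of getPointersDiff, turning a byte distance into an element distance and rejecting distances that are not a whole number of elements, can be illustrated in isolation. This is only a sketch on plain integers: `elementDistance` is a hypothetical name, and the real function works on APInt offsets obtained from the stripped pointers or from ScalarEvolution.

```cpp
#include <cstdint>

// Hypothetical model: OffsetA and OffsetB are byte offsets from a common base
// and ElemSize is the store size of the element type. On success, writes the
// distance in elements (B minus A) to Dist and returns true. Returns false
// when the byte distance is not a multiple of ElemSize.
bool elementDistance(int64_t OffsetA, int64_t OffsetB, int64_t ElemSize,
                     int64_t &Dist) {
  if (ElemSize <= 0)
    return false;
  int64_t Delta = OffsetB - OffsetA;
  int64_t Val = Delta / ElemSize;
  if (Val * ElemSize != Delta)
    return false; // Pointers are not a whole number of elements apart.
  Dist = Val;
  return true;
}
```

With 4-byte elements, byte offsets 0 and 12 are 3 elements apart, while offsets 0 and 6 yield no result, which corresponds to the `None` return in the diff.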

llvm/lib/Transforms/Utils/LoopUtils.cpp

@@ ... @@
 Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
                                          const TargetTransformInfo *TTI,
                                          Value *Src, RecurKind RdxKind,
                                          ArrayRef<Value *> RedOps) {
   TargetTransformInfo::ReductionFlags RdxFlags;
   RdxFlags.IsMaxOp = RdxKind == RecurKind::SMax || RdxKind == RecurKind::UMax ||
                      RdxKind == RecurKind::FMax;
   RdxFlags.IsSigned = RdxKind == RecurKind::SMax || RdxKind == RecurKind::SMin;
+  // Special reductions for i1 or and and operations. No need to emit reductions
+  // here, just x != <0, 0, .., 0> for reduction or and x == <1, 1, .., 1> for
+  // reduction and.
   auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
+  if ((RdxKind == RecurKind::And || RdxKind == RecurKind::Or) &&
+      SrcVecEltTy == Builder.getInt1Ty()) {
+    Value *Res = Builder.CreateBitCast(
+        Src, Builder.getIntNTy(cast<VectorType>(Src->getType())
+                                   ->getElementCount()
+                                   .getFixedValue()));
+    if (RdxKind == RecurKind::And) {
+      Res = Builder.CreateICmpEQ(Res,
+                                 ConstantInt::getAllOnesValue(Res->getType()));
+    } else {
+      assert(RdxKind == RecurKind::Or && "Expected or reduction.");
+      Res = Builder.CreateIsNotNull(Res);
+    }
+    return Res;
+  }

   switch (RdxKind) {
   case RecurKind::Add:
     return Builder.CreateAddReduce(Src);
   case RecurKind::Mul:
     return Builder.CreateMulReduce(Src);
   case RecurKind::And:
     return Builder.CreateAndReduce(Src);
   case RecurKind::Or:
@@ ... @@
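
The i1 special case above replaces a vector reduction with a single integer comparison: after the bitcast from <N x i1> to iN, an and-reduction is "all bits set" and an or-reduction is "any bit set". A minimal sketch of the same idea on plain integers follows; the function names are hypothetical, lanes are limited to 64, and mapping lane i to bit i is an assumption of the sketch rather than a statement about LLVM's bitcast layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs up to 64 boolean lanes into one integer, lane i into bit i. This
// plays the role of the bitcast from <N x i1> to iN.
uint64_t packLanes(const std::vector<bool> &Lanes) {
  uint64_t Bits = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I])
      Bits |= uint64_t(1) << I;
  return Bits;
}

// All-ones value of an N-bit integer, for N in 0..64.
uint64_t allOnes(std::size_t N) {
  return N >= 64 ? ~uint64_t(0) : (uint64_t(1) << N) - 1;
}

// And-reduction: every lane is true iff the packed value is all ones.
bool reduceAnd(const std::vector<bool> &Lanes) {
  return packLanes(Lanes) == allOnes(Lanes.size());
}

// Or-reduction: some lane is true iff the packed value is non-zero.
bool reduceOr(const std::vector<bool> &Lanes) {
  return packLanes(Lanes) != 0;
}
```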

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ ... @@ static cl::opt<unsigned> LookAheadUsersBudget(
     "slp-look-ahead-users-budget", cl::init(2), cl::Hidden,
     cl::desc("The maximum number of users to visit while visiting the "
              "predecessors. This prevents compilation time increase."));

 static cl::opt<bool>
     ViewSLPTree("view-slp-tree", cl::Hidden,
                 cl::desc("Display the SLP trees with Graphviz"));

+// FIXME: These 2 options are required to avoid regressions in O3+LTO because of
+// too early optimizations at compile time.
+static cl::opt<int>
+    MinNonPow2StoresSize("slp-min-non-power2-stores-size", cl::init(6),
+                         cl::Hidden,
+                         cl::desc("The minimum number of non-power-2 stores to "
+                                  "vectorize to try to use masked stores."));
+
+static cl::opt<int>
+    MinNonPow2ValuesSize("slp-min-non-power2-values-size", cl::init(4),
+                         cl::Hidden,
+                         cl::desc("The minimum number of non-power-2 non-store "
+                                  "values to try the vectorization."));
+
 // Limit the number of alias checks. The limit is chosen so that
 // it has no negative effect on the llvm benchmarks.
 static const unsigned AliasedCheckLimit = 10;

 // Another limit for the alias checks: The maximum distance between load/store
 // instructions where alias checks are done.
 // This limit is useful for very large basic blocks.
 static const unsigned MaxMemDepDistance = 160;
@@ ... @@
 /// avoids spending time checking the cost model and realizing that they will
 /// be inevitably scalarized.
 static bool isValidElementType(Type *Ty) {
   return VectorType::isValidElementType(Ty) && !Ty->isX86_FP80Ty() &&
          !Ty->isPPC_FP128Ty();
 }

 /// \returns true if all of the instructions in \p VL are in the same block or
 /// false otherwise.
-static bool allSameBlock(ArrayRef<Value *> VL) {
-  Instruction *I0 = dyn_cast<Instruction>(VL[0]);
-  if (!I0)
-    return false;
-  BasicBlock *BB = I0->getParent();
-  for (int I = 1, E = VL.size(); I < E; I++) {
-    auto *II = dyn_cast<Instruction>(VL[I]);
-    if (!II)
-      return false;
-
-    if (BB != II->getParent())
-      return false;
-  }
-  return true;
+template <typename T> static bool allSameBlock(T &&VL) {
    RKSimon (inline, not done): Are we going to have a problem if VL[0] is UndefValue?
    ABataev (inline, done): Yeah, will fix it.
+  if (empty(VL))
+    return true;
+  auto *I0 = cast<Instruction>(*VL.begin());
+  BasicBlock *BB = I0->getParent();
+  return all_of(drop_begin(VL, 1), [BB](Value *V) {
+    return BB == cast<Instruction>(V)->getParent();
+  });
 }

 /// \returns True if all of the values in \p VL are constants (but not
 /// globals/constant expressions).
 static bool allConstant(ArrayRef<Value *> VL) {
   // Constant expressions and globals can't be vectorized like normal integer/FP
   // constants.
   for (Value *i : VL)
     if (!isa<Constant>(i) || isa<ConstantExpr>(i) || isa<GlobalValue>(i))
       return false;
   return true;
 }

-/// \returns True if all of the values in \p VL are identical.
+/// \returns True if all defined values in \p VL are identical.
 static bool isSplat(ArrayRef<Value *> VL) {
-  for (unsigned i = 1, e = VL.size(); i < e; ++i)
-    if (VL[i] != VL[0])
+  Value *VL0 = nullptr;
+  for (Value *V : VL) {
+    if (isa<UndefValue>(V))
+      continue;
+    if (!VL0) {
+      VL0 = V;
+      continue;
+    }
+    if (V != VL0)
       return false;
+  }
   return true;
 }
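
The revised isSplat treats undef lanes as wildcards: only the defined lanes have to agree. A minimal sketch of that rule on plain integers is shown below. `kUndef` is a hypothetical sentinel standing in for UndefValue and `isSplatIgnoringUndef` is a hypothetical name; the real function compares `Value *` pointers.

```cpp
#include <vector>

// Hypothetical sentinel that stands in for UndefValue in this sketch.
constexpr int kUndef = -1;

// Returns true if all defined (non-kUndef) lanes hold the same value. A list
// with no defined lanes counts as a splat, matching the diffed code, where
// the loop never reaches a mismatch.
bool isSplatIgnoringUndef(const std::vector<int> &VL) {
  bool HaveFirst = false;
  int First = 0;
  for (int V : VL) {
    if (V == kUndef)
      continue; // Undef lanes match anything.
    if (!HaveFirst) {
      First = V;
      HaveFirst = true;
      continue;
    }
    if (V != First)
      return false;
  }
  return true;
}
```

Under this rule {7, undef, 7} is a splat and {7, undef, 8} is not, whereas the previous version compared every lane against lane 0 and would reject the first list.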

/// \returns True if \p I is commutative, handles CmpInst and BinaryOperator.		/// \returns True if \p I is commutative, handles CmpInst and BinaryOperator.
static bool isCommutative(Instruction *I) {		static bool isCommutative(Instruction *I) {
if (auto *Cmp = dyn_cast<CmpInst>(I))		if (auto *Cmp = dyn_cast<CmpInst>(I))
return Cmp->isCommutative();		return Cmp->isCommutative();
if (auto *BO = dyn_cast<BinaryOperator>(I))		if (auto *BO = dyn_cast<BinaryOperator>(I))
(158 lines not shown)	static bool isValidForAlternation(unsigned Opcode) {
return true;		return true;
}		}

/// \returns analysis of the Instructions in \p VL described in		/// \returns analysis of the Instructions in \p VL described in
/// InstructionsState, the Opcode that we suppose the whole list		/// InstructionsState, the Opcode that we suppose the whole list
/// could be vectorized even if its structure is diverse.		/// could be vectorized even if its structure is diverse.
static InstructionsState getSameOpcode(ArrayRef<Value *> VL,		static InstructionsState getSameOpcode(ArrayRef<Value *> VL,
unsigned BaseIndex = 0) {		unsigned BaseIndex = 0) {
// Make sure these are all Instructions.		// Make sure these are all Instructions or UndefValues.
if (llvm::any_of(VL, [](Value *V) { return !isa<Instruction>(V); }))		auto &&IsNotInstructionOrAllUndefs = [](ArrayRef<Value *> VL) {
		bool AllUndefs = true;
		for (Value *V : VL) {
		if (!isa<UndefValue>(V)) {
		if (isa<Instruction>(V)) {
		AllUndefs = false;
		continue;
		}
		return true;
		}
		}
		return AllUndefs;
		};
		if (IsNotInstructionOrAllUndefs(VL))
return InstructionsState(VL[BaseIndex], nullptr, nullptr);		return InstructionsState(VL[BaseIndex], nullptr, nullptr);
		BaseIndex =
		std::distance(VL.begin(), llvm::find_if(llvm::drop_begin(VL, BaseIndex),
		Instruction::classof));

bool IsCastOp = isa<CastInst>(VL[BaseIndex]);		bool IsCastOp = isa<CastInst>(VL[BaseIndex]);
		RKSimon: Worth using find_if?
bool IsBinOp = isa<BinaryOperator>(VL[BaseIndex]);		bool IsBinOp = isa<BinaryOperator>(VL[BaseIndex]);
unsigned Opcode = cast<Instruction>(VL[BaseIndex])->getOpcode();		unsigned Opcode = cast<Instruction>(VL[BaseIndex])->getOpcode();
unsigned AltOpcode = Opcode;		unsigned AltOpcode = Opcode;
unsigned AltIndex = BaseIndex;		unsigned AltIndex = BaseIndex;

// Check for one alternate opcode from another BinaryOperator.		// Check for one alternate opcode from another BinaryOperator.
// TODO - generalize to support all operators (types, calls etc.).		// TODO - generalize to support all operators (types, calls etc.).
for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {		for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {
		if (isa<UndefValue>(VL[Cnt]))
		continue;
unsigned InstOpcode = cast<Instruction>(VL[Cnt])->getOpcode();		unsigned InstOpcode = cast<Instruction>(VL[Cnt])->getOpcode();
if (IsBinOp && isa<BinaryOperator>(VL[Cnt])) {		if (IsBinOp && isa<BinaryOperator>(VL[Cnt])) {
if (InstOpcode == Opcode || InstOpcode == AltOpcode)		if (InstOpcode == Opcode || InstOpcode == AltOpcode)
continue;		continue;
if (Opcode == AltOpcode && isValidForAlternation(InstOpcode) &&		if (Opcode == AltOpcode && isValidForAlternation(InstOpcode) &&
isValidForAlternation(Opcode)) {		isValidForAlternation(Opcode)) {
AltOpcode = InstOpcode;		AltOpcode = InstOpcode;
AltIndex = Cnt;		AltIndex = Cnt;
(101 lines not shown)
}		}

namespace llvm {		namespace llvm {

static void inversePermutation(ArrayRef<unsigned> Indices,		static void inversePermutation(ArrayRef<unsigned> Indices,
SmallVectorImpl<int> &Mask) {		SmallVectorImpl<int> &Mask) {
Mask.clear();		Mask.clear();
const unsigned E = Indices.size();		const unsigned E = Indices.size();
Mask.resize(E, E + 1);		Mask.resize(E, UndefMaskElem);
for (unsigned I = 0; I < E; ++I)		for (unsigned I = 0; I < E; ++I)
		if (Indices[I] < E)
Mask[Indices[I]] = I;		Mask[Indices[I]] = I;
}		}
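
Note on the right-hand `inversePermutation`: out-of-range entries in `Indices` are skipped, so mask slots that nothing maps to keep the undef sentinel. The sketch below reproduces that behaviour with standard containers, assuming the sentinel is -1 as with LLVM's `UndefMaskElem`. The constant name and the return-by-value signature are illustrative, not the patch's API.

```cpp
#include <cassert>
#include <vector>

// Assumed sentinel, standing in for LLVM's UndefMaskElem.
constexpr int kUndefMaskElem = -1;

// Builds Mask such that Mask[Indices[I]] == I for every in-range index.
// Entries of Indices that are >= Indices.size() (used to mark undef lanes)
// are skipped, so the corresponding mask slots stay at the sentinel.
std::vector<int> inversePermutation(const std::vector<unsigned> &Indices) {
  const unsigned E = Indices.size();
  std::vector<int> Mask(E, kUndefMaskElem);
  for (unsigned I = 0; I < E; ++I)
    if (Indices[I] < E)
      Mask[Indices[I]] = static_cast<int>(I);
  return Mask;
}
```

For a full permutation such as `{2, 0, 1}` the result is the usual inverse `{1, 2, 0}`. For `{1, 3, 0}`, where 3 is out of range, the result is `{2, 0, -1}`.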

namespace slpvectorizer {		namespace slpvectorizer {

/// Bottom Up SLP Vectorizer.		/// Bottom Up SLP Vectorizer.
class BoUpSLP {		class BoUpSLP {
struct TreeEntry;		struct TreeEntry;
struct ScheduleData;		struct ScheduleData;
(60 lines not shown)	public:
void buildTree(ArrayRef<Value *> Roots,		void buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst = None);		ArrayRef<Value *> UserIgnoreLst = None);

/// Clear the internal data structures that are created by 'buildTree'.		/// Clear the internal data structures that are created by 'buildTree'.
void deleteTree() {		void deleteTree() {
VectorizableTree.clear();		VectorizableTree.clear();
ScalarToTreeEntry.clear();		ScalarToTreeEntry.clear();
		EntryVFs.clear();
MustGather.clear();		MustGather.clear();
		GatheredLoads.clear();
		GatheredLoadsEntriesFirst = -1;
ExternalUses.clear();		ExternalUses.clear();
NumOpsWantToKeepOrder.clear();		NumOpsWantToKeepOrder.clear();
NumOpsWantToKeepOriginalOrder = 0;		NumOpsWantToKeepOriginalOrder = 0;
for (auto &Iter : BlocksSchedules) {		for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();		BlockScheduling *BS = Iter.second.get();
BS->clear();		BS->clear();
}		}
MinBWs.clear();		MinBWs.clear();
		InstrElementSize.clear();
}		}

unsigned getTreeSize() const { return VectorizableTree.size(); }		unsigned getTreeSize() const { return VectorizableTree.size(); }

/// Perform LICM and CSE on the newly generated gather sequences.		/// Perform LICM and CSE on the newly generated gather sequences.
void optimizeGatherSequence();		void optimizeGatherSequence();

/// \returns The best order of instructions for vectorization.		/// \returns The best order of instructions for vectorization.
(31 lines not shown)	public:
/// be reordered, the best order will be <1, 0>. We need to extend this		/// be reordered, the best order will be <1, 0>. We need to extend this
/// order for the root node. For the root node this order should look like		/// order for the root node. For the root node this order should look like
/// <3, 0, 1, 2>. This function extends the order for the reused		/// <3, 0, 1, 2>. This function extends the order for the reused
/// instructions.		/// instructions.
void findRootOrder(OrdersType &Order) {		void findRootOrder(OrdersType &Order) {
// If the leaf has the same number of instructions to vectorize as the root		// If the leaf has the same number of instructions to vectorize as the root
// - order must be set already.		// - order must be set already.
unsigned RootSize = VectorizableTree[0]->Scalars.size();		unsigned RootSize = VectorizableTree[0]->Scalars.size();
if (Order.size() == RootSize)		// Checks if the order is normalized relative to the root node, i.e. it has
		// the same number of undef elements (undef element is equal to RootSize
		// value) as the root node scalars.
		auto &&IsNormalizedOrder = [this, RootSize](const OrdersType &Order) {
		return count(Order, RootSize) ==
		count_if(VectorizableTree[0]->Scalars, UndefValue::classof);
		};
		// Check if the current order has the same number of undefined elements as
		// the root node.
		if (IsNormalizedOrder(Order))
return;		return;
SmallVector<unsigned, 4> RealOrder(Order.size());
std::swap(Order, RealOrder);
SmallVector<int, 4> Mask;
inversePermutation(RealOrder, Mask);
Order.assign(Mask.begin(), Mask.end());
// The leaf has less number of instructions - need to find the true order of		// The leaf has less number of instructions - need to find the true order of
// the root.		// the root.
// Scan the nodes starting from the leaf back to the root.		// Scan the nodes starting from the leaf back to the root.
const TreeEntry *PNode = VectorizableTree.back().get();		const TreeEntry *PNode = VectorizableTree.back().get();
SmallVector<const TreeEntry *, 4> Nodes(1, PNode);		SmallVector<const TreeEntry *, 4> Nodes(1, PNode);
SmallPtrSet<const TreeEntry *, 4> Visited;		SmallPtrSet<const TreeEntry *, 4> Visited;
while (!Nodes.empty() && Order.size() != RootSize) {		while (!Nodes.empty() && !IsNormalizedOrder(Order)) {
const TreeEntry *PNode = Nodes.pop_back_val();		const TreeEntry *PNode = Nodes.pop_back_val();
if (!Visited.insert(PNode).second)		if (!Visited.insert(PNode).second)
continue;		continue;
const TreeEntry &Node = *PNode;		const TreeEntry &Node = *PNode;
for (const EdgeInfo &EI : Node.UserTreeIndices)		for (const EdgeInfo &EI : Node.UserTreeIndices)
if (EI.UserTE)		if (EI.UserTE)
Nodes.push_back(EI.UserTE);		Nodes.push_back(EI.UserTE);
if (Node.ReuseShuffleIndices.empty())		if (Node.ReuseShuffleIndices.empty())
continue;		continue;
// Build the order for the parent node.		// Build the order for the parent node.
OrdersType NewOrder(Node.ReuseShuffleIndices.size(), RootSize);		SmallVector<int, 4> Mask;
SmallVector<unsigned, 4> OrderCounter(Order.size(), 0);		inversePermutation(Order, Mask);
		Order.assign(RootSize, RootSize);
		SmallVector<unsigned, 4> OrderCounter(RootSize + 1, 0);
// The algorithm of the order extension is:		// The algorithm of the order extension is:
// 1. Calculate the number of the same instructions for the order.		// 1. Calculate the number of the same instructions for the order.
// 2. Calculate the index of the new order: total number of instructions		// 2. Calculate the index of the new order: total number of instructions
// with order less than the order of the current instruction + reuse		// with order less than the order of the current instruction + reuse
// number of the current instruction.		// number of the current instruction.
// 3. The new order is just the index of the instruction in the original		// 3. The new order is just the index of the instruction in the original
// vector of the instructions.		// vector of the instructions.
for (unsigned I : Node.ReuseShuffleIndices)		for (unsigned I : Node.ReuseShuffleIndices)
++OrderCounter[Order[I]];		if (I != RootSize && Mask[I] != UndefMaskElem)
SmallVector<unsigned, 4> CurrentCounter(Order.size(), 0);		++OrderCounter[Mask[I]];
		SmallVector<unsigned, 4> CurrentCounter(Order.size() + 1, 0);
for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {		for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {
unsigned ReusedIdx = Node.ReuseShuffleIndices[I];		unsigned ReusedIdx = Node.ReuseShuffleIndices[I];
unsigned OrderIdx = Order[ReusedIdx];		if (ReusedIdx == RootSize)
		continue;
		int OrderIdx = Mask[ReusedIdx];
		if (OrderIdx == UndefMaskElem) {
		// Special case where the UndefValue is actually a real operand. Need
		// to expand the order taking this UndefValue into account.
		OrderIdx = RootSize;
		}
unsigned NewIdx = 0;		unsigned NewIdx = 0;
for (unsigned J = 0; J < OrderIdx; ++J)		for (int J = 0; J < OrderIdx; ++J)
NewIdx += OrderCounter[J];		NewIdx += OrderCounter[J];
NewIdx += CurrentCounter[OrderIdx];		NewIdx += CurrentCounter[OrderIdx];
++CurrentCounter[OrderIdx];		++CurrentCounter[OrderIdx];
assert(NewOrder[NewIdx] == RootSize &&		assert(Order[NewIdx] == RootSize &&
"The order index should not be written already.");		"The order index should not be written already.");
NewOrder[NewIdx] = I;		Order[NewIdx] = I;
}		}
std::swap(Order, NewOrder);
}		}
assert(Order.size() == RootSize &&		// The order must be normalized relative to the root node after the
"Root node is expected or the size of the order must be the same as "		// function.
"the number of elements in the root node.");		assert(IsNormalizedOrder(Order) &&
assert(llvm::all_of(Order,		"Indices for all non-undefs must be set.");
[RootSize](unsigned Val) { return Val != RootSize; }) &&
"All indices must be initialized");
}		}
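
Note on the order-extension loop in `findRootOrder`: steps 1 to 3 in the comment amount to a stable counting sort of the root lanes, keyed by the order value of the leaf scalar each lane reuses. The sketch below models only the left-hand (pre-patch) loop body, with no undef handling. The function name, the use of `std::vector`, and the sample reuse indices `{0, 0, 0, 1}` are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Order[K] is the order value of leaf scalar K (assumed to be a permutation
// of 0..K-1). Reuse[I] is the leaf scalar used by root lane I (assumed to be
// a valid index into Order). Returns the order extended to the root lanes.
std::vector<unsigned> extendOrder(const std::vector<unsigned> &Order,
                                  const std::vector<unsigned> &Reuse) {
  const unsigned Unset = Reuse.size(); // sentinel, like RootSize in the diff
  std::vector<unsigned> NewOrder(Reuse.size(), Unset);

  // Step 1: count how many root lanes share each order value.
  std::vector<unsigned> OrderCounter(Order.size(), 0);
  for (unsigned I : Reuse)
    ++OrderCounter[Order[I]];

  // Steps 2 and 3: the slot for lane I is the number of lanes with a smaller
  // order value plus the number of earlier lanes with the same order value.
  std::vector<unsigned> CurrentCounter(Order.size(), 0);
  for (unsigned I = 0, E = Reuse.size(); I < E; ++I) {
    unsigned OrderIdx = Order[Reuse[I]];
    unsigned NewIdx = 0;
    for (unsigned J = 0; J < OrderIdx; ++J)
      NewIdx += OrderCounter[J];
    NewIdx += CurrentCounter[OrderIdx]++;
    assert(NewOrder[NewIdx] == Unset && "slot written twice");
    NewOrder[NewIdx] = I;
  }
  return NewOrder;
}
```

With leaf order `<1, 0>` and the assumed reuse indices `{0, 0, 0, 1}`, this yields `<3, 0, 1, 2>`, the shape of result the doc comment describes for the root node.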

/// \return The vector element size in bits to use when vectorizing the		/// \return The vector element size in bits to use when vectorizing the
/// expression tree ending at \p V. If V is a store, the size is the width of		/// expression tree ending at \p V. If V is a store, the size is the width of
/// the stored value. Otherwise, the size is the width of the largest loaded		/// the stored value. Otherwise, the size is the width of the largest loaded
/// value reaching V. This method is used by the vectorizer to calculate		/// value reaching V. This method is used by the vectorizer to calculate
/// vectorization factors.		/// vectorization factors.
unsigned getVectorElementSize(Value *V);		unsigned getVectorElementSize(Value *V);
(123 lines not shown)	class VLOperands {

/// During operand reordering, we are trying to select the operand at lane		/// During operand reordering, we are trying to select the operand at lane
/// that matches best with the operand at the neighboring lane. Our		/// that matches best with the operand at the neighboring lane. Our
/// selection is based on the type of value we are looking for. For example,		/// selection is based on the type of value we are looking for. For example,
/// if the neighboring lane has a load, we need to look for a load that is		/// if the neighboring lane has a load, we need to look for a load that is
/// accessing a consecutive address. These strategies are summarized in the		/// accessing a consecutive address. These strategies are summarized in the
/// 'ReorderingMode' enumerator.		/// 'ReorderingMode' enumerator.
enum class ReorderingMode {		enum class ReorderingMode {
		Unknown, ///< Mode is not defined yet
Load, ///< Matching loads to consecutive memory addresses		Load, ///< Matching loads to consecutive memory addresses
Opcode, ///< Matching instructions based on opcode (same or alternate)		Opcode, ///< Matching instructions based on opcode (same or alternate)
Constant, ///< Matching constants		Constant, ///< Matching constants
Splat, ///< Matching the same instruction multiple times (broadcast)		Splat, ///< Matching the same instruction multiple times (broadcast)
Failed, ///< We failed to create a vectorizable group		Failed, ///< We failed to create a vectorizable group
};		};

using OperandDataVec = SmallVector<OperandData, 2>;		using OperandDataVec = SmallVector<OperandData, 2>;

/// A vector of operand vectors.		/// A vector of operand vectors.
SmallVector<OperandDataVec, 4> OpsVec;		SmallVector<OperandDataVec, 4> OpsVec;

const DataLayout &DL;		const DataLayout &DL;
ScalarEvolution &SE;		ScalarEvolution &SE;
const BoUpSLP &R;		const BoUpSLP &R;
		/// Base instruction in the list of scalars, the first instruction with the
		/// main opcode.
		Instruction &VL0;
		/// Number of lanes in the node, i.e. PowerOf2Ceil(number of instructions in
		/// the node).
		unsigned NumLanes = 0;

/// \returns the operand data at \p OpIdx and \p Lane.		/// \returns the operand data at \p OpIdx and \p Lane.
OperandData &getData(unsigned OpIdx, unsigned Lane) {		OperandData &getData(unsigned OpIdx, unsigned Lane) {
return OpsVec[OpIdx][Lane];		return OpsVec[OpIdx][Lane];
}		}

/// \returns the operand data at \p OpIdx and \p Lane. Const version.		/// \returns the operand data at \p OpIdx and \p Lane. Const version.
const OperandData &getData(unsigned OpIdx, unsigned Lane) const {		const OperandData &getData(unsigned OpIdx, unsigned Lane) const {
(9 lines not shown)	void clearUsed() {
OpsVec[OpIdx][Lane].IsUsed = false;		OpsVec[OpIdx][Lane].IsUsed = false;
}		}

/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.		/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.
void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {		void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {
std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);		std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);
}		}

// The hard-coded scores listed here are not very important. When computing		// The hard-coded scores listed here are not very important, though they
// the scores of matching one sub-tree with another, we are basically		// should be higher for better matches to improve the resulting cost. When
// counting the number of values that are matching. So even if all scores		// computing the scores of matching one sub-tree with another, we are
// are set to 1, we would still get a decent matching result.		// basically counting the number of values that are matching. So even if all
		// scores are set to 1, we would still get a decent matching result.
// However, sometimes we have to break ties. For example we may have to		// However, sometimes we have to break ties. For example we may have to
// choose between matching loads vs matching opcodes. This is what these		// choose between matching loads vs matching opcodes. This is what these
// scores are helping us with: they provide the order of preference.		// scores are helping us with: they provide the order of preference. Also,
		// this is important if the scalar is externally used or used in another
		// tree entry node in a different lane.

/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).		/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).
static const int ScoreConsecutiveLoads = 3;		static const int ScoreConsecutiveLoads = 4;
		/// Loads from reversed memory addresses, e.g. load(A[i+1]), load(A[i]).
		static const int ScoreReversedLoads = 3;
/// ExtractElementInst from same vector and consecutive indexes.		/// ExtractElementInst from same vector and consecutive indexes.
static const int ScoreConsecutiveExtracts = 3;		static const int ScoreConsecutiveExtracts = 4;
		/// ExtractElementInst from same vector and reversed indices.
		static const int ScoreReversedExtracts = 3;
/// Constants.		/// Constants.
static const int ScoreConstants = 2;		static const int ScoreConstants = 2;
/// Instructions with the same opcode.		/// Instructions with the same opcode.
static const int ScoreSameOpcode = 2;		static const int ScoreSameOpcode = 2;
/// Instructions with alt opcodes (e.g, add + sub).		/// Instructions with alt opcodes (e.g, add + sub).
static const int ScoreAltOpcodes = 1;		static const int ScoreAltOpcodes = 1;
/// Identical instructions (a.k.a. splat or broadcast).		/// Identical instructions (a.k.a. splat or broadcast).
static const int ScoreSplat = 1;		static const int ScoreSplat = 1;
/// Matching with an undef is preferable to failing.		/// Matching with an undef is preferable to failing.
static const int ScoreUndef = 1;		static const int ScoreUndef = 1;
/// Score for failing to find a decent match.		/// Score for failing to find a decent match.
static const int ScoreFail = 0;		static const int ScoreFail = 0;
/// User external to the vectorized code.		/// User external to the vectorized code.
static const int ExternalUseCost = 1;		static const int ExternalUseCost = 1;
/// The user is internal but in a different lane.		/// The user is internal but in a different lane.
static const int UserInDiffLaneCost = ExternalUseCost;		static const int UserInDiffLaneCost = ExternalUseCost;

/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.		/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.
static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,		static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,
ScalarEvolution &SE) {		ScalarEvolution &SE, int NumLanes) {
		if (V1 == V2)
		return VLOperands::ScoreSplat;

auto *LI1 = dyn_cast<LoadInst>(V1);		auto *LI1 = dyn_cast<LoadInst>(V1);
auto *LI2 = dyn_cast<LoadInst>(V2);		auto *LI2 = dyn_cast<LoadInst>(V2);
if (LI1 && LI2)		if (LI1 && LI2) {
return isConsecutiveAccess(LI1, LI2, DL, SE)		if (LI1->getParent() != LI2->getParent())
? VLOperands::ScoreConsecutiveLoads		return VLOperands::ScoreFail;
: VLOperands::ScoreFail;
		Optional<int> Dist = getPointersDiff(LI1->getPointerOperand(),
		LI2->getPointerOperand(), DL, SE);
		if (!Dist)
		return VLOperands::ScoreFail;
		// The distance is too large - still may be profitable to use masked
		// loads/gathers.
		if (std::abs(*Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		return (*Dist > 0) ? VLOperands::ScoreConsecutiveLoads
		: VLOperands::ScoreReversedLoads;
		}

auto *C1 = dyn_cast<Constant>(V1);		auto *C1 = dyn_cast<Constant>(V1);
auto *C2 = dyn_cast<Constant>(V2);		auto *C2 = dyn_cast<Constant>(V2);
if (C1 && C2)		if (C1 && C2 && !isa<UndefValue>(V2))
return VLOperands::ScoreConstants;		return VLOperands::ScoreConstants;

// Extracts from consecutive indexes of the same vector get a better score,		// Extracts from consecutive indexes of the same vector get a better score,
// as the extracts could be optimized away.		// as the extracts could be optimized away.
Value *EV;		Value *EV;
ConstantInt *Ex1Idx, *Ex2Idx;		ConstantInt *Ex1Idx, *Ex2Idx;
if (match(V1, m_ExtractElt(m_Value(EV), m_ConstantInt(Ex1Idx))) &&		if (match(V2, m_ExtractElt(m_Value(EV), m_ConstantInt(Ex2Idx)))) {
match(V2, m_ExtractElt(m_Deferred(EV), m_ConstantInt(Ex2Idx))) &&		if (match(V1, m_ExtractElt(m_Deferred(EV), m_ConstantInt(Ex1Idx)))) {
Ex1Idx->getZExtValue() + 1 == Ex2Idx->getZExtValue())		int Idx1 = Ex1Idx->getZExtValue();
return VLOperands::ScoreConsecutiveExtracts;		int Idx2 = Ex2Idx->getZExtValue();
		int Dist = Idx2 - Idx1;
		// The distance is too large - still may be profitable to use
		// shuffles.
		if (std::abs(Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		return (Dist > 0) ? VLOperands::ScoreConsecutiveExtracts
		: VLOperands::ScoreReversedExtracts;
		}
		return VLOperands::ScoreFail;
		}

auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (I1 && I2) {		if (I1 && I2) {
if (I1 == I2)		if (I1->getParent() != I2->getParent())
return VLOperands::ScoreSplat;		return VLOperands::ScoreFail;
InstructionsState S = getSameOpcode({I1, I2});		InstructionsState S = getSameOpcode({I1, I2});
// Note: Only consider instructions with <= 2 operands to avoid		// Note: Only consider instructions with <= 2 operands to avoid
// complexity explosion.		// complexity explosion.
if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)		if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)
return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes		return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes
: VLOperands::ScoreSameOpcode;		: VLOperands::ScoreSameOpcode;
}		}

if (isa<UndefValue>(V2))		if (isa<UndefValue>(V2))
return VLOperands::ScoreUndef;		return VLOperands::ScoreUndef;

return VLOperands::ScoreFail;		return VLOperands::ScoreFail;
}		}
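
Note on the new load and extract scoring: both paths reduce to the same rule on a signed distance. No distance means failure, a distance larger than `NumLanes / 2` falls back to the alt-opcode score, and otherwise the sign picks the consecutive or reversed score. A minimal sketch of that rule follows. The score values are copied from the right-hand side, while the `Distance` struct and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Score values as they appear on the right-hand side of the diff.
constexpr int ScoreConsecutive = 4;
constexpr int ScoreReversed = 3;
constexpr int ScoreAltOpcodes = 1;
constexpr int ScoreFail = 0;

// Hypothetical stand-in for Optional<int>: Known is false when the distance
// could not be computed (e.g. getPointersDiff returned None).
struct Distance {
  bool Known;
  int Elems; // signed distance from the first value to the second
};

int scoreByDistance(Distance Dist, int NumLanes) {
  if (!Dist.Known)
    return ScoreFail;
  // Too far apart for a plain vector access; a gather or shuffle may still
  // be profitable, so this is not treated as a failure.
  if (std::abs(Dist.Elems) > NumLanes / 2)
    return ScoreAltOpcodes;
  return Dist.Elems > 0 ? ScoreConsecutive : ScoreReversed;
}
```

One consequence worth checking in review: any positive distance up to `NumLanes / 2` gets the top score, not only a distance of exactly 1, which is looser than the `isConsecutiveAccess` check on the left-hand side.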

/// Holds the values and their lane that are taking part in the look-ahead		/// Holds the values and their lanes that are taking part in the look-ahead
/// score calculation. This is used in the external uses cost calculation.		/// score calculation. This is used in the external uses cost calculation.
SmallDenseMap<Value *, int> InLookAheadValues;		/// Need to hold all the lanes in case of splat/broadcast at least to
		/// correctly check for the use in the different lane.
		SmallDenseMap<Value *, SmallSet<int, 4>> InLookAheadValues;

/// \Returns the additional cost due to uses of \p LHS and \p RHS that are		/// \Returns the additional cost due to uses of \p LHS and \p RHS that are
/// either external to the vectorized code, or require shuffling.		/// either external to the vectorized code, or require shuffling.
int getExternalUsesCost(const std::pair<Value *, int> &LHS,		int getExternalUsesCost(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS) {		const std::pair<Value *, int> &RHS) {
int Cost = 0;		int Cost = 0;
std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};		std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};
for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {		for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {
(13 lines not shown)	int getExternalUsesCost(const std::pair<Value *, int> &LHS,
unsigned UsersBudget = LookAheadUsersBudget;		unsigned UsersBudget = LookAheadUsersBudget;
for (User *U : V->users()) {		for (User *U : V->users()) {
if (const TreeEntry *UserTE = R.getTreeEntry(U)) {		if (const TreeEntry *UserTE = R.getTreeEntry(U)) {
// The user is in the VectorizableTree. Check if we need to insert.		// The user is in the VectorizableTree. Check if we need to insert.
auto It = llvm::find(UserTE->Scalars, U);		auto It = llvm::find(UserTE->Scalars, U);
assert(It != UserTE->Scalars.end() && "U is in UserTE");		assert(It != UserTE->Scalars.end() && "U is in UserTE");
int UserLn = std::distance(UserTE->Scalars.begin(), It);		int UserLn = std::distance(UserTE->Scalars.begin(), It);
assert(UserLn >= 0 && "Bad lane");		assert(UserLn >= 0 && "Bad lane");
if (UserLn != Ln)		// If the values are different, check just the lane of the current
		// value. If the values are the same, need to add UserInDiffLaneCost
		// only if UserLn does not match both lane numbers.
		if ((LHS.first != RHS.first && UserLn != Ln) ||
		(LHS.first == RHS.first && UserLn != LHS.second &&
		UserLn != RHS.second)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// Check if the user is in the look-ahead code.		// Check if the user is in the look-ahead code.
auto It2 = InLookAheadValues.find(U);		auto It2 = InLookAheadValues.find(U);
if (It2 != InLookAheadValues.end()) {		if (It2 != InLookAheadValues.end()) {
// The user is in the look-ahead code. Check the lane.		// The user is in the look-ahead code. Check the lane.
if (It2->second != Ln)		if (!It2->getSecond().contains(Ln)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// The user is neither in SLP tree nor in the look-ahead code.		// The user is neither in SLP tree nor in the look-ahead code.
Cost += ExternalUseCost;		Cost += ExternalUseCost;
		break;
}		}
}		}
// Limit the number of visited uses to cap compilation time.		// Limit the number of visited uses to cap compilation time.
if (--UsersBudget == 0)		if (--UsersBudget == 0)
break;		break;
}		}
}		}
return Cost;		return Cost;
(22 lines not shown)	class VLOperands {
/// Luís F. W. Góes		/// Luís F. W. Góes
int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,		int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS, int CurrLevel,		const std::pair<Value *, int> &RHS, int CurrLevel,
int MaxLevel) {		int MaxLevel) {

Value *V1 = LHS.first;		Value *V1 = LHS.first;
Value *V2 = RHS.first;		Value *V2 = RHS.first;
// Get the shallow score of V1 and V2.		// Get the shallow score of V1 and V2.
int ShallowScoreAtThisLevel =		int ShallowScoreAtThisLevel = std::max(
std::max((int)ScoreFail, getShallowScore(V1, V2, DL, SE) -		(int)ScoreFail, getShallowScore(V1, V2, DL, SE, getNumLanes()) -
getExternalUsesCost(LHS, RHS));		getExternalUsesCost(LHS, RHS));
int Lane1 = LHS.second;		int Lane1 = LHS.second;
int Lane2 = RHS.second;		int Lane2 = RHS.second;

// If reached MaxLevel,		// If reached MaxLevel,
// or if V1 and V2 are not instructions,		// or if V1 and V2 are not instructions,
// or if they are SPLAT,		// or if they are SPLAT,
// or if they are not consecutive, early return the current cost.		// or if they are not consecutive,
		// or if profitable to vectorize loads or extractelements, early return
		// the current cost.
auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||		if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||
ShallowScoreAtThisLevel == VLOperands::ScoreFail ||		ShallowScoreAtThisLevel == VLOperands::ScoreFail ||
(isa<LoadInst>(I1) && isa<LoadInst>(I2) && ShallowScoreAtThisLevel))		(((isa<LoadInst>(I1) && isa<LoadInst>(I2)) ||
		(isa<ExtractElementInst>(I1) && isa<ExtractElementInst>(I2))) &&
		ShallowScoreAtThisLevel))
return ShallowScoreAtThisLevel;		return ShallowScoreAtThisLevel;
assert(I1 && I2 && "Should have early exited.");		assert(I1 && I2 && "Should have early exited.");

// Keep track of in-tree values for determining the external-use cost.		// Keep track of in-tree values for determining the external-use cost.
InLookAheadValues[V1] = Lane1;		InLookAheadValues[V1].insert(Lane1);
InLookAheadValues[V2] = Lane2;		InLookAheadValues[V2].insert(Lane2);

// Contains the I2 operand indexes that got matched with I1 operands.		// Contains the I2 operand indexes that got matched with I1 operands.
SmallSet<unsigned, 4> Op2Used;		SmallSet<unsigned, 4> Op2Used;

// Recursion towards the operands of I1 and I2. We are trying all possible		// Recursion towards the operands of I1 and I2. We are trying all possible
// operand pairs, and keeping track of the best score.		// operand pairs, and keeping track of the best score.
for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();		for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();
OpIdx1 != NumOperands1; ++OpIdx1) {		OpIdx1 != NumOperands1; ++OpIdx1) {
(63 lines not shown)	getBestOperand(unsigned OpIdx, int Lane, int LastLane,
// Sometimes we have more than one option (e.g., Opcode and Undefs), so we		// Sometimes we have more than one option (e.g., Opcode and Undefs), so we
// are using the score to differentiate between the two.		// are using the score to differentiate between the two.
struct BestOpData {		struct BestOpData {
Optional<unsigned> Idx = None;		Optional<unsigned> Idx = None;
unsigned Score = 0;		unsigned Score = 0;
} BestOp;		} BestOp;

// Iterate through all unused operands and look for the best.		// Iterate through all unused operands and look for the best.
		bool IsOpLastLaneUndef = isa<UndefValue>(OpLastLane);
for (unsigned Idx = 0; Idx != NumOperands; ++Idx) {		for (unsigned Idx = 0; Idx != NumOperands; ++Idx) {
// Get the operand at Idx and Lane.		// Get the operand at Idx and Lane.
OperandData &OpData = getData(Idx, Lane);		OperandData &OpData = getData(Idx, Lane);
Value *Op = OpData.V;		Value *Op = OpData.V;
bool OpAPO = OpData.APO;		bool OpAPO = OpData.APO;

// Skip already selected operands.		// Skip already selected operands.
if (OpData.IsUsed)		if (OpData.IsUsed)
continue;		continue;

// Skip if we are trying to move the operand to a position with a		// Skip if we are trying to move the operand to a position with a
// different opcode in the linearized tree form. This would break the		// different opcode in the linearized tree form. This would break the
// semantics.		// semantics.
if (OpAPO != OpIdxAPO)		if (OpAPO != OpIdxAPO)
continue;		continue;

		// Ignore two undefs.
		if (IsOpLastLaneUndef && isa<UndefValue>(Op)) {
		if (BestOp.Score < VLOperands::ScoreUndef) {
		BestOp.Idx = Idx;
		BestOp.Score = VLOperands::ScoreUndef;
		}
		continue;
		}

// Look for an operand that matches the current mode.		// Look for an operand that matches the current mode.
switch (RMode) {		switch (RMode) {
case ReorderingMode::Load:		case ReorderingMode::Load:
case ReorderingMode::Constant:		case ReorderingMode::Constant:
case ReorderingMode::Opcode: {		case ReorderingMode::Opcode: {
bool LeftToRight = Lane > LastLane;		bool LeftToRight = Lane > LastLane;
Value *OpLeft = (LeftToRight) ? OpLastLane : Op;		Value *OpLeft = (LeftToRight) ? OpLastLane : Op;
Value *OpRight = (LeftToRight) ? Op : OpLastLane;		Value *OpRight = (LeftToRight) ? Op : OpLastLane;
unsigned Score =		unsigned Score =
getLookAheadScore({OpLeft, LastLane}, {OpRight, Lane});		getLookAheadScore({OpLeft, LastLane}, {OpRight, Lane});
if (Score > BestOp.Score) {		if (Score > BestOp.Score) {
BestOp.Idx = Idx;		BestOp.Idx = Idx;
BestOp.Score = Score;		BestOp.Score = Score;
}		}
break;		break;
}		}
case ReorderingMode::Splat:		case ReorderingMode::Splat:
if (Op == OpLastLane)		// Undef can also be part of a splat/broadcast.
		if (Op == OpLastLane || IsOpLastLaneUndef || isa<UndefValue>(Op))
BestOp.Idx = Idx;		BestOp.Idx = Idx;
break;		break;
case ReorderingMode::Failed:		case ReorderingMode::Failed:
return None;		return None;
		case ReorderingMode::Unknown:
		llvm_unreachable("Unknown mode is not expected here.");
}		}
}		}

if (BestOp.Idx) {		if (BestOp.Idx) {
getData(BestOp.Idx.getValue(), Lane).IsUsed = true;		getData(BestOp.Idx.getValue(), Lane).IsUsed = true;
return BestOp.Idx;		return BestOp.Idx;
}		}
// If we could not find a good match return None.		// If we could not find a good match return None.
return None;		return None;
}		}

/// Helper for reorderOperandVecs. \Returns the lane that we should start		/// Helper for reorderOperandVecs. \Returns the lane that we should start
/// reordering from. This is the one which has the least number of operands		/// reordering from. This is the one which has the least number of operands
/// that can freely move about.		/// that can freely move about, or that is less profitable because it already
		/// has the best set of operands.
unsigned getBestLaneToStartReordering() const {		unsigned getBestLaneToStartReordering() const {
unsigned BestLane = 0;		unsigned BestLane = 0;
unsigned Min = UINT_MAX;		unsigned Min = UINT_MAX;
for (unsigned Lane = 0, NumLanes = getNumLanes(); Lane != NumLanes;		unsigned SameOpNumber = 0;
++Lane) {		for (int I = getNumLanes(); I > 0; --I) {
unsigned NumFreeOps = getMaxNumOperandsThatCanBeReordered(Lane);		unsigned Lane = I - 1;
if (NumFreeOps < Min) {		std::pair<unsigned, unsigned> NumFreeOpsHash =
Min = NumFreeOps;		getMaxNumOperandsThatCanBeReordered(Lane);
		// Compare the number of operands that can move and choose the one with
		// the least number.
		if (NumFreeOpsHash.first < Min) {
		Min = NumFreeOpsHash.first;
		SameOpNumber = NumFreeOpsHash.second;
		BestLane = Lane;
		} else if (NumFreeOpsHash.first == Min &&
		NumFreeOpsHash.second < SameOpNumber) {
		// Select the most optimal lane in terms of number of operands that
		// should be moved around.
		SameOpNumber = NumFreeOpsHash.second;
BestLane = Lane;		BestLane = Lane;
}		}
}		}
return BestLane;		return BestLane;
}		}

/// \Returns the maximum number of operands that are allowed to be reordered		/// \Returns the maximum number of operands that are allowed to be reordered
/// for \p Lane. This is used as a heuristic for selecting the first lane to		/// for \p Lane and the number of compatible instructions (with the same
/// start operand reordering.		/// parent/opcode). This is used as a heuristic for selecting the first lane
unsigned getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {		/// to start operand reordering.
		std::pair<unsigned, unsigned>
		getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {
unsigned CntTrue = 0;		unsigned CntTrue = 0;
unsigned NumOperands = getNumOperands();		unsigned NumOperands = getNumOperands();
// Operands with the same APO can be reordered. We therefore need to count		// Operands with the same APO can be reordered. We therefore need to count
// how many of them we have for each APO, like this: Cnt[APO] = x.		// how many of them we have for each APO, like this: Cnt[APO] = x.
// Since we only have two APOs, namely true and false, we can avoid using		// Since we only have two APOs, namely true and false, we can avoid using
// a map. Instead we can simply count the number of operands that		// a map. Instead we can simply count the number of operands that
// correspond to one of them (in this case the 'true' APO), and calculate		// correspond to one of them (in this case the 'true' APO), and calculate
// the other by subtracting it from the total number of operands.		// the other by subtracting it from the total number of operands.
for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx)		// Operands with the same instruction opcode and parent are more
if (getData(OpIdx, Lane).APO)		// profitable since we don't need to move them in many cases.
		bool AllUndefs = true;
		unsigned SameCodeParentOps = 0;
		unsigned Opcode = 0;
		BasicBlock *Parent = nullptr;
		for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
		const OperandData &OpData = getData(OpIdx, Lane);
		if (OpData.APO)
++CntTrue;		++CntTrue;
		if (auto *I = dyn_cast<Instruction>(OpData.V)) {
		if (Opcode != I->getOpcode() || I->getParent() != Parent) {
		if (SameCodeParentOps == 0) {
		SameCodeParentOps = 1;
		Opcode = I->getOpcode();
		Parent = I->getParent();
		} else {
		--SameCodeParentOps;
		}
		} else {
		++SameCodeParentOps;
		}
		}
		AllUndefs = AllUndefs && isa<UndefValue>(OpData.V);
		}
		if (AllUndefs)
		return std::make_pair(UINT_MAX, 0);
unsigned CntFalse = NumOperands - CntTrue;		unsigned CntFalse = NumOperands - CntTrue;
return std::max(CntTrue, CntFalse);		return std::make_pair(std::max(CntTrue, CntFalse), SameCodeParentOps);
}		}

/// Go through the instructions in VL and append their operands.		/// Go through the instructions in VL and append their operands.
void appendOperandsOfVL(ArrayRef<Value *> VL) {		void appendOperandsOfVL(ArrayRef<Value *> VL) {
assert(!VL.empty() && "Bad VL");		assert(!VL.empty() && "Bad VL");
assert((empty() || VL.size() == getNumLanes()) &&		assert((empty() || VL.size() == getNumLanes()) &&
"Expected same number of lanes");		"Expected same number of lanes");
assert(isa<Instruction>(VL[0]) && "Expected instruction");		unsigned NumOperands = VL0.getNumOperands();
unsigned NumOperands = cast<Instruction>(VL[0])->getNumOperands();
OpsVec.resize(NumOperands);		OpsVec.resize(NumOperands);
unsigned NumLanes = VL.size();		unsigned NumLanes = VL.size();
for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {		for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
OpsVec[OpIdx].resize(NumLanes);		OpsVec[OpIdx].resize(NumLanes);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {		for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
		if (isa<UndefValue>(VL[Lane])) {
		OpsVec[OpIdx][Lane] = {
		UndefValue::get(VL0.getOperand(OpIdx)->getType()), false,
		false};
		continue;
		}
assert(isa<Instruction>(VL[Lane]) && "Expected instruction");		assert(isa<Instruction>(VL[Lane]) && "Expected instruction");
// Our tree has just 3 nodes: the root and two operands.		// Our tree has just 3 nodes: the root and two operands.
// It is therefore trivial to get the APO. We only need to check the		// It is therefore trivial to get the APO. We only need to check the
// opcode of VL[Lane] and whether the operand at OpIdx is the LHS or		// opcode of VL[Lane] and whether the operand at OpIdx is the LHS or
// RHS operand. The LHS operand of both add and sub is never attached		// RHS operand. The LHS operand of both add and sub is never attached
// to an inverse operation in the linearized form, therefore its APO		// to an inverse operation in the linearized form, therefore its APO
// is false. The RHS is true only if VL[Lane] is an inverse operation.		// is false. The RHS is true only if VL[Lane] is an inverse operation.

// Since operand reordering is performed on groups of commutative		// Since operand reordering is performed on groups of commutative
// operations or alternating sequences (e.g., +, -), we can safely		// operations or alternating sequences (e.g., +, -), we can safely
// tell the inverse operations by checking commutativity.		// tell the inverse operations by checking commutativity.
bool IsInverseOperation = !isCommutative(cast<Instruction>(VL[Lane]));		bool IsInverseOperation = !isCommutative(cast<Instruction>(VL[Lane]));
bool APO = (OpIdx == 0) ? false : IsInverseOperation;		bool APO = (OpIdx == 0) ? false : IsInverseOperation;
OpsVec[OpIdx][Lane] = {cast<Instruction>(VL[Lane])->getOperand(OpIdx),		OpsVec[OpIdx][Lane] = {cast<Instruction>(VL[Lane])->getOperand(OpIdx),
APO, false};		APO, false};
}		}
}		}
}		}

/// \returns the number of operands.		/// \returns the number of operands.
unsigned getNumOperands() const { return OpsVec.size(); }		unsigned getNumOperands() const { return OpsVec.size(); }

/// \returns the number of lanes.		/// \returns the number of lanes.
unsigned getNumLanes() const { return OpsVec[0].size(); }		unsigned getNumLanes() const { return NumLanes; }

/// \returns the operand value at \p OpIdx and \p Lane.		/// \returns the operand value at \p OpIdx and \p Lane.
Value *getValue(unsigned OpIdx, unsigned Lane) const {		Value *getValue(unsigned OpIdx, unsigned Lane) const {
return getData(OpIdx, Lane).V;		return getData(OpIdx, Lane).V;
}		}

/// \returns true if the data structure is empty.		/// \returns true if the data structure is empty.
bool empty() const { return OpsVec.empty(); }		bool empty() const { return OpsVec.empty(); }
[... 10 lines not shown ...]	bool shouldBroadcast(Value *Op, unsigned OpIdx, unsigned Lane) {
if (Ln == Lane)		if (Ln == Lane)
continue;		continue;
// This is set to true if we found a candidate for broadcast at Lane.		// This is set to true if we found a candidate for broadcast at Lane.
bool FoundCandidate = false;		bool FoundCandidate = false;
for (unsigned OpI = 0, OpE = getNumOperands(); OpI != OpE; ++OpI) {		for (unsigned OpI = 0, OpE = getNumOperands(); OpI != OpE; ++OpI) {
OperandData &Data = getData(OpI, Ln);		OperandData &Data = getData(OpI, Ln);
if (Data.APO != OpAPO || Data.IsUsed)		if (Data.APO != OpAPO || Data.IsUsed)
continue;		continue;
if (Data.V == Op) {		if (Data.V == Op || isa<UndefValue>(Op)) {
FoundCandidate = true;		FoundCandidate = true;
Data.IsUsed = true;		Data.IsUsed = true;
break;		break;
}		}
}		}
if (!FoundCandidate)		if (!FoundCandidate)
return false;		return false;
}		}
return true;		return true;
}		}

public:		public:
/// Initialize with all the operands of the instruction vector \p RootVL.		/// Initialize with all the operands of the instruction vector \p RootVL.
VLOperands(ArrayRef<Value *> RootVL, const DataLayout &DL,		VLOperands(Instruction &VL0, ArrayRef<Value *> RootVL, const DataLayout &DL,
ScalarEvolution &SE, const BoUpSLP &R)		ScalarEvolution &SE, const BoUpSLP &R)
: DL(DL), SE(SE), R(R) {		: DL(DL), SE(SE), R(R), VL0(VL0) {
// Append all the operands of RootVL.		// Append all the operands of RootVL.
appendOperandsOfVL(RootVL);		appendOperandsOfVL(RootVL);
		// PowerOf2Ceil(distance between the last instruction and the first
		// instruction in the array of scalars).
		NumLanes = PowerOf2Ceil(
		std::distance(RootVL.begin(), find_if(reverse(RootVL), [](Value *V) {
		return !isa<UndefValue>(V);
		}).base()));
}		}

/// \Returns a value vector with the operands across all lanes for the		/// \Returns a value vector with the operands across all lanes for the
/// operand at \p OpIdx.		/// operand at \p OpIdx.
ValueList getVL(unsigned OpIdx) const {		ValueList getVL(unsigned OpIdx) const {
ValueList OpVL(OpsVec[OpIdx].size());		ValueList OpVL(OpsVec[OpIdx].size());
assert(OpsVec[OpIdx].size() == getNumLanes() &&		assert(std::all_of(std::next(OpsVec[OpIdx].begin(), getNumLanes()),
		OpsVec[OpIdx].end(),
		[](const OperandData &Data) {
		return isa<UndefValue>(Data.V);
		}) &&
"Expected same num of lanes across all operands");		"Expected same num of lanes across all operands");
for (unsigned Lane = 0, Lanes = getNumLanes(); Lane != Lanes; ++Lane)		for (unsigned Lane = 0, Lanes = OpsVec[OpIdx].size(); Lane != Lanes;
		++Lane)
OpVL[Lane] = OpsVec[OpIdx][Lane].V;		OpVL[Lane] = OpsVec[OpIdx][Lane].V;
return OpVL;		return OpVL;
}		}

// Performs operand reordering for 2 or more operands.		// Performs operand reordering for 2 or more operands.
// The original operands are in OrigOps[OpIdx][Lane].		// The original operands are in OrigOps[OpIdx][Lane].
// The reordered operands are returned in 'SortedOps[OpIdx][Lane]'.		// The reordered operands are returned in 'SortedOps[OpIdx][Lane]'.
void reorder() {		void reorder() {
unsigned NumOperands = getNumOperands();		unsigned NumOperands = getNumOperands();
unsigned NumLanes = getNumLanes();		unsigned NumLanes = getNumLanes();
// Each operand has its own mode. We are using this mode to help us select		// Each operand has its own mode. We are using this mode to help us select
// the instructions for each lane, so that they match best with the ones		// the instructions for each lane, so that they match best with the ones
// we have selected so far.		// we have selected so far.
SmallVector<ReorderingMode, 2> ReorderingModes(NumOperands);		SmallVector<ReorderingMode, 2> ReorderingModes(NumOperands,
		ReorderingMode::Unknown);

// This is a greedy single-pass algorithm. We are going over each lane		// This is a greedy single-pass algorithm. We are going over each lane
// once and deciding on the best order right away with no back-tracking.		// once and deciding on the best order right away with no back-tracking.
// However, in order to increase its effectiveness, we start with the lane		// However, in order to increase its effectiveness, we start with the lane
// that has operands that can move the least. For example, given the		// that has operands that can move the least. For example, given the
// following lanes:		// following lanes:
// Lane 0 : A[0] = B[0] + C[0] // Visited 3rd		// Lane 0 : A[0] = B[0] + C[0] // Visited 3rd
// Lane 1 : A[1] = C[1] - B[1] // Visited 1st		// Lane 1 : A[1] = C[1] - B[1] // Visited 1st
[... 77 lines not shown ...]	void reorder() {
if (!StrategyFailed)		if (!StrategyFailed)
break;		break;
}		}
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD static StringRef getModeStr(ReorderingMode RMode) {		LLVM_DUMP_METHOD static StringRef getModeStr(ReorderingMode RMode) {
switch (RMode) {		switch (RMode) {
		case ReorderingMode::Unknown:
		return "Unknown";
case ReorderingMode::Load:		case ReorderingMode::Load:
return "Load";		return "Load";
case ReorderingMode::Opcode:		case ReorderingMode::Opcode:
return "Opcode";		return "Opcode";
case ReorderingMode::Constant:		case ReorderingMode::Constant:
return "Constant";		return "Constant";
case ReorderingMode::Splat:		case ReorderingMode::Splat:
return "Splat";		return "Splat";
[... 47 lines not shown ...]	#endif
void eraseInstructions(ArrayRef<Value *> AV);		void eraseInstructions(ArrayRef<Value *> AV);

~BoUpSLP();		~BoUpSLP();

private:		private:
/// Checks if all users of \p I are the part of the vectorization tree.		/// Checks if all users of \p I are the part of the vectorization tree.
bool areAllUsersVectorized(Instruction *I) const;		bool areAllUsersVectorized(Instruction *I) const;

		/// Gets the optimal vectorization factor for the tree entry.
		/// \param UserVFs Vectorization factors of the user nodes.
		/// \param IE The starting node when trying to get the vectorization factor.
		/// Required to stop correctly inside of loops, if we have PHI instructions.
		unsigned getEntryVF(const TreeEntry *E, SmallSet<unsigned, 4> &UserVFs,
		const TreeEntry *IE);

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
InstructionCost getEntryCost(TreeEntry *E);		InstructionCost getEntryCost(TreeEntry *E);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,		void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,
const EdgeInfo &EI);		const EdgeInfo &EI);

/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can		/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can
/// be vectorized to use the original vector (or aggregate "bitcast" to a		/// be vectorized to use the original vector (or aggregate "bitcast" to a
/// vector) and sets \p CurrentOrder to the identity permutation; otherwise		/// vector) and sets \p CurrentOrder to the identity permutation; otherwise
/// returns false, setting \p CurrentOrder to either an empty vector or a		/// returns false, setting \p CurrentOrder to either an empty vector or a
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const;		SmallVectorImpl<unsigned> &CurrentOrder) const;

/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, starting in \p VL and for
Value *vectorizeTree(ArrayRef<Value *> VL);		/// vectorization factor \p VF.
		Value *vectorizeTree(ArrayRef<Value *> VL, unsigned VF);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars.		/// context means the creation of vectors from a group of scalars.
InstructionCost		/// \param NeedToShuffle true, if need to shuffle the resulting gather instead
getGatherCost(FixedVectorType *Ty,		/// of inserting same scalars several times.
const DenseSet<unsigned> &ShuffledIndices) const;		InstructionCost getGatherCost(FixedVectorType *Ty,
		const DenseSet<unsigned> &ShuffledIndices,
		bool NeedToShuffle) const;

/// \returns the scalarization cost for this list of values. Assuming that		/// \returns the scalarization cost for this list of values. Assuming that
/// this subtree gets vectorized, we may need to extract the values from the		/// this subtree gets vectorized, we may need to extract the values from the
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
InstructionCost getGatherCost(ArrayRef<Value *> VL) const;		InstructionCost getGatherCost(ArrayRef<Value *> VL, unsigned VF) const;

/// Set the Builder insert point to one after the last instruction in		/// Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(TreeEntry *E);		void setInsertPointAfterBundle(TreeEntry *E);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *gather(ArrayRef<Value *> VL);		Value *gather(ArrayRef<Value *> VL);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even if the tree height is tiny.		/// be beneficial even if the tree height is tiny.
bool isFullyVectorizableTinyTree() const;		bool isFullyVectorizableTinyTree() const;

/// Reorder commutative or alt operands to get better probability of		/// Reorder commutative or alt operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		static void reorderInputsAccordingToOpcode(
SmallVectorImpl<Value *> &Left,		Instruction &VL0, ArrayRef<Value *> VL, SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right,		SmallVectorImpl<Value *> &Right, const DataLayout &DL,
const DataLayout &DL,		ScalarEvolution &SE, const BoUpSLP &R);
ScalarEvolution &SE,
const BoUpSLP &R);
struct TreeEntry {		struct TreeEntry {
using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;		using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;
TreeEntry(VecTreeTy &Container) : Container(Container) {}		TreeEntry(VecTreeTy &Container) : Container(Container) {}

/// \returns true if the scalars in VL are equal to this entry.		/// \returns true if the scalars in VL are equal to this entry. The scalars
		/// in VL are equal to this entry if it contains the same scalars (or undefs)
		/// in the same positions.
bool isSame(ArrayRef<Value *> VL) const {		bool isSame(ArrayRef<Value *> VL) const {
if (VL.size() == Scalars.size())		if (!ReuseShuffleIndices.empty()) {
return std::equal(VL.begin(), VL.end(), Scalars.begin());		for (int I = 0, E = VL.size(); I < E; ++I) {
return VL.size() == ReuseShuffleIndices.size() &&		int Idx = ReuseShuffleIndices[I];
std::equal(		if (Idx == E) {
VL.begin(), VL.end(), ReuseShuffleIndices.begin(),		if (!isa<UndefValue>(VL[I]))
[this](Value *V, int Idx) { return V == Scalars[Idx]; });		return false;
		continue;
		}
		if (VL[I] != Scalars[Idx] && !isa<UndefValue>(VL[I]))
		return false;
		}
		return true;
		}
		for (int I = 0, E = VL.size(); I < E; ++I)
		if (VL[I] != Scalars[I] && !isa<UndefValue>(VL[I]))
		return false;
		return true;
}		}

/// A vector of scalars.		/// A vector of scalars.
ValueList Scalars;		ValueList Scalars;

/// The Scalars are vectorized into this value. It is initialized to Null.		/// The Scalars are vectorized into this value. It is initialized to Null.
Value *VectorizedValue = nullptr;		Value *VectorizedValue = nullptr;

[... 41 lines not shown ...]	void setOperand(unsigned OpIdx, ArrayRef<Value *> OpVL) {
Operands.resize(OpIdx + 1);		Operands.resize(OpIdx + 1);
assert(Operands[OpIdx].size() == 0 && "Already resized?");		assert(Operands[OpIdx].size() == 0 && "Already resized?");
Operands[OpIdx].resize(Scalars.size());		Operands[OpIdx].resize(Scalars.size());
for (unsigned Lane = 0, E = Scalars.size(); Lane != E; ++Lane)		for (unsigned Lane = 0, E = Scalars.size(); Lane != E; ++Lane)
Operands[OpIdx][Lane] = OpVL[Lane];		Operands[OpIdx][Lane] = OpVL[Lane];
}		}

/// Set the operands of this bundle in their original order.		/// Set the operands of this bundle in their original order.
void setOperandsInOrder() {		void setOperandsInOrder(Instruction *I0) {
assert(Operands.empty() && "Already initialized?");		assert(Operands.empty() && "Already initialized?");
auto *I0 = cast<Instruction>(Scalars[0]);
Operands.resize(I0->getNumOperands());		Operands.resize(I0->getNumOperands());
unsigned NumLanes = Scalars.size();		unsigned NumLanes = Scalars.size();
for (unsigned OpIdx = 0, NumOperands = I0->getNumOperands();		for (unsigned OpIdx = 0, NumOperands = I0->getNumOperands();
OpIdx != NumOperands; ++OpIdx) {		OpIdx != NumOperands; ++OpIdx) {
Operands[OpIdx].resize(NumLanes);		Operands[OpIdx].resize(NumLanes);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {		for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
		if (isa<UndefValue>(Scalars[Lane])) {
		Operands[OpIdx][Lane] =
		UndefValue::get(I0->getOperand(OpIdx)->getType());
		continue;
		}
auto *I = cast<Instruction>(Scalars[Lane]);		auto *I = cast<Instruction>(Scalars[Lane]);
assert(I->getNumOperands() == NumOperands &&		assert(I->getNumOperands() == NumOperands &&
"Expected same number of operands");		"Expected same number of operands");
Operands[OpIdx][Lane] = I->getOperand(OpIdx);		Operands[OpIdx][Lane] = I->getOperand(OpIdx);
}		}
}		}
}		}

[... 55 lines not shown ...]	public:
unsigned getAltOpcode() const {		unsigned getAltOpcode() const {
return AltOp ? AltOp->getOpcode() : 0;		return AltOp ? AltOp->getOpcode() : 0;
}		}

/// Update operations state of this entry if reorder occurred.		/// Update operations state of this entry if reorder occurred.
bool updateStateIfReorder() {		bool updateStateIfReorder() {
if (ReorderIndices.empty())		if (ReorderIndices.empty())
return false;		return false;
InstructionsState S = getSameOpcode(Scalars, ReorderIndices.front());		unsigned Size = Scalars.size();
		InstructionsState S =
		getSameOpcode(Scalars, *find_if(ReorderIndices, [Size](unsigned Idx) {
		return Idx < Size;
		}));
setOperations(S);		setOperations(S);
return true;		return true;
}		}

#ifndef NDEBUG		#ifndef NDEBUG
/// Debug printer.		/// Debug printer.
LLVM_DUMP_METHOD void dump() const {		LLVM_DUMP_METHOD void dump() const {
dbgs() << Idx << ".\n";		dbgs() << Idx << ".\n";
[... 91 lines not shown ...]	TreeEntry *newTreeEntry(ArrayRef<Value *> VL,
TreeEntry *Last = VectorizableTree.back().get();		TreeEntry *Last = VectorizableTree.back().get();
Last->Idx = VectorizableTree.size() - 1;		Last->Idx = VectorizableTree.size() - 1;
Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());		Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->State = EntryState;		Last->State = EntryState;
Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),		Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
ReuseShuffleIndices.end());		ReuseShuffleIndices.end());
Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());		Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
Last->setOperations(S);		Last->setOperations(S);
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
if (Last->State != TreeEntry::NeedToGather) {		if (Last->State != TreeEntry::NeedToGather) {
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
assert(!getTreeEntry(V) && "Scalar already in tree!");		assert(!getTreeEntry(V) && "Scalar already in tree!");
ScalarToTreeEntry[V] = Last;		ScalarToTreeEntry[V] = Last;
}		}
// Update the scheduler bundle to point to this TreeEntry.		// Update the scheduler bundle to point to this TreeEntry.
unsigned Lane = 0;		unsigned Lane = 0;
for (ScheduleData *BundleMember = Bundle.getValue(); BundleMember;		for (ScheduleData *BundleMember = Bundle.getValue(); BundleMember;
BundleMember = BundleMember->NextInBundle) {		BundleMember = BundleMember->NextInBundle) {
BundleMember->TE = Last;		BundleMember->TE = Last;
BundleMember->Lane = Lane;		BundleMember->Lane = Lane;
++Lane;		++Lane;
}		}
assert((!Bundle.getValue() || Lane == VL.size()) &&		assert((!Bundle.getValue() ||
		Lane == std::distance(InstructionsOnly.begin(),
		InstructionsOnly.end())) &&
"Bundle and VL out of sync");		"Bundle and VL out of sync");
} else {		} else {
MustGather.insert(VL.begin(), VL.end());		MustGather.insert(InstructionsOnly.begin(), InstructionsOnly.end());
}		}

if (UserTreeIdx.UserTE)		if (UserTreeIdx.UserTE)
Last->UserTreeIndices.push_back(UserTreeIdx);		Last->UserTreeIndices.push_back(UserTreeIdx);

return Last;		return Last;
}		}

[... 18 lines not shown ...]	#endif
}		}

/// Maps a specific scalar to its tree entry.		/// Maps a specific scalar to its tree entry.
SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;		SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;

/// Maps a value to the proposed vectorizable size.		/// Maps a value to the proposed vectorizable size.
SmallDenseMap<Value *, unsigned> InstrElementSize;		SmallDenseMap<Value *, unsigned> InstrElementSize;

		/// Vectorization factors for tree entries.
		SmallDenseMap<const TreeEntry *, unsigned> EntryVFs;

/// A list of scalars that we found that we need to keep as scalars.		/// A list of scalars that we found that we need to keep as scalars.
ValueSet MustGather;		ValueSet MustGather;

		/// A list of loads to be gathered during the vectorization process. We can
		/// try to vectorize them at the end, if profitable.
		SmallVector<LoadInst *, 4> GatheredLoads;
		/// The index of the first gathered load entry in the VectorizeTree.
		int GatheredLoadsEntriesFirst = -1;

/// This POD struct describes one external user in the vectorized tree.		/// This POD struct describes one external user in the vectorized tree.
struct ExternalUser {		struct ExternalUser {
ExternalUser(Value *S, llvm::User *U, int L)		ExternalUser(Value *S, llvm::User *U, int L)
: Scalar(S), User(U), Lane(L) {}		: Scalar(S), User(U), Lane(L) {}

// Which scalar in our function.		// Which scalar in our function.
Value *Scalar;		Value *Scalar;

[... 655 lines not shown ...]
void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst) {
deleteTree();		deleteTree();
UserIgnoreList = UserIgnoreLst;		UserIgnoreList = UserIgnoreLst;
if (!allSameType(Roots))		if (!allSameType(Roots))
return;		return;
buildTree_rec(Roots, 0, EdgeInfo());		buildTree_rec(Roots, 0, EdgeInfo());
		// Try to vectorize gathered loads.
		if (!GatheredLoads.empty() && !isTreeTinyAndNotFullyVectorizable()) {
		GatheredLoadsEntriesFirst = VectorizableTree.size();
		SmallDenseMap<LoadInst *, Value *, 8> GatherPointers;
		for (LoadInst *LI : GatheredLoads)
		GatherPointers.try_emplace(LI,
		getUnderlyingObject(LI->getPointerOperand()));

		// Sort by type, base pointers and parents.
		auto &&LoadSorter = [&GatherPointers](LoadInst *V, LoadInst *V2) {
		return V->getParent() < V2->getParent() ||
		(V->getParent() == V2->getParent() &&
		V->getPointerOperand()->getType() <
		V2->getPointerOperand()->getType()) ||
		(V->getParent() == V2->getParent() &&
		V->getPointerOperand()->getType() ==
		V2->getPointerOperand()->getType() &&
		GatherPointers[V] < GatherPointers[V2]);
		};

		llvm::stable_sort(GatheredLoads, LoadSorter);

		// Try to vectorize elements based on their types, bases and parents.
		for (auto IncIt = GatheredLoads.begin(), E = GatheredLoads.end();
		IncIt != E;) {

		// Look for the next elements with the same type.
		auto *SameTypeIt = IncIt;
		Type *EltTy = (*IncIt)->getPointerOperand()->getType();
		Value *Ptr = GatherPointers[*IncIt];

		SetVector<LoadInst *> Set(IncIt, SameTypeIt);
		while (SameTypeIt != E &&
		(*SameTypeIt)->getParent() == (*IncIt)->getParent() &&
		(*SameTypeIt)->getPointerOperand()->getType() == EltTy &&
		Ptr == GatherPointers[*SameTypeIt]) {
		if (!getTreeEntry(*SameTypeIt))
		Set.insert(*SameTypeIt);
		++SameTypeIt;
		}

		ArrayRef<LoadInst *> Loads = Set.getArrayRef();
		int NumElts = Loads.size();
		if (NumElts >= 3 || (NumElts == 2 && all_of(Loads, [](LoadInst *LI) {
		return LI->hasOneUse();
		}))) {
		SmallVector<Value *, 4> Pointers(NumElts);
		for (int I = 0; I < NumElts; ++I)
		Pointers[I] = Loads[I]->getPointerOperand();
		SmallVector<unsigned, 4> SortedIndicies;
		if (sortPtrAccesses(Pointers, DL, SE, SortedIndicies)) {
		if (SortedIndicies.empty()) {
		SortedIndicies.assign(NumElts, 0);
		std::iota(SortedIndicies.begin(), SortedIndicies.end(), 0);
		}
		Optional<int> Diff =
		getPointersDiff(Pointers[SortedIndicies.front()],
		Pointers[SortedIndicies.back()], DL, SE);
		int MaxLoads = std::max(getMaxVecRegSize() / DL->getTypeSizeInBits(
		Loads[0]->getType()),
		Roots.size()) *
		(NumElts >= 4 ? 1 : 2);
		if (Diff && *Diff < MaxLoads) {
		SmallVector<Value *, 4> Values(
		PowerOf2Ceil(*Diff + 1), UndefValue::get((*IncIt)->getType()));
		// Sort loads.
		Values[0] = Loads[SortedIndicies.front()];
		for (int I = 1; I < NumElts; ++I) {
		Optional<int> Diff =
		getPointersDiff(Pointers[SortedIndicies.front()],
		Pointers[SortedIndicies[I]], DL, SE);
		Values[*Diff] = Loads[SortedIndicies[I]];
		}
		LLVM_DEBUG(dbgs() << "SLP: Trying to vectorize gathered loads ("
		<< NumElts << ")\n");

		buildTree_rec(Values, 0, EdgeInfo());
		}
		}
		}

		// Start over at the next instruction of a different type (or the end).
		IncIt = SameTypeIt;
		}
		}

// Collect the values that we need to extract from the tree.		// Collect the values that we need to extract from the tree.
for (auto &TEPtr : VectorizableTree) {		for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();		TreeEntry *Entry = TEPtr.get();

// No need to handle users of gathered values.		// No need to handle users of gathered values.
if (Entry->State == TreeEntry::NeedToGather)		if (Entry->State == TreeEntry::NeedToGather)
continue;		continue;

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];
		if (isa<UndefValue>(Scalar))
		continue;
int FoundLane = Lane;		int FoundLane = Lane;
if (!Entry->ReuseShuffleIndices.empty()) {		if (!Entry->ReuseShuffleIndices.empty()) {
FoundLane =		FoundLane =
std::distance(Entry->ReuseShuffleIndices.begin(),		std::distance(Entry->ReuseShuffleIndices.begin(),
llvm::find(Entry->ReuseShuffleIndices, FoundLane));		llvm::find(Entry->ReuseShuffleIndices, FoundLane));
}		}

// Check if the scalar is externally used as an extra arg.		// Check if the scalar is externally used as an extra arg.
auto ExtI = ExternallyUsedValues.find(Scalar);		auto ExtI = ExternallyUsedValues.find(Scalar);
if (ExtI != ExternallyUsedValues.end()) {		if (ExtI != ExternallyUsedValues.end()) {
LLVM_DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane "		LLVM_DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane "
<< Lane << " from " << *Scalar << ".\n");		<< Lane << " from " << *Scalar << ".\n");
ExternalUses.emplace_back(Scalar, nullptr, FoundLane);		ExternalUses.emplace_back(Scalar, nullptr, FoundLane);
}		}
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");

Instruction *UserInst = dyn_cast<Instruction>(U);		Instruction *UserInst = dyn_cast<Instruction>(U);
if (!UserInst)		if (!UserInst)
continue;		continue;

// Skip in-tree scalars that become vectors		// Skip in-tree scalars that become vectors
if (TreeEntry *UseEntry = getTreeEntry(U)) {		if (TreeEntry *UseEntry = getTreeEntry(U)) {
Value *UseScalar = UseEntry->Scalars[0];		auto *It = llvm::find_if(UseEntry->Scalars, Instruction::classof);
		assert(It != UseEntry->Scalars.end() &&
		"At least single instruction is expected.");
		Value *UseScalar = *It;
// Some in-tree scalars will remain as scalar in vectorized		// Some in-tree scalars will remain as scalar in vectorized
// instructions. If that is the case, the one in Lane 0 will		// instructions. If that is the case, the one in the first lane will
// be used.		// be used.
if (UseScalar != U ||		if (UseScalar != U ||
UseEntry->State == TreeEntry::ScatterVectorize ||		UseEntry->State == TreeEntry::ScatterVectorize ||
!InTreeUserNeedToExtract(Scalar, UserInst, TLI)) {		!InTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U		LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
<< ".\n");		<< ".\n");
assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");		assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
continue;		continue;
}		}
		anton-afanasyev (Not Done): Could it be actually no one `Instruction` in `UseEntry` or should it be assert?
}		}

// Ignore users in the user ignore list.		// Ignore users in the user ignore list.
if (is_contained(UserIgnoreList, UserInst))		if (is_contained(UserIgnoreList, UserInst))
		anton-afanasyev (Not Done): "Lane 0" seems outdated here, but not sure about better description.
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "		LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "
<< Lane << " from " << *Scalar << ".\n");		<< Lane << " from " << *Scalar << ".\n");
ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));		ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
}		}
}		}
}		}
}		}

		/// Tries to find a subvector of loads and builds a new vector of only loads
		/// if that can be profitable.
		static void
		gatherPossiblyVectorizableLoads(const BoUpSLP &R, ArrayRef<Value *> VL,
		SmallVectorImpl<LoadInst *> &GatheredLoads) {
		for (Value *V : VL) {
		if (auto *LI = dyn_cast<LoadInst>(V))
		if (!R.isDeleted(LI))
		GatheredLoads.push_back(LI);
		}
		}

		/// Checks if the mask is uniform, i.e. consecutive and/or with some undefs.
		template <typename T> static bool isUniform(const T &Mask) {
		for (typename T::value_type I = 0, E = Mask.size(); I < E; ++I) {
		if (Mask[I] != I && Mask[I] != E &&
		Mask[I] != static_cast<typename T::value_type>(UndefMaskElem))
		return false;
		}
		return true;
		}
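
A minimal standalone sketch of the same predicate, for readers who want to try it outside LLVM. It assumes the undef marker is -1 (the value the review comment below attributes to `llvm::UndefMaskElem`); `isUniformMask` and `kUndefMaskElem` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Assumed stand-in for llvm::UndefMaskElem (-1).
constexpr int kUndefMaskElem = -1;

// A mask counts as uniform when every element is either its own index
// (identity), the mask size (the "out of range" sentinel used for trailing
// undef lanes in the patch), or the undef marker.
bool isUniformMask(const std::vector<int> &Mask) {
  const int E = static_cast<int>(Mask.size());
  for (int I = 0; I < E; ++I) {
    if (Mask[I] != I && Mask[I] != E && Mask[I] != kUndefMaskElem)
      return false;
  }
  return true;
}
```

Under these assumptions `{0, 1, 2, 3}` and `{0, 4, 2, -1}` are uniform, while `{1, 0, 2, 3}` is not, because lane 0 refers to a different, in-range position.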

void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,		void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
const EdgeInfo &UserTreeIdx) {		const EdgeInfo &UserTreeIdx) {
assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");		assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");

InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (Depth == RecursionMaxDepth) {		if (Depth == RecursionMaxDepth) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
Show All 9 Lines	void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,

if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))		if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))
if (SI->getValueOperand()->getType()->isVectorTy()) {		if (SI->getValueOperand()->getType()->isVectorTy()) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

		auto InitialInstructionsOnly = make_filter_range(VL, Instruction::classof);
// If all of the operands are identical or constant we have a simple solution.		// If all of the operands are identical or constant we have a simple solution.
if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !S.getOpcode()) {		if (allConstant(VL) || isSplat(VL) ||
		!allSameBlock(InitialInstructionsOnly) || !S.getOpcode()) {
		gatherPossiblyVectorizableLoads(*this, VL, GatheredLoads);
LLVM_DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

// We now know that this is a vector of instructions of the same type from		// We now know that this is a vector of instructions of the same type from
// the same block.		// the same block.

// Don't vectorize ephemeral values.		// Don't vectorize ephemeral values.
for (Value *V : VL) {		for (Value *V : InitialInstructionsOnly) {
if (EphValues.count(V)) {		if (EphValues.count(V)) {
LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V		LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V
<< ") is ephemeral.\n");		<< ") is ephemeral.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// Check if this is a duplicate of another entry.		// Check if this is a duplicate of another entry.
if (TreeEntry *E = getTreeEntry(S.OpValue)) {		if (TreeEntry *E = getTreeEntry(S.OpValue)) {
LLVM_DEBUG(dbgs() << "SLP: \tChecking bundle: " << *S.OpValue << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tChecking bundle: " << *S.OpValue << ".\n");
if (!E->isSame(VL)) {		if (!E->isSame(VL)) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
// Record the reuse of the tree node. FIXME, currently this is only used to		// Record the reuse of the tree node. FIXME, currently this is only used to
// properly draw the graph rather than for the actual vectorization.		// properly draw the graph rather than for the actual vectorization.
E->UserTreeIndices.push_back(UserTreeIdx);		E->UserTreeIndices.push_back(UserTreeIdx);
LLVM_DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *S.OpValue		LLVM_DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *S.OpValue
<< ".\n");		<< ".\n");
return;		return;
}		}

// Check that none of the instructions in the bundle are already in the tree.		// Check that none of the instructions in the bundle are already in the tree.
for (Value *V : VL) {		for (Value *V : InitialInstructionsOnly) {
		RKSimon (Done): Can we use `for (Value *V : make_filter_range(VL, Instruction::classof))`?
auto *I = dyn_cast<Instruction>(V);		if (getTreeEntry(V)) {
if (!I)
continue;
if (getTreeEntry(I)) {
LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V		LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V
<< ") is already in tree.\n");		<< ") is already in tree.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// If any of the scalars is marked as a value that needs to stay scalar, then		// The reduction nodes (stored in UserIgnoreList) should stay scalar.
		RKSimon (Done): `for (Value *V : make_filter_range(VL, Instruction::classof))`?
// we need to gather the scalars.		for (Value *V : InitialInstructionsOnly) {
// The reduction nodes (stored in UserIgnoreList) also should stay scalar.		if (is_contained(UserIgnoreList, V)) {
for (Value *V : VL) {
if (MustGather.count(V) || is_contained(UserIgnoreList, V)) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// Check that all of the users of the scalars that we want to vectorize are		// Check that all of the users of the scalars that we want to vectorize are
// schedulable.		// schedulable.
auto *VL0 = cast<Instruction>(S.OpValue);		auto *VL0 = cast<Instruction>(S.OpValue);
BasicBlock *BB = VL0->getParent();		BasicBlock *BB = VL0->getParent();

if (!DT->isReachableFromEntry(BB)) {		if (!DT->isReachableFromEntry(BB)) {
// Don't go into unreachable blocks. They may contain instructions with		// Don't go into unreachable blocks. They may contain instructions with
// dependency cycles which confuse the final scheduling.		// dependency cycles which confuse the final scheduling.
LLVM_DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");		LLVM_DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

		ArrayRef<Value *> OriginalVL = VL;
// Check that every instruction appears once in this bundle.		// Check that every instruction appears once in this bundle.
SmallVector<unsigned, 4> ReuseShuffleIndicies;		SmallVector<unsigned, 4> ReuseShuffleIndicies;
		RKSimon (Not Done): Should ReuseShuffleIndicies be SmallVector<int, 4> - and we then tag undefs with -1 (llvm::UndefMaskElem)?
		ABataev (Author, Done): No, it won't work, need to register actual positions in `ReuseShuffleIndicies`, `-1` does not work here.
SmallVector<Value *, 4> UniqueValues;		SmallVector<Value *, 4> UniqueValues;
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
		UniqueValues.reserve(VL.size());
		ReuseShuffleIndicies.reserve(VL.size());
		unsigned NumberOfInstructions = 0;
		unsigned UserNumberOfInstructions = 0;
		if (const TreeEntry *UserTE = UserTreeIdx.UserTE)
		UserNumberOfInstructions =
		count_if(UserTE->Scalars, [](Value *V) { return !isa<UndefValue>(V); });
		unsigned Pos = 0;
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<UndefValue>(V)) {
		ReuseShuffleIndicies.emplace_back(
		Pos < UserNumberOfInstructions ? Pos : VL.size());
		++Pos;
		continue;
		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
ReuseShuffleIndicies.emplace_back(Res.first->second);		ReuseShuffleIndicies.emplace_back(Res.first->second);
if (Res.second)		if (Res.second) {
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
		++NumberOfInstructions;
}		}
size_t NumUniqueScalarValues = UniqueValues.size();		++Pos;
if (NumUniqueScalarValues == VL.size()) {		}
		if (NumberOfInstructions == VL.size()) {
ReuseShuffleIndicies.clear();		ReuseShuffleIndicies.clear();
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");		LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");
if (NumUniqueScalarValues <= 1 ||		if (NumberOfInstructions <= 1) {
!llvm::isPowerOf2_32(NumUniqueScalarValues)) {		gatherPossiblyVectorizableLoads(*this, VL, GatheredLoads);
LLVM_DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: Single scalar in bundle"
		<< *UniqueValues.front() << ".\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
		// Check if the reuse shuffle mask is uniform and there is no need to count
		// undefs as real operands.
		if ((UserNumberOfInstructions == 0 ||
		UserNumberOfInstructions == NumberOfInstructions) &&
		isUniform(ReuseShuffleIndicies))
		ReuseShuffleIndicies.clear();
		UniqueValues.append(VL.size() - UniqueValues.size(),
		UndefValue::get(VL0->getType()));
VL = UniqueValues;		VL = UniqueValues;
}		}
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);

auto &BSRef = BlocksSchedules[BB];		auto &BSRef = BlocksSchedules[BB];
		anton-afanasyev (Not Done): `assert(NumberOfInstructions != 0 && "...")` and `if (NumberOfInstructions == 1)`?
if (!BSRef)		if (!BSRef)
BSRef = std::make_unique<BlockScheduling>(BB);		BSRef = std::make_unique<BlockScheduling>(BB);

BlockScheduling &BS = *BSRef.get();		BlockScheduling &BS = *BSRef.get();

Optional<ScheduleData *> Bundle = BS.tryScheduleBundle(VL, this, S);		Optional<ScheduleData *> Bundle = BS.tryScheduleBundle(VL, this, S);
if (!Bundle) {		if (!Bundle) {
LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");		LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");
		dtemirbulatov (Not Done): What do you think about defining InstructionsOnly in InstructionsState?
		ABataev (Author, Done): I don't think it is really required. `InstructionsOnly` is just a range, not a container.
assert((!BS.getScheduleData(VL0) ||		assert((!BS.getScheduleData(VL0) ||
!BS.getScheduleData(VL0)->isPartOfBundle()) &&		!BS.getScheduleData(VL0)->isPartOfBundle()) &&
"tryScheduleBundle should cancelScheduling on failure");		"tryScheduleBundle should cancelScheduling on failure");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");

unsigned ShuffleOrOp = S.isAltShuffle() ?		unsigned ShuffleOrOp = S.isAltShuffle() ?
(unsigned) Instruction::ShuffleVector : S.getOpcode();		(unsigned) Instruction::ShuffleVector : S.getOpcode();
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);

// Check for terminator values (e.g. invoke).		// Check for terminator values (e.g. invoke).
for (Value *V : VL)		for (Value *V : InstructionsOnly)
for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {		for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {
Instruction *Term = dyn_cast<Instruction>(		auto *Term =
cast<PHINode>(V)->getIncomingValueForBlock(		dyn_cast<Instruction>(cast<PHINode>(V)->getIncomingValueForBlock(
PH->getIncomingBlock(I)));		PH->getIncomingBlock(I)));
if (Term && Term->isTerminator()) {		if (Term && Term->isTerminator()) {
		RKSimon (Not Done): Do these trivial style refactors separately now to reduce the size of the patch?
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Need to swizzle PHINodes (terminator use).\n");		<< "SLP: Need to swizzle PHINodes (terminator use).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
}		}

TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle, S, UserTreeIdx, ReuseShuffleIndicies);		newTreeEntry(VL, Bundle, S, UserTreeIdx, ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");

// Keeps the reordered operands to avoid code duplication.		// Keeps the reordered operands to avoid code duplication.
SmallVector<ValueList, 2> OperandsVec;		SmallVector<ValueList, 2> OperandsVec;
for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {		for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {
		RKSimon (Not Done): Do these trivial style refactors separately now to reduce the size of the patch?
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<PHINode>(V)->getIncomingValueForBlock(		Operands.emplace_back(
		isa<UndefValue>(V) ? UndefValue::get(V->getType())
		: cast<PHINode>(V)->getIncomingValueForBlock(
PH->getIncomingBlock(I)));		PH->getIncomingBlock(I)));
TE->setOperand(I, Operands);		TE->setOperand(I, Operands);
OperandsVec.push_back(Operands);		OperandsVec.push_back(Operands);
}		}
for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)		for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)
buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});		buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});
return;		return;
}		}
case Instruction::ExtractValue:		case Instruction::ExtractValue:
Show All 19 Lines	case Instruction::ExtractElement: {
for (unsigned Idx : CurrentOrder)		for (unsigned Idx : CurrentOrder)
dbgs() << " " << Idx;		dbgs() << " " << Idx;
dbgs() << "\n";		dbgs() << "\n";
});		});
// Insert new order with initial value 0, if it does not exist,		// Insert new order with initial value 0, if it does not exist,
// otherwise return the iterator to the existing one.		// otherwise return the iterator to the existing one.
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, CurrentOrder);
		// No need to reorder if still need to shuffle reuses.
		if (ReuseShuffleIndicies.empty()) {
findRootOrder(CurrentOrder);		findRootOrder(CurrentOrder);
++NumOpsWantToKeepOrder[CurrentOrder];		++NumOpsWantToKeepOrder[CurrentOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
		}
// This is a special case, as it does not gather, but at the same time		// This is a special case, as it does not gather, but at the same time
// we are not extending buildTree_rec() towards the operands.		// we are not extending buildTree_rec() towards the operands.
ValueList Op0;		ValueList Op0;
Op0.assign(VL.size(), VL0->getOperand(0));		Op0.assign(VL.size(), VL0->getOperand(0));
VectorizableTree.back()->setOperand(0, Op0);		VectorizableTree.back()->setOperand(0, Op0);
return;		return;
}		}
LLVM_DEBUG(dbgs() << "SLP: Gather extract sequence.\n");		LLVM_DEBUG(dbgs() << "SLP: Gather extract sequence.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
return;		return;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Check that a vectorized load would load the same memory as a scalar		// Check that a vectorized load would load the same memory as a scalar
// load. For example, we don't want to vectorize loads that are smaller		// load. For example, we don't want to vectorize loads that are smaller
// than 8-bit. Even though we have a packed struct {<i2, i2, i2, i2>} LLVM		// than 8-bit. Even though we have a packed struct {<i2, i2, i2, i2>} LLVM
// treats loading/storing it as an i8 struct. If we vectorize loads/stores		// treats loading/storing it as an i8 struct. If we vectorize loads/stores
// from such a struct, we read/write packed bits disagreeing with the		// from such a struct, we read/write packed bits disagreeing with the
// unvectorized version.		// unvectorized version.
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();

if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
DL->getTypeAllocSizeInBits(ScalarTy)) {		DL->getTypeAllocSizeInBits(ScalarTy)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
return;		return;
}		}

// Make sure all loads in the bundle are simple - we can't vectorize		// Make sure all loads in the bundle are simple - we can't vectorize
// atomic or volatile loads.		// atomic or volatile loads.
SmallVector<Value *, 4> PointerOps(VL.size());		SmallVector<Value *, 4> PointerOps(NumberOfInstructions);
auto POIter = PointerOps.begin();		OrdersType OriginalOrder(NumberOfInstructions, 0);
for (Value *V : VL) {		auto *POIter = PointerOps.begin();
auto *L = cast<LoadInst>(V);		auto *OOIter = OriginalOrder.begin();
		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (isa<UndefValue>(VL[I]))
		continue;
		auto *L = cast<LoadInst>(VL[I]);
if (!L->isSimple()) {		if (!L->isSimple()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
return;		return;
}		}
*POIter = L->getPointerOperand();		*POIter = L->getPointerOperand();
++POIter;		++POIter;
		*OOIter = I;
		++OOIter;
}		}

OrdersType CurrentOrder;		OrdersType CurrentOrder;
// Check the order of pointer operands.		// Check the order of pointer operands.
if (llvm::sortPtrAccesses(PointerOps, *DL, *SE, CurrentOrder)) {		if (llvm::sortPtrAccesses(PointerOps, *DL, *SE, CurrentOrder)) {
Value *Ptr0;		Value *Ptr0;
Value *PtrN;		Value *PtrN;
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
Ptr0 = PointerOps.front();		Ptr0 = PointerOps.front();
PtrN = PointerOps.back();		PtrN = PointerOps.back();
} else {		} else {
Ptr0 = PointerOps[CurrentOrder.front()];		Ptr0 = PointerOps[CurrentOrder.front()];
PtrN = PointerOps[CurrentOrder.back()];		PtrN = PointerOps[CurrentOrder.back()];
}		}
const SCEV *Scev0 = SE->getSCEV(Ptr0);		Optional<int> Diff = getPointersDiff(Ptr0, PtrN, *DL, *SE);
const SCEV *ScevN = SE->getSCEV(PtrN);
const auto *Diff =
dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
uint64_t Size = DL->getTypeAllocSize(ScalarTy);
// Check that the sorted loads are consecutive.		// Check that the sorted loads are consecutive.
if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {		int AcceptableDiff = NumberOfInstructions - 1;
		Align CommonAlign = cast<LoadInst>(VL0)->getAlign();
		vdmitrie (Not Done): This check is not quite complete. If we for example have the following scalars set (VL):
		0: load i32 from p[0]
		1: load i32 from p[2]
		3: undef i32
		4: undef i32
		(note that p[1] is not loaded)
		Pointers difference is 8, number of instructions is 2 and VL size is 4: thus 8 <= (4 - 1) * 4 is true, but the pointers are actually not loaded consecutively (although it is vectorizable via masked load + shuffle, support for that seems not to be implemented yet). A similar issue exists for store.
		ABataev (Author, Done): Hmm, see lines 4574-4600 (masked load + shuffle) and 4643-4678 (shuffle + masked store).
		vdmitrie (Not Done): Note that two is a power of two. Thus at 4569 it takes the path that creates a plain load and ends up loading p[0] + p[1]. And even if we went down the masked load + shuffle path, that would not be correct either: the mask and shuffle there are built based on undefs rather than on pointer analysis of the scalar loads. In order to end up loading p[0] and p[2], VL should look like:
		0: load p[0]
		1: undef
		2: load p[2]
		3: undef
		ABataev (Author, Done): Ah, yes. Will check this carefully.
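
The concern raised in this thread can be reproduced with a small standalone model. The sketch below is not LLVM code: `Lanes`, `passesRangeCheck` and `lanesMatchOffsets` are hypothetical names, lanes are modelled as element offsets from a common base pointer (with `std::nullopt` for undef), and the stricter check is only one assumed way to express "VL should look like" from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: each lane holds the element offset of its load from a
// common base pointer, or std::nullopt for an undef lane. Not LLVM types.
using Lanes = std::vector<std::optional<int>>;

// Range-only check modelled on the condition in the patch:
//   NumberOfInstructions - 1 <= Diff <= VL.size() - 1
// where Diff is the distance in elements between the lowest and highest load.
bool passesRangeCheck(const Lanes &VL) {
  int NumLoads = 0, MinOff = 0, MaxOff = 0;
  for (const std::optional<int> &L : VL) {
    if (!L)
      continue;
    MinOff = NumLoads == 0 ? *L : std::min(MinOff, *L);
    MaxOff = NumLoads == 0 ? *L : std::max(MaxOff, *L);
    ++NumLoads;
  }
  if (NumLoads == 0)
    return false;
  const int Diff = MaxOff - MinOff;
  return Diff >= NumLoads - 1 && Diff <= static_cast<int>(VL.size()) - 1;
}

// Stricter, assumed check: every defined lane I must load element MinOff + I,
// so that the undef lanes line up with the holes in memory.
bool lanesMatchOffsets(const Lanes &VL) {
  std::optional<int> MinOff;
  for (const std::optional<int> &L : VL)
    if (L && (!MinOff || *L < *MinOff))
      MinOff = L;
  if (!MinOff)
    return false;
  for (std::size_t I = 0; I < VL.size(); ++I)
    if (VL[I] && *VL[I] - *MinOff != static_cast<int>(I))
      return false;
  return true;
}
```

With the bundle from the comment, `{0, 2, undef, undef}`, the range-only check succeeds even though lane 1 holds `p[2]`, whereas the lane-position check rejects it. The layout `{0, undef, 2, undef}` passes both checks.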
		if (!CurrentOrder.empty())
		CommonAlign = cast<LoadInst>(VL[OriginalOrder[CurrentOrder.front()]])
		->getAlign();
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
		if (Diff && *Diff >= AcceptableDiff &&
		*Diff <= static_cast<int>(VL.size() - 1) &&
		(TTI->isLegalMaskedLoad(
		FixedVectorType::get(ScalarTy, PowerOf2Ceil(*Diff + 1)),
		CommonAlign) ||
		isPowerOf2_32(
		std::min(PowerOf2Ceil(*Diff + 1),
		alignTo((*Diff + 1) * Sz, CommonAlign) / Sz)))) {
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
// Original loads are consecutive and does not require reordering.		if (*Diff == AcceptableDiff && isUniform(OriginalOrder)) {
++NumOpsWantToKeepOriginalOrder;		// Original loads are consecutive and do not require reordering.
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,
UserTreeIdx, ReuseShuffleIndicies);		UserTreeIdx, ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		for (int I = 0, E = OriginalOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(Ptr0, PointerOps[I], *DL,
		*SE)] = OriginalOrder[I];
		}
		// Need to extend.
		TreeEntry *TE =
		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies, NormalizedOrder);
		TE->setOperandsInOrder(VL0);
		}
		// Count orders of non-gathered loads only.
		if ((UserTreeIdx.UserTE || Depth == 0) &&
		!all_of(InstructionsOnly,
		[this](Value *V) { return MustGather.contains(V); }))
		++NumOpsWantToKeepOriginalOrder;
LLVM_DEBUG(dbgs() << "SLP: added a vector of loads.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of loads.\n");
} else {		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		SmallVector<int, 4> Orders(CurrentOrder.size());
		inversePermutation(CurrentOrder, Orders);
		for (int I = 0, E = CurrentOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(Ptr0, PointerOps[Orders[I]], *DL,
		*SE)] = OriginalOrder[Orders[I]];
		}
// Need to reorder.		// Need to reorder.
TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, NormalizedOrder);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");
findRootOrder(CurrentOrder);		// No need to reorder if still need to shuffle reuses.
++NumOpsWantToKeepOrder[CurrentOrder];		if (ReuseShuffleIndicies.empty()) {
		findRootOrder(NormalizedOrder);
		++NumOpsWantToKeepOrder[NormalizedOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
		}
}		}
return;		return;
}		}
		Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();
		for (Value *V : InstructionsOnly)
		CommonAlignment =
		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
		if (TTI->isLegalMaskedGather(
		FixedVectorType::get(ScalarTy,
		PowerOf2Ceil(NumberOfInstructions)),
		CommonAlignment)) {
// Vectorizing non-consecutive loads with `llvm.masked.gather`.		// Vectorizing non-consecutive loads with `llvm.masked.gather`.
TreeEntry *TE = newTreeEntry(VL, TreeEntry::ScatterVectorize, Bundle, S,		TreeEntry *TE = newTreeEntry(VL, TreeEntry::ScatterVectorize, Bundle,
UserTreeIdx, ReuseShuffleIndicies);		S, UserTreeIdx, ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
		PointerOps.append(
		VL.size() - NumberOfInstructions,
		UndefValue::get(cast<LoadInst>(VL0)->getPointerOperandType()));
buildTree_rec(PointerOps, Depth + 1, {TE, 0});		buildTree_rec(PointerOps, Depth + 1, {TE, 0});
LLVM_DEBUG(dbgs() << "SLP: added a vector of non-consecutive loads.\n");		LLVM_DEBUG(dbgs()
		<< "SLP: added a vector of non-consecutive loads.\n");
return;		return;
}		}
		}

LLVM_DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Type *Ty = cast<Instruction>(V)->getOperand(0)->getType();		Type *Ty = cast<Instruction>(V)->getOperand(0)->getType();
if (Ty != SrcTy || !isValidElementType(Ty)) {		if (Ty != SrcTy || !isValidElementType(Ty)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Gathering casts with different src types.\n");		<< "SLP: Gathering casts with different src types.\n");
return;		return;
}		}
}		}
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of casts.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of casts.\n");

TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(isa<UndefValue>(V)
		? UndefValue::get(SrcTy)
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
// Check that all of the compares have the same predicate.		// Check that all of the compares have the same predicate.
CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
CmpInst::Predicate SwapP0 = CmpInst::getSwappedPredicate(P0);		CmpInst::Predicate SwapP0 = CmpInst::getSwappedPredicate(P0);
Type *ComparedTy = VL0->getOperand(0)->getType();		Type *ComparedTy = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
CmpInst *Cmp = cast<CmpInst>(V);		auto *Cmp = cast<CmpInst>(V);
if ((Cmp->getPredicate() != P0 && Cmp->getPredicate() != SwapP0) ||		if ((Cmp->getPredicate() != P0 && Cmp->getPredicate() != SwapP0) ||
Cmp->getOperand(0)->getType() != ComparedTy) {		Cmp->getOperand(0)->getType() != ComparedTy) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Gathering cmp with different predicate.\n");		<< "SLP: Gathering cmp with different predicate.\n");
return;		return;
}		}
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of compares.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of compares.\n");

ValueList Left, Right;		ValueList Left, Right;
if (cast<CmpInst>(VL0)->isCommutative()) {		if (cast<CmpInst>(VL0)->isCommutative()) {
// Commutative predicate - collect + sort operands of the instructions		// Commutative predicate - collect + sort operands of the instructions
// so that each side is more likely to have the same opcode.		// so that each side is more likely to have the same opcode.
assert(P0 == SwapP0 && "Commutative Predicate mismatch");		assert(P0 == SwapP0 && "Commutative Predicate mismatch");
reorderInputsAccordingToOpcode(VL, Left, Right, DL, SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, DL, SE, this);
} else {		} else {
// Collect operands - commute if it uses the swapped predicate.		// Collect operands - commute if it uses the swapped predicate.
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<UndefValue>(V)) {
		Left.push_back(UndefValue::get(VL0->getOperand(0)->getType()));
		Right.push_back(UndefValue::get(VL0->getOperand(1)->getType()));
		continue;
		}
auto *Cmp = cast<CmpInst>(V);		auto *Cmp = cast<CmpInst>(V);
Value *LHS = Cmp->getOperand(0);		Value *LHS = Cmp->getOperand(0);
Value *RHS = Cmp->getOperand(1);		Value *RHS = Cmp->getOperand(1);
if (Cmp->getPredicate() != P0)		if (Cmp->getPredicate() != P0)
std::swap(LHS, RHS);		std::swap(LHS, RHS);
Left.push_back(LHS);		Left.push_back(LHS);
Right.push_back(RHS);		Right.push_back(RHS);
}		}
[... 27 lines not shown ...]	case Instruction::Xor: {
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of un/bin op.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of un/bin op.\n");

// Sort operands of the instructions so that each side is more likely to		// Sort operands of the instructions so that each side is more likely to
// have the same opcode.		// have the same opcode.
if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {		if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right, DL, SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, DL, SE, this);
TE->setOperand(0, Left);		TE->setOperand(0, Left);
TE->setOperand(1, Right);		TE->setOperand(1, Right);
buildTree_rec(Left, Depth + 1, {TE, 0});		buildTree_rec(Left, Depth + 1, {TE, 0});
buildTree_rec(Right, Depth + 1, {TE, 1});		buildTree_rec(Right, Depth + 1, {TE, 1});
return;		return;
}		}

TE->setOperandsInOrder();		SmallVector<ValueList, 2> OperandsVec;
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned I = 0, E = VL0->getNumOperands(); I < E; ++I) {
ValueList Operands;		ValueList Operands;
		Value *DefinedOp = nullptr;
		// Cannot use undef for int div/rem, use the last real value instead.
		if (BinaryOperator::isIntDivRem(ShuffleOrOp)) {
		const auto *It = find_if(VL, [I](Value *V) {
		return isa<Instruction>(V) &&
		!isa<UndefValue>(cast<Instruction>(V)->getOperand(I));
		});
		if (It != VL.end())
		DefinedOp = cast<Instruction>(*It)->getOperand(I);
		}
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL.slice(
Operands.push_back(cast<Instruction>(V)->getOperand(i));		0, PowerOf2Ceil(std::distance(
		VL.begin(),
buildTree_rec(Operands, Depth + 1, {TE, i});		find_if(reverse(VL), Instruction::classof).base())))) {
		Value *OpV;
		if (isa<UndefValue>(V)) {
		if (BinaryOperator::isIntDivRem(ShuffleOrOp) && DefinedOp)
		OpV = DefinedOp;
		else
		OpV = UndefValue::get(VL0->getOperand(I)->getType());
		} else {
		OpV = cast<Instruction>(V)->getOperand(I);
		if (isa<UndefValue>(OpV) &&
		BinaryOperator::isIntDivRem(ShuffleOrOp) && DefinedOp)
		OpV = DefinedOp;
		}
		Operands.push_back(OpV);
		}
		Operands.append(VL.size() - Operands.size(),
		UndefValue::get(VL0->getOperand(I)->getType()));
		TE->setOperand(I, Operands);
		OperandsVec.push_back(Operands);
}		}
		for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)
		buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});
return;		return;
		vdmitrie: Here is the case that https://reviews.llvm.org/D75296 is trying to prevent.
}		}
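The code comment in the case above says that undef cannot be used for integer div/rem operands. A minimal standalone sketch of that substitution, written outside LLVM: lanes are modelled as `std::optional<int>` (an assumption for illustration only), `fillUndefDivisors` is a hypothetical name, and, like the `find_if` call in the patch, it picks the first defined operand it finds.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical model of one operand lane: a concrete divisor, or
// std::nullopt standing in for an undef operand.
using Lane = std::optional<int>;

// Replaces every undef lane with the first defined divisor, so that no lane
// of the widened division is left with an arbitrary (possibly zero) divisor.
// If no lane is defined, the input is returned unchanged.
std::vector<Lane> fillUndefDivisors(std::vector<Lane> Divisors) {
  auto It = std::find_if(Divisors.begin(), Divisors.end(),
                         [](const Lane &L) { return L.has_value(); });
  if (It == Divisors.end())
    return Divisors;
  const Lane Defined = *It; // Copy before the loop overwrites lanes.
  for (Lane &L : Divisors)
    if (!L)
      L = Defined;
  return Divisors;
}
```

With divisors `{undef, 3, undef, 5}` the sketch yields `{3, 3, 3, 5}`.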
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// We don't combine GEPs with complicated (nested) indexing.		// We don't combine GEPs with complicated (nested) indexing.
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (cast<Instruction>(V)->getNumOperands() != 2) {		if (cast<Instruction>(V)->getNumOperands() != 2) {
LLVM_DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");		LLVM_DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
}		}

// We can't combine several GEPs into one vector if they operate on		// We can't combine several GEPs into one vector if they operate on
// different types.		// different types.
Type *Ty0 = VL0->getOperand(0)->getType();		Type *Ty0 = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Type *CurTy = cast<Instruction>(V)->getOperand(0)->getType();		Type *CurTy = cast<Instruction>(V)->getOperand(0)->getType();
if (Ty0 != CurTy) {		if (Ty0 != CurTy) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: not-vectorizable GEP (different types).\n");		<< "SLP: not-vectorizable GEP (different types).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
}		}

// We don't combine GEPs with non-constant indexes.		// We don't combine GEPs with non-constant indexes.
Type *Ty1 = VL0->getOperand(1)->getType();		Type *Ty1 = VL0->getOperand(1)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
auto Op = cast<Instruction>(V)->getOperand(1);		auto *Op = cast<Instruction>(V)->getOperand(1);
if (!isa<ConstantInt>(Op) ||		if (!isa<ConstantInt>(Op) ||
(Op->getType() != Ty1 &&		(Op->getType() != Ty1 &&
Op->getType()->getScalarSizeInBits() >		Op->getType()->getScalarSizeInBits() >
DL->getIndexSizeInBits(		DL->getIndexSizeInBits(
V->getType()->getPointerAddressSpace()))) {		V->getType()->getPointerAddressSpace()))) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: not-vectorizable GEP (non-constant indexes).\n");		<< "SLP: not-vectorizable GEP (non-constant indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
return;		return;
}		}
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = 2; i < e; ++i) {		for (unsigned i = 0, e = 2; i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(
		isa<UndefValue>(V)
		? UndefValue::get(VL0->getOperand(i)->getType())
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::Store: {		case Instruction::Store: {
// Check if the stores are consecutive or if we need to swizzle them.		// Check if the stores are consecutive or if we need to swizzle them.
llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();		llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();
// Avoid types that are padded when being allocated as scalars, while		// Avoid types that are padded when being allocated as scalars, while
// being packed together in a vector (such as i1).		// being packed together in a vector (such as i1).
if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
DL->getTypeAllocSizeInBits(ScalarTy)) {		DL->getTypeAllocSizeInBits(ScalarTy)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering stores of non-packed type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering stores of non-packed type.\n");
return;		return;
}		}
// Make sure all stores in the bundle are simple - we can't vectorize		// Make sure all stores in the bundle are simple - we can't vectorize
// atomic or volatile stores.		// atomic or volatile stores.
SmallVector<Value *, 4> PointerOps(VL.size());		SmallVector<Value *, 4> PointerOps(NumberOfInstructions);
		OrdersType OriginalOrder(NumberOfInstructions, 0);
ValueList Operands(VL.size());		ValueList Operands(VL.size());
auto POIter = PointerOps.begin();		auto POIter = PointerOps.begin();
auto OIter = Operands.begin();		auto OIter = Operands.begin();
for (Value *V : VL) {		auto *OOIter = OriginalOrder.begin();
auto *SI = cast<StoreInst>(V);		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (isa<UndefValue>(VL[I])) {
		*OIter = UndefValue::get(VL0->getOperand(0)->getType());
		++OIter;
		continue;
		}
		auto *SI = cast<StoreInst>(VL[I]);
if (!SI->isSimple()) {		if (!SI->isSimple()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple stores.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple stores.\n");
return;		return;
}		}
*POIter = SI->getPointerOperand();		*POIter = SI->getPointerOperand();
*OIter = SI->getValueOperand();		*OIter = SI->getValueOperand();
		*OOIter = I;
++POIter;		++POIter;
++OIter;		++OIter;
		++OOIter;
}		}

OrdersType CurrentOrder;		OrdersType CurrentOrder;
		if (!llvm::sortPtrAccesses(PointerOps, DL, SE, CurrentOrder)) {
		BS.cancelScheduling(VL, VL0);
		spatel: Use isValidElementType() or check for undef directly? I still can't tell from the debug statement exactly what we are guarding against. Should the type check already be here even without this patch?
		ABataev: I was just trying to protect the code and support it only for simple types at first. There are some doubts that the cost for masked loads/stores is complete, so I restricted it to simple types. I can remove this check if the cost model for masked ops is good enough.
		RKSimon: Masked load/store costs for constant masks should be good enough now (getScalarizationOverhead should now provide us with a reasonable fallback).
		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
		LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
		return;
		}
		vdmitrie: "Non-consecutive" here is not the actual reason.
		anton-afanasyev: Well, "unsortable" or "unprocessable" would be a more precise term. But why did we change `if (sortPtrAccesses)...` to the opposite condition? This change just duplicates the debug output, since we don't differentiate the two cases. Also, I'd prefer to see the same `if-else` structure as for the load case.
// Check the order of pointer operands.		// Check the order of pointer operands.
if (llvm::sortPtrAccesses(PointerOps, DL, SE, CurrentOrder)) {
Value *Ptr0;		Value *Ptr0;
		vdmitrie: If we have, for example, the sequence store addr[2], store addr[0], store addr[1], undef, then we bypass sorting the pointers and end up vectorizing this store sequence in the wrong order.
Value *PtrN;		Value *PtrN;
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
Ptr0 = PointerOps.front();		Ptr0 = PointerOps.front();
PtrN = PointerOps.back();		PtrN = PointerOps.back();
} else {		} else {
Ptr0 = PointerOps[CurrentOrder.front()];		Ptr0 = PointerOps[CurrentOrder.front()];
PtrN = PointerOps[CurrentOrder.back()];		PtrN = PointerOps[CurrentOrder.back()];
}		}
const SCEV *Scev0 = SE->getSCEV(Ptr0);		Optional<int> Dist = getPointersDiff(Ptr0, PtrN, DL, SE);
const SCEV *ScevN = SE->getSCEV(PtrN);
const auto *Diff =
dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
uint64_t Size = DL->getTypeAllocSize(ScalarTy);
// Check that the sorted pointer operands are consecutive.		// Check that the sorted pointer operands are consecutive.
if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {		int NormalizedSize = NumberOfInstructions - 1;
		if (Dist && *Dist >= NormalizedSize &&
		*Dist <= static_cast<int>(VL.size() - 1)) {
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
		TreeEntry *TE;
		if (NumberOfInstructions == VL.size() && isUniform(OriginalOrder)) {
// Original stores are consecutive and does not require reordering.		// Original stores are consecutive and does not require reordering.
++NumOpsWantToKeepOriginalOrder;		TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,		ReuseShuffleIndicies);
UserTreeIdx, ReuseShuffleIndicies);		} else {
TE->setOperandsInOrder();		// Need to extend.
		OrdersType NormalizedOrder(VL.size(), VL.size());
		for (int I = 0, E = OriginalOrder.size(); I < E; ++I) {
		NormalizedOrder[getPointersDiff(Ptr0, PointerOps[I], DL, *SE)] =
		OriginalOrder[I];
		}
		TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies, NormalizedOrder);
		}
		TE->setOperandsInOrder(VL0);
buildTree_rec(Operands, Depth + 1, {TE, 0});		buildTree_rec(Operands, Depth + 1, {TE, 0});
		++NumOpsWantToKeepOriginalOrder;
LLVM_DEBUG(dbgs() << "SLP: added a vector of stores.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of stores.\n");
} else {		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		SmallVector<int, 4> Orders(CurrentOrder.size());
		inversePermutation(CurrentOrder, Orders);
		for (int I = 0, E = CurrentOrder.size(); I < E; ++I) {
		NormalizedOrder[getPointersDiff(Ptr0, PointerOps[Orders[I]], DL,
		*SE)] = OriginalOrder[Orders[I]];
		}
TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, NormalizedOrder);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
buildTree_rec(Operands, Depth + 1, {TE, 0});		buildTree_rec(Operands, Depth + 1, {TE, 0});
LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled stores.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled stores.\n");
findRootOrder(CurrentOrder);		// No need to reorder if still need to shuffle reuses.
++NumOpsWantToKeepOrder[CurrentOrder];		if (ReuseShuffleIndicies.empty()) {
		findRootOrder(NormalizedOrder);
		++NumOpsWantToKeepOrder[NormalizedOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
}		}
return;
}		}
		return;
}		}

BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");		LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
return;		return;
}		}
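vdmitrie's inline example in the store case (store addr[2], store addr[0], store addr[1], undef) can be illustrated with a small standalone sketch. It does not use LLVM: each lane is modelled as an optional element offset from a common base (an assumption for illustration), and `sortedStoreLanes` and `isInMemoryOrder` are hypothetical names, not functions from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a bundle lane: the element offset of a store
// relative to a common base pointer, or std::nullopt for an undef lane.
using Lane = std::optional<int>;

// Returns the lane indices of the real stores, ordered by increasing offset.
std::vector<std::size_t> sortedStoreLanes(const std::vector<Lane> &Bundle) {
  std::vector<std::size_t> Order;
  for (std::size_t I = 0; I < Bundle.size(); ++I)
    if (Bundle[I])
      Order.push_back(I);
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t A, std::size_t B) {
                     return *Bundle[A] < *Bundle[B];
                   });
  return Order;
}

// True when the real stores already appear in increasing address order, so
// that no reordering is needed before emitting a single vector store.
bool isInMemoryOrder(const std::vector<Lane> &Bundle) {
  std::vector<std::size_t> Order = sortedStoreLanes(Bundle);
  return std::is_sorted(Order.begin(), Order.end());
}
```

For the bundle from the comment, offsets `{2, 0, 1, undef}`, the sorted lane order is `{1, 2, 0}`, so the bundle is not in memory order, and skipping the sort would write the values to the wrong addresses.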
case Instruction::Call: {		case Instruction::Call: {
// Check if the calls are all to the same vectorizable intrinsic or		// Check if the calls are all to the same vectorizable intrinsic or
// library function.		// library function.
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

VFShape Shape = VFShape::get(		VFShape Shape =
*CI, ElementCount::getFixed(static_cast<unsigned int>(VL.size())),		VFShape::get(*CI,
		ElementCount::getFixed(static_cast<unsigned int>(
		PowerOf2Ceil(NumberOfInstructions))),
false /*HasGlobalPred*/);		false /*HasGlobalPred*/);
Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);		Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);

if (!VecFunc && !isTriviallyVectorizable(ID)) {		if (!VecFunc && !isTriviallyVectorizable(ID)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");		LLVM_DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
return;		return;
}		}
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();
unsigned NumArgs = CI->getNumArgOperands();		unsigned NumArgs = CI->getNumArgOperands();
SmallVector<Value*, 4> ScalarArgs(NumArgs, nullptr);		SmallVector<Value*, 4> ScalarArgs(NumArgs, nullptr);
for (unsigned j = 0; j != NumArgs; ++j)		for (unsigned j = 0; j != NumArgs; ++j)
if (hasVectorInstrinsicScalarOpd(ID, j))		if (hasVectorInstrinsicScalarOpd(ID, j))
ScalarArgs[j] = CI->getArgOperand(j);		ScalarArgs[j] = CI->getArgOperand(j);
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
CallInst *CI2 = dyn_cast<CallInst>(V);		CallInst *CI2 = dyn_cast<CallInst>(V);
if (!CI2 || CI2->getCalledFunction() != F ||		if (!CI2 || CI2->getCalledFunction() != F ||
getVectorIntrinsicIDForCall(CI2, TLI) != ID ||		getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
(VecFunc &&		(VecFunc &&
VecFunc != VFDatabase(*CI2).getVectorizedFunction(Shape)) ||		VecFunc != VFDatabase(*CI2).getVectorizedFunction(Shape)) ||
!CI->hasIdenticalOperandBundleSchema(*CI2)) {		!CI->hasIdenticalOperandBundleSchema(*CI2)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);		LLVM_DEBUG(dbgs()
LLVM_DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" << *V		<< "SLP: mismatched calls:" << *CI << "!=" << *V << "\n");
<< "\n");
return;		return;
}		}
// Some intrinsics have scalar arguments and should be same in order for		// Some intrinsics have scalar arguments and should be same in order for
// them to be vectorized.		// them to be vectorized.
for (unsigned j = 0; j != NumArgs; ++j) {		for (unsigned j = 0; j != NumArgs; ++j) {
if (hasVectorInstrinsicScalarOpd(ID, j)) {		if (hasVectorInstrinsicScalarOpd(ID, j)) {
Value *A1J = CI2->getArgOperand(j);		Value *A1J = CI2->getArgOperand(j);
if (ScalarArgs[j] != A1J) {		if (ScalarArgs[j] != A1J) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI		LLVM_DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
<< " argument " << *ScalarArgs[j] << "!=" << *A1J		<< " argument " << *ScalarArgs[j] << "!=" << *A1J
<< "\n");		<< "\n");
return;		return;
}		}
}		}
}		}
// Verify that the bundle operands are identical between the two calls.		// Verify that the bundle operands are identical between the two calls.
if (CI->hasOperandBundles() &&		if (CI->hasOperandBundles() &&
!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),		!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),
CI->op_begin() + CI->getBundleOperandsEndIndex(),		CI->op_begin() + CI->getBundleOperandsEndIndex(),
CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {		CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:"		LLVM_DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:"
<< *CI << "!=" << *V << '\n');		<< *CI << "!=" << *V << '\n');
return;		return;
}		}
}		}
		SmallVector<Value *, 4> NormalizedCalls(VL.size(),
		UndefValue::get(CI->getType()));
		copy(VL, NormalizedCalls.begin());
		for (int I = NumberOfInstructions, E = PowerOf2Ceil(NumberOfInstructions);
		I < E; ++I)
		NormalizedCalls[I] = CI;

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL) {		for (Value *V : NormalizedCalls) {
		if (isa<UndefValue>(V)) {
		Operands.push_back(UndefValue::get(CI->getOperand(i)->getType()));
		continue;
		}
auto *CI2 = cast<CallInst>(V);		auto *CI2 = cast<CallInst>(V);
Operands.push_back(CI2->getArgOperand(i));		Operands.push_back(CI2->getArgOperand(i));
}		}
buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
// If this is not an alternate sequence of opcode like add-sub		// If this is not an alternate sequence of opcode like add-sub
// then do not vectorize this instruction.		// then do not vectorize this instruction.
if (!S.isAltShuffle()) {		if (!S.isAltShuffle()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");		LLVM_DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
return;		return;
}		}
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");		LLVM_DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");

// Reorder operands if reordering would enable vectorization.		// Reorder operands if reordering would enable vectorization.
if (isa<BinaryOperator>(VL0)) {		if (isa<BinaryOperator>(VL0)) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right, DL, SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, DL, SE, this);
TE->setOperand(0, Left);		TE->setOperand(0, Left);
TE->setOperand(1, Right);		TE->setOperand(1, Right);
buildTree_rec(Left, Depth + 1, {TE, 0});		buildTree_rec(Left, Depth + 1, {TE, 0});
buildTree_rec(Right, Depth + 1, {TE, 1});		buildTree_rec(Right, Depth + 1, {TE, 1});
return;		return;
}		}

TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(
		isa<UndefValue>(V)
		? UndefValue::get(VL0->getOperand(i)->getType())
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
default:		default:
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(OriginalVL, None /*not vectorized*/, S, UserTreeIdx);
ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
return;		return;
}		}
}		}

unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {		unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {
unsigned N = 1;		unsigned N = 1;
Type *EltTy = T;		Type *EltTy = T;
[... 47 lines not shown ...]	if (E0->getOpcode() == Instruction::ExtractValue) {
// Check if load can be rewritten as load of vector.		// Check if load can be rewritten as load of vector.
LoadInst *LI = dyn_cast<LoadInst>(Vec);		LoadInst *LI = dyn_cast<LoadInst>(Vec);
if (!LI || !LI->isSimple() || !LI->hasNUses(VL.size()))		if (!LI || !LI->isSimple() || !LI->hasNUses(VL.size()))
return false;		return false;
} else {		} else {
NElts = cast<FixedVectorType>(Vec->getType())->getNumElements();		NElts = cast<FixedVectorType>(Vec->getType())->getNumElements();
}		}

if (NElts != VL.size())		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
return false;		const unsigned NumOfInstructions =
		std::distance(InstructionsOnly.begin(), InstructionsOnly.end());

// Check that all of the indices extract from the correct offset.		// Check that all of the indices extract from the correct offset.
bool ShouldKeepOrder = true;		bool ShouldKeepOrder = true;
unsigned E = VL.size();		unsigned E = VL.size();
// Assign to all items the initial value E + 1 so we can check if the extract		// Assign to all items the initial value E so we can check if the extract
// instruction index was used already.		// instruction index was used already.
// Also, later we can check that all the indices are used and we have a		// Also, later we can check that all the indices are used and we have a
// consecutive access in the extract instructions, by checking that no		// consecutive access in the extract instructions, by checking that no
// element of CurrentOrder still has value E + 1.		// element of CurrentOrder still has value E.
CurrentOrder.assign(E, E + 1);		CurrentOrder.assign(E, E);
unsigned I = 0;		unsigned I = 0;
for (; I < E; ++I) {		auto II = InstructionsOnly.begin();
		vdmitrie: What is the reasoning for this min? Imagine VL[0] and VL[1] are extracts of two subsequent elements from the same vector of size 2, and VL[2], VL[3] are extracts from another vector (which can even be of a different size). NElts will be assigned 2 based on VL[0] while the VL size is 4. The for loop at line 3300 will not visit the 3rd and 4th elements of the VL, and the final answer turns out "true", which is obviously incorrect as we must gather these extracts.
		ABataev: Good catch, thanks! It is required to handle the case where the 2 other elements are actually UndefValues. Just need to add a check for this here.
auto *Inst = cast<Instruction>(VL[I]);		for (; I < NumOfInstructions; ++I, ++II) {
		auto *Inst = cast<Instruction>(*II);
if (Inst->getOperand(0) != Vec)		if (Inst->getOperand(0) != Vec)
break;		break;
Optional<unsigned> Idx = getExtractIndex(Inst);		Optional<unsigned> Idx = getExtractIndex(Inst);
if (!Idx)		if (!Idx)
break;		break;
const unsigned ExtIdx = *Idx;		const unsigned ExtIdx = *Idx;
		if (ExtIdx >= E)
		break;
if (ExtIdx != I) {		if (ExtIdx != I) {
if (ExtIdx >= E || CurrentOrder[ExtIdx] != E + 1)		if (CurrentOrder[ExtIdx] != E)
break;		break;
ShouldKeepOrder = false;		ShouldKeepOrder = false;
CurrentOrder[ExtIdx] = I;		CurrentOrder[ExtIdx] = I;
} else {		} else {
if (CurrentOrder[I] != E + 1)		if (CurrentOrder[I] != E)
break;		break;
CurrentOrder[I] = I;		CurrentOrder[I] = I;
}		}
}		}
if (I < E) {		if (I < NumOfInstructions) {
		anton-afanasyev: Comment typo: `aggrgate`.
CurrentOrder.clear();		CurrentOrder.clear();
return false;		return false;
}		}

return ShouldKeepOrder;		return ShouldKeepOrder;
}		}
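The exchange between vdmitrie and ABataev above concerns which lanes the extract check has to visit. As a rough standalone illustration (not the patch's code), the sketch below models each lane as an optional (source vector, index) pair; `Extract`, `Lane` and `allExtractsFromOneVector` are hypothetical names. Undef lanes are skipped, but every real extract must read from the same source vector.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one bundle lane: which source vector an extract
// reads (VecId) and at which index, or std::nullopt for an undef lane.
struct Extract {
  int VecId;
  unsigned Index;
};
using Lane = std::optional<Extract>;

// Returns true only if every non-undef lane extracts from vector VecId, each
// index is below NElts, and no index is used twice.
bool allExtractsFromOneVector(const std::vector<Lane> &Bundle, int VecId,
                              unsigned NElts) {
  std::vector<bool> Seen(NElts, false);
  for (const Lane &L : Bundle) {
    if (!L)
      continue; // Undef lanes place no constraint on the source vector.
    if (L->VecId != VecId || L->Index >= NElts || Seen[L->Index])
      return false;
    Seen[L->Index] = true;
  }
  return true;
}
```

In vdmitrie's scenario (two extracts from one 2-element vector followed by two extracts from another vector) this check returns false, so the bundle would be gathered. When the last two lanes are undef instead, it returns true, which is the case ABataev wants to keep supporting.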

bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {		bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {
[... 30 lines not shown ...]	if (!CI->isNoBuiltin() && VecFunc) {
// Calculate the cost of the vector library call.		// Calculate the cost of the vector library call.
// If the corresponding vector call is cheaper, return its cost.		// If the corresponding vector call is cheaper, return its cost.
LibCost = TTI->getCallInstrCost(nullptr, VecTy, VecTys,		LibCost = TTI->getCallInstrCost(nullptr, VecTy, VecTys,
TTI::TCK_RecipThroughput);		TTI::TCK_RecipThroughput);
}		}
return {IntrinsicCost, LibCost};		return {IntrinsicCost, LibCost};
}		}

		/// Returns the indices for the first and the last instructions based on
		/// ordering.
		static std::pair<unsigned, unsigned>
		findMinMaxPos(ArrayRef<unsigned> ReorderedIndicies) {
		unsigned E = ReorderedIndicies.size();
		unsigned Min = E;
		unsigned Max = E;
		for (unsigned I = 0; I < E && (Min == E || Max == E); ++I) {
		if (Min == E && ReorderedIndicies[I] < E)
		Min = I;
		if (Max == E && ReorderedIndicies[E - 1 - I] < E)
		Max = E - 1 - I;
		}
		return std::make_pair(Min, Max);
		}
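
For readers following the review, here is a minimal standalone sketch of what `findMinMaxPos` appears to compute, assuming (as the loop suggests) that an entry equal to the array size marks an unused slot. It uses `std::vector` in place of `ArrayRef`, and the name `findMinMaxPosSketch` is hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Standalone sketch, not the patch's code: std::vector stands in for
// llvm::ArrayRef, and the function name is hypothetical.
// An entry >= Order.size() is treated as an unused slot.
std::pair<unsigned, unsigned>
findMinMaxPosSketch(const std::vector<unsigned> &Order) {
  const unsigned E = static_cast<unsigned>(Order.size());
  unsigned Min = E; // E means "not found yet".
  unsigned Max = E;
  // Scan from both ends at once and stop as soon as both positions are known.
  for (unsigned I = 0; I < E && (Min == E || Max == E); ++I) {
    if (Min == E && Order[I] < E)
      Min = I;
    if (Max == E && Order[E - 1 - I] < E)
      Max = E - 1 - I;
  }
  // Either both positions were found or neither was.
  assert((Min == E) == (Max == E));
  return {Min, Max};
}
```

Under these assumptions, `findMinMaxPosSketch({4, 1, 0, 4})` yields the pair `(1, 2)`, while an input with no valid entries, such as `{2, 2}`, leaves both positions at the array size, `(2, 2)`.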

		unsigned BoUpSLP::getEntryVF(const TreeEntry *E, SmallSet<unsigned, 4> &UserVFs,
		const TreeEntry *IE) {
		auto It = EntryVFs.find(E);
		if (It != EntryVFs.end())
		return It->second;
		auto &&GetVF = [](ArrayRef<Value *> Scalars,
		ArrayRef<unsigned> ReorderIndices,
		unsigned Opcode) -> unsigned {
		// For stores, the vectorization factor is the number of scalars, it is
		// aligned to the minimal/maximal size of the vector register.
		if (Opcode == Instruction::Store)
		return Scalars.size();
		unsigned NumValues =
		std::distance(Scalars.begin(), find_if(reverse(Scalars), [](Value *V) {
		return !isa<UndefValue>(V);
		}).base());
		if (!ReorderIndices.empty()) {
		unsigned MinPos, MaxPos;
		std::tie(MinPos, MaxPos) = findMinMaxPos(ReorderIndices);
		NumValues = std::max(NumValues, MaxPos + 1);
		}

		return PowerOf2Ceil(NumValues);
		};
		unsigned SelfVF = GetVF(E->Scalars, E->ReorderIndices, E->getOpcode());
		bool IsGather = E->State == TreeEntry::NeedToGather;
		EntryVFs.try_emplace(E, IsGather ? 0 : std::min<unsigned>(2, SelfVF));
		unsigned MinVF = E->Scalars.size();
		// Fill users vectorization factors to calculate shuffle cost correctly.
		for (const EdgeInfo &EI : E->UserTreeIndices) {
		if (!EI.UserTE || EI.UserTE == IE)
		continue;
		SmallSet<unsigned, 4> UserUserVFs;
		if (unsigned UserVF = getEntryVF(EI.UserTE, UserUserVFs, IE)) {
		UserVFs.insert(UserVF);
		MinVF = std::max(std::min(MinVF, UserVF), SelfVF);
		}
		}
		if (SelfVF <= 1 ||
		(!IsGather && E->getNumOperands() < 1 && !UserVFs.contains(SelfVF)))
		SelfVF = std::max<unsigned>(2, MinVF);
		if (IsGather && SelfVF < MinVF)
		SelfVF = MinVF;
		EntryVFs[E] = SelfVF;
		return SelfVF;
		}
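
The width computation in the `GetVF` lambda can be illustrated on its own. The following is a rough sketch, assuming `std::optional<int>` stands in for a scalar, with `std::nullopt` playing the role of `UndefValue`. The names `powerOf2CeilSketch` and `trimmedWidthSketch` are hypothetical, and the store and `ReorderIndices` branches of the lambda are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: smallest power of two >= N, with 0 mapped to 0.
unsigned powerOf2CeilSketch(unsigned N) {
  assert(N <= (1u << 31) && "result would not fit in 32 bits");
  if (N == 0)
    return 0;
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Sketch only: drop trailing undefined lanes, then round the remaining
// lane count up to a power of two.
unsigned trimmedWidthSketch(const std::vector<std::optional<int>> &Scalars) {
  // Walk backwards to the last defined lane.
  auto RIt = std::find_if(
      Scalars.rbegin(), Scalars.rend(),
      [](const std::optional<int> &V) { return V.has_value(); });
  // RIt.base() points one past that lane in forward order, so the distance
  // from begin() is the number of lanes up to and including it.
  unsigned NumValues =
      static_cast<unsigned>(std::distance(Scalars.begin(), RIt.base()));
  return powerOf2CeilSketch(NumValues);
}
```

In this sketch, undefined lanes in the middle still count toward the width; only the trailing ones are trimmed. For example, three defined lanes followed by one undefined lane give a width of 4, and five defined lanes give 8.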

InstructionCost BoUpSLP::getEntryCost(TreeEntry *E) {		InstructionCost BoUpSLP::getEntryCost(TreeEntry *E) {
ArrayRef<Value*> VL = E->Scalars;		ArrayRef<Value*> VL = E->Scalars;

		SmallSet<unsigned, 4> UserVFs;
		// Original vectorization factor.
		unsigned SelfVF = getEntryVF(E, UserVFs, E);
		RKSimon (Not Done): can we use llvm::size(InstructionsOnly)?
		ABataev (Author, Done): No, it does not work; `llvm::size` works only if the size can be calculated in `O(1)`. Here it cannot, since `InstructionsOnly` may have "holes".
		craig.topper (Done): Would using std::distance directly be clearer? You'd have to explicitly write begin()/end(), though.
		unsigned ShuffleVF = SelfVF;
		// Final vectorization factor after shuffling reuses.
		if (!E->ReuseShuffleIndices.empty()) {
		int Limit = VL.size();
		ShuffleVF = std::max<unsigned>(
		SelfVF, PowerOf2Ceil(std::distance(
		E->ReuseShuffleIndices.begin(),
		find_if(reverse(E->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
		const unsigned NumOfInstructions =
		std::distance(InstructionsOnly.begin(), InstructionsOnly.end());
		Value *V0;
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		FixedVectorType *VecTy;
		FixedVectorType *FinalVecTy;
		if (!llvm::empty(InstructionsOnly)) {
		V0 = *InstructionsOnly.begin();
		if (StoreInst *SI = dyn_cast<StoreInst>(V0))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))		else if (CmpInst *CI = dyn_cast<CmpInst>(V0))
ScalarTy = CI->getOperand(0)->getType();		ScalarTy = CI->getOperand(0)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

// If we have computed a smaller type for the expression, update VecTy so		// If we have computed a smaller type for the expression, update VecTy so
// that the costs will be accurate.		// that the costs will be accurate.
if (MinBWs.count(VL[0]))		auto MinBWI = MinBWs.find(V0);
		if (MinBWI != MinBWs.end()) {
VecTy = FixedVectorType::get(		VecTy = FixedVectorType::get(
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		IntegerType::get(F->getContext(), MinBWI->second.first), SelfVF);
		FinalVecTy = FixedVectorType::get(
		IntegerType::get(F->getContext(), MinBWI->second.first), ShuffleVF);
		} else {
		VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		FinalVecTy = FixedVectorType::get(ScalarTy, ShuffleVF);
		}
		} else {
		VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		FinalVecTy = FixedVectorType::get(ScalarTy, ShuffleVF);
		}
		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();		unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();
bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
InstructionCost ReuseShuffleCost = 0;		InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost =		ReuseShuffleCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
		}
		for (unsigned UserVF : UserVFs) {
		if (UserVF == ShuffleVF)
		continue;
		if (UserVF > ShuffleVF) {
		ReuseShuffleCost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
		FixedVectorType::get(ScalarTy, UserVF), None,
		/*Index=*/0, FinalVecTy);
		} else {
		ReuseShuffleCost += TTI->getShuffleCost(
		TargetTransformInfo::SK_ExtractSubvector, FinalVecTy, None,
		/*Index=*/0, FixedVectorType::get(ScalarTy, UserVF));
		}
}		}
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isSplat(VL)) {		if (isSplat(VL)) {
return ReuseShuffleCost +		return ReuseShuffleCost +
TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, None,		TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, None,
0);		0);
}		}
if (E->getOpcode() == Instruction::ExtractElement &&		if (E->getOpcode() == Instruction::ExtractElement && allSameType(VL) &&
allSameType(VL) && allSameBlock(VL)) {		allSameBlock(InstructionsOnly)) {
SmallVector<int> Mask;		SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =		Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isShuffle(VL, Mask);		NumOfInstructions > 1
if (ShuffleKind.hasValue()) {		? isShuffle(llvm::to_vector<4>(InstructionsOnly), Mask)
		: None;
		if (NumOfInstructions == 1 || ShuffleKind) {
InstructionCost Cost =		InstructionCost Cost =
TTI->getShuffleCost(ShuffleKind.getValue(), VecTy, Mask);		NumOfInstructions > 1
for (auto *V : VL) {		? TTI->getShuffleCost(*ShuffleKind, VecTy, Mask)
		: 0;
		for (Value *V : InstructionsOnly) {
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
if (areAllUsersVectorized(cast<Instruction>(V)) &&		if (areAllUsersVectorized(cast<Instruction>(V)) &&
!ScalarToTreeEntry.count(V)) {		!ScalarToTreeEntry.count(V)) {
auto *IO = cast<ConstantInt>(		auto *EE = cast<ExtractElementInst>(V);
cast<ExtractElementInst>(V)->getIndexOperand());		auto *IO = cast<ConstantInt>(EE->getIndexOperand());
Cost -= TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy,		Cost -= TTI->getVectorInstrCost(Instruction::ExtractElement,
		EE->getVectorOperandType(),
IO->getZExtValue());		IO->getZExtValue());
}		}
}		}
		RKSimon (Not Done): duplicate cast
return ReuseShuffleCost + Cost;		return ReuseShuffleCost + Cost;
}		}
}		}
return ReuseShuffleCost + getGatherCost(VL);		return ReuseShuffleCost + getGatherCost(VL, SelfVF);
}		}
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL");		assert(E->getOpcode() && allSameType(VL) && allSameBlock(InstructionsOnly) &&
		"Invalid VL");
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI:		case Instruction::PHI:
return 0;		return 0;

case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
// The common cost of removal ExtractElement/ExtractValue instructions +		// The common cost of removal ExtractElement/ExtractValue instructions +
// the cost of shuffles, if required to reshuffle the original vector.		// the cost of shuffles, if required to reshuffle the original vector.
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
unsigned Idx = 0;		unsigned Idx = 0;
for (unsigned I : E->ReuseShuffleIndices) {		for (unsigned I : E->ReuseShuffleIndices) {
		if (I >= VL.size() || isa<UndefValue>(VL[I]))
		continue;
if (ShuffleOrOp == Instruction::ExtractElement) {		if (ShuffleOrOp == Instruction::ExtractElement) {
auto *IO = cast<ConstantInt>(		auto *EE = cast<ExtractElementInst>(VL[I]);
cast<ExtractElementInst>(VL[I])->getIndexOperand());		auto *IO = cast<ConstantInt>(EE->getIndexOperand());
Idx = IO->getZExtValue();		Idx = IO->getZExtValue();
ReuseShuffleCost -= TTI->getVectorInstrCost(		ReuseShuffleCost -= TTI->getVectorInstrCost(
Instruction::ExtractElement, VecTy, Idx);		Instruction::ExtractElement, EE->getVectorOperandType(), Idx);
} else {		} else {
		RKSimon (Not Done): duplicate cast
ReuseShuffleCost -= TTI->getVectorInstrCost(		ReuseShuffleCost -= TTI->getVectorInstrCost(
Instruction::ExtractElement, VecTy, Idx);		Instruction::ExtractElement, VecTy, Idx);
++Idx;		++Idx;
}		}
}		}
Idx = ReuseShuffleNumbers;		Idx = ReuseShuffleNumbers;
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (ShuffleOrOp == Instruction::ExtractElement) {		if (ShuffleOrOp == Instruction::ExtractElement) {
auto *IO = cast<ConstantInt>(		auto *EE = cast<ExtractElementInst>(V);
cast<ExtractElementInst>(V)->getIndexOperand());		auto *IO = cast<ConstantInt>(EE->getIndexOperand());
Idx = IO->getZExtValue();		Idx = IO->getZExtValue();
		ReuseShuffleCost += TTI->getVectorInstrCost(
		Instruction::ExtractElement, EE->getVectorOperandType(), Idx);
} else {		} else {
		RKSimon (Not Done): duplicate cast
--Idx;		--Idx;
		ReuseShuffleCost += TTI->getVectorInstrCost(
		Instruction::ExtractElement, VecTy, Idx);
}		}
ReuseShuffleCost +=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, Idx);
}		}
CommonCost = ReuseShuffleCost;		CommonCost = ReuseShuffleCost;
} else if (!E->ReorderIndices.empty()) {		} else if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
CommonCost = TTI->getShuffleCost(		CommonCost = TTI->getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);		TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);
}		}
		#ifndef NDEBUG
		OrdersType CurrentOrder;
		bool Reuse = canReuseExtract(VL, VL0, CurrentOrder);
		assert(Reuse && E->ReorderIndices.empty() ||
		(!Reuse && CurrentOrder.size() == E->ReorderIndices.size() &&
		std::equal(CurrentOrder.begin(), CurrentOrder.end(),
		E->ReorderIndices.begin())) &&
		"The sequence of extract elements must be reused or shuffled "
		"with the same mask.");
		#endif
for (unsigned I = 0, E = VL.size(); I < E; ++I) {		for (unsigned I = 0, E = VL.size(); I < E; ++I) {
Instruction *EI = cast<Instruction>(VL[I]);		if (isa<UndefValue>(VL[I]))
// If all users are going to be vectorized, instruction can be		continue;
// considered as dead.		auto *EI = cast<Instruction>(VL[I]);
// The same, if have only one user, it will be vectorized for sure.
if (areAllUsersVectorized(EI)) {
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EI->hasOneUse()) {		if (EI->hasOneUse()) {
Instruction *Ext = EI->user_back();		Instruction *Ext = EI->user_back();
if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&		if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&
all_of(Ext->users(),		all_of(Ext->users(),
[](User *U) { return isa<GetElementPtrInst>(U); })) {		[](User *U) { return isa<GetElementPtrInst>(U); })) {
// Use getExtractWithExtendCost() to calculate the cost of		// Use getExtractWithExtendCost() to calculate the cost of
// extractelement/ext pair.		// extractelement/ext pair.
CommonCost -= TTI->getExtractWithExtendCost(		CommonCost -= TTI->getExtractWithExtendCost(
Ext->getOpcode(), Ext->getType(), VecTy, I);		Ext->getOpcode(), Ext->getType(), VecTy, I);
// Add back the cost of s|zext which is subtracted separately.		// Add back the cost of s|zext which is subtracted separately.
CommonCost += TTI->getCastInstrCost(		CommonCost += TTI->getCastInstrCost(
Ext->getOpcode(), Ext->getType(), EI->getType(),		Ext->getOpcode(), Ext->getType(), EI->getType(),
TTI::getCastContextHint(Ext), CostKind, Ext);		TTI::getCastContextHint(Ext), CostKind, Ext);
continue;		continue;
}		}
}		}
		if (ShuffleOrOp == Instruction::ExtractElement) {
		auto *EE = cast<ExtractElementInst>(EI);
		auto *IO = cast<ConstantInt>(EE->getIndexOperand());
		unsigned Idx = IO->getZExtValue();
		CommonCost -= TTI->getVectorInstrCost(
		Instruction::ExtractElement, EE->getVectorOperandType(), Idx);
		} else {
CommonCost -=		CommonCost -=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);
}		}
}		}
return CommonCost;		return CommonCost;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
		RKSimon (Not Done): duplicate cast
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy,		TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy,
TTI::getCastContextHint(VL0), CostKind, VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}

// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
InstructionCost ScalarCost = VL.size() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;

auto *SrcVecTy = FixedVectorType::get(SrcTy, VL.size());		auto *SrcVecTy = FixedVectorType::get(SrcTy, SelfVF);
InstructionCost VecCost = 0;		InstructionCost VecCost = 0;
// Check if the values are candidates to demote.		// Check if the values are candidates to demote.
if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {		if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {
VecCost =		VecCost =
ReuseShuffleCost +		ReuseShuffleCost +
TTI->getCastInstrCost(E->getOpcode(), VecTy, SrcVecTy,		TTI->getCastInstrCost(E->getOpcode(), VecTy, SrcVecTy,
TTI::getCastContextHint(VL0), CostKind, VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
}		}
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Select: {		case Instruction::Select: {
// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy, Builder.getInt1Ty(),		TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy, Builder.getInt1Ty(),
CmpInst::BAD_ICMP_PREDICATE, CostKind, VL0);		CmpInst::BAD_ICMP_PREDICATE, CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
auto *MaskTy = FixedVectorType::get(Builder.getInt1Ty(), VL.size());		auto *MaskTy = FixedVectorType::get(Builder.getInt1Ty(), SelfVF);
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;

// Check if all entries in VL are either compares or selects with compares		// Check if all entries in VL are either compares or selects with compares
// as condition that have the same predicates.		// as condition that have the same predicates.
CmpInst::Predicate VecPred = CmpInst::BAD_ICMP_PREDICATE;		CmpInst::Predicate VecPred = CmpInst::BAD_ICMP_PREDICATE;
bool First = true;		bool First = true;
for (auto *V : VL) {		for (auto *V : VL) {
CmpInst::Predicate CurrentPred;		CmpInst::Predicate CurrentPred;
auto MatchCmp = m_Cmp(CurrentPred, m_Value(), m_Value());		auto MatchCmp = m_Cmp(CurrentPred, m_Value(), m_Value());
(59 unchanged lines hidden)	case Instruction::Xor: {
TargetTransformInfo::OperandValueProperties Op2VP =		TargetTransformInfo::OperandValueProperties Op2VP =
TargetTransformInfo::OP_PowerOf2;		TargetTransformInfo::OP_PowerOf2;

// If all operands are exactly the same ConstantInt then set the		// If all operands are exactly the same ConstantInt then set the
// operand kind to OK_UniformConstantValue.		// operand kind to OK_UniformConstantValue.
// If instead not all operands are constants, then set the operand kind		// If instead not all operands are constants, then set the operand kind
// to OK_AnyValue. If all operands are constants but not the same,		// to OK_AnyValue. If all operands are constants but not the same,
// then set the operand kind to OK_NonUniformConstantValue.		// then set the operand kind to OK_NonUniformConstantValue.
ConstantInt *CInt0 = nullptr;		Constant *C0 = nullptr;
for (unsigned i = 0, e = VL.size(); i < e; ++i) {		for (unsigned i = 0, e = VL.size(); i < e; ++i) {
		if (isa<UndefValue>(VL[i]))
		continue;
const Instruction *I = cast<Instruction>(VL[i]);		const Instruction *I = cast<Instruction>(VL[i]);
unsigned OpIdx = isa<BinaryOperator>(I) ? 1 : 0;		unsigned OpIdx = isa<BinaryOperator>(I) ? 1 : 0;
ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(OpIdx));		ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(OpIdx));
if (!CInt) {		Constant *UV = dyn_cast<UndefValue>(I->getOperand(OpIdx));
		if (!CInt && !UV) {
Op2VK = TargetTransformInfo::OK_AnyValue;		Op2VK = TargetTransformInfo::OK_AnyValue;
Op2VP = TargetTransformInfo::OP_None;		Op2VP = TargetTransformInfo::OP_None;
break;		break;
}		}
if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&		if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&
!CInt->getValue().isPowerOf2())		(UV || !cast<ConstantInt>(CInt)->getValue().isPowerOf2()))
Op2VP = TargetTransformInfo::OP_None;		Op2VP = TargetTransformInfo::OP_None;
if (i == 0) {		if (i == 0) {
CInt0 = CInt;		C0 = CInt ? CInt : UV;
continue;		continue;
}		}
if (CInt0 != CInt)		if (C0 != (CInt ? CInt : UV))
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
}		}

SmallVector<const Value *, 4> Operands(VL0->operand_values());		SmallVector<const Value *, 4> Operands(VL0->operand_values());
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecCost =		InstructionCost VecCost =
TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;

InstructionCost ScalarEltCost = TTI->getArithmeticInstrCost(		InstructionCost ScalarEltCost = TTI->getArithmeticInstrCost(
Instruction::Add, ScalarTy, CostKind, Op1VK, Op2VK);		Instruction::Add, ScalarTy, CostKind, Op1VK, Op2VK);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecCost = TTI->getArithmeticInstrCost(		InstructionCost VecCost = TTI->getArithmeticInstrCost(
Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);		Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Cost of wide load - cost of scalar loads.		// Cost of wide load - cost of scalar loads.
Align alignment = cast<LoadInst>(VL0)->getAlign();		Align alignment = cast<LoadInst>(VL0)->getAlign();
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
Instruction::Load, ScalarTy, alignment, 0, CostKind, VL0);		Instruction::Load, ScalarTy, alignment, 0, CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarLdCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarLdCost = NumOfInstructions * ScalarEltCost;

InstructionCost VecLdCost;		InstructionCost VecLdCost;
		bool ShuffledLoadInstructions = false;
if (E->State == TreeEntry::Vectorize) {		if (E->State == TreeEntry::Vectorize) {
VecLdCost = TTI->getMemoryOpCost(Instruction::Load, VecTy, alignment, 0,		unsigned MinIdx;
CostKind, VL0);		unsigned MaxIdx;
		spatel (Not Done): Are we always creating a masked load for a vector with 2 elements? This logic needs a code comment to explain the cases.
		ABataev (Author, Done): No, no need to do it for 2 elements, removed it.
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(VL.begin(), find_if(VL, Instruction::classof));
		MaxIdx =
		std::distance(VL.begin(),
		find_if(reverse(VL), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		Align CommonAlign;
		if (E->ReorderIndices.empty())
		CommonAlign = alignment;
		else
		CommonAlign =
		cast<LoadInst>(VL[E->ReorderIndices[MinIdx]])->getAlign();
		unsigned InstrDist = MaxIdx - MinIdx + 1;
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
		// Check if we can use load instead of masked load, i.e. we can directly
		// load aligned data.
		unsigned AlignedInstrDist = std::min(
		PowerOf2Ceil(InstrDist), alignTo(InstrDist * Sz, CommonAlign) / Sz);
		if (isPowerOf2_32(AlignedInstrDist)) {
		CommonAlign =
		commonAlignment(CommonAlign, CommonAlign.value() -
		(AlignedInstrDist - InstrDist));
		auto *LoadVecTy = VecTy;
		if (AlignedInstrDist != SelfVF)
		LoadVecTy = FixedVectorType::get(ScalarTy, AlignedInstrDist);
		VecLdCost = TTI->getMemoryOpCost(Instruction::Load, LoadVecTy,
		CommonAlign, 0, CostKind, VL0);
		if (!NeedToShuffleReuses && AlignedInstrDist != SelfVF) {
		VecLdCost += TTI->getShuffleCost(
		TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
		ShuffledLoadInstructions = true;
		}
		} else {
		VecLdCost = TTI->getMaskedMemoryOpCost(Instruction::Load, VecTy,
		alignment, 0, CostKind);
		}
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");		assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");
		Align CommonAlignment = alignment;
		for (Value *V : InstructionsOnly)
		CommonAlignment =
		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
		unsigned NormalizedSz = llvm::PowerOf2Ceil(NumOfInstructions);
VecLdCost = TTI->getGatherScatterOpCost(		VecLdCost = TTI->getGatherScatterOpCost(
Instruction::Load, VecTy, cast<LoadInst>(VL0)->getPointerOperand(),		Instruction::Load, FixedVectorType::get(ScalarTy, NormalizedSz),
/*VariableMask=*/false, alignment, CostKind, VL0);		cast<LoadInst>(VL0)->getPointerOperand(),
		/*VariableMask=*/false, CommonAlignment, CostKind, VL0);
		// Cost of resizing the loaded elements to the size of the vector.
		if (!NeedToShuffleReuses && NormalizedSz != SelfVF) {
		VecLdCost = TTI->getShuffleCost(
		TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
		ShuffledLoadInstructions = true;
		}
}		}
if (!NeedToShuffleReuses && !E->ReorderIndices.empty()) {		if (!NeedToShuffleReuses && !E->ReorderIndices.empty() &&
		!ShuffledLoadInstructions) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
VecLdCost += TTI->getShuffleCost(		VecLdCost += TTI->getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);		TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);
}		}
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecLdCost, ScalarLdCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecLdCost, ScalarLdCost));
return ReuseShuffleCost + VecLdCost - ScalarLdCost;		return ReuseShuffleCost + VecLdCost - ScalarLdCost;
}		}
case Instruction::Store: {		case Instruction::Store: {
// We know that we can merge the stores. Calculate the cost.		// We know that we can merge the stores. Calculate the cost.
bool IsReorder = !E->ReorderIndices.empty();		bool IsReorder = !E->ReorderIndices.empty();
auto *SI =		auto *SI = cast<StoreInst>(VL0);
cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);
Align Alignment = SI->getAlign();		Align Alignment = SI->getAlign();
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
Instruction::Store, ScalarTy, Alignment, 0, CostKind, VL0);		Instruction::Store, ScalarTy, Alignment, 0, CostKind, VL0);
InstructionCost ScalarStCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarStCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecStCost = TTI->getMemoryOpCost(		InstructionCost VecStCost;
Instruction::Store, VecTy, Alignment, 0, CostKind, VL0);		unsigned MinIdx;
		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(VL.begin(), find_if(VL, Instruction::classof));
		MaxIdx =
		std::distance(VL.begin(),
		find_if(reverse(VL), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		if (NumOfInstructions != SelfVF) {
		VecStCost = TTI->getMaskedMemoryOpCost(Instruction::Store, VecTy,
		Alignment, 0, CostKind);
		if (IsReorder) {
		SmallVector<int> NewMask;
		inversePermutation(E->ReorderIndices, NewMask);
		VecStCost += TTI->getShuffleCost(
		TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);
		}
		} else {
		VecStCost = TTI->getMemoryOpCost(Instruction::Store, VecTy, Alignment,
		0, CostKind, VL0);
if (IsReorder) {		if (IsReorder) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
VecStCost += TTI->getShuffleCost(		VecStCost += TTI->getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);		TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);
}		}
		}
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecStCost, ScalarStCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecStCost, ScalarStCost));
return VecStCost - ScalarStCost;		return VecStCost - ScalarStCost;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// Calculate the cost of the scalar and vector calls.		// Calculate the cost of the scalar and vector calls.
IntrinsicCostAttributes CostAttrs(ID, *CI, 1);		IntrinsicCostAttributes CostAttrs(ID, *CI, 1);
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getIntrinsicInstrCost(CostAttrs, CostKind);		TTI->getIntrinsicInstrCost(CostAttrs, CostKind);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -=
		(ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCallCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCallCost = NumOfInstructions * ScalarEltCost;

auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);		auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);
InstructionCost VecCallCost =		InstructionCost VecCallCost =
std::min(VecCallCosts.first, VecCallCosts.second);		std::min(VecCallCosts.first, VecCallCosts.second);

LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost		LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost
<< " (" << VecCallCost << "-" << ScalarCallCost << ")"		<< " (" << VecCallCost << "-" << ScalarCallCost << ")"
<< " for " << *CI << "\n");		<< " for " << *CI << "\n");

return ReuseShuffleCost + VecCallCost - ScalarCallCost;		return ReuseShuffleCost + VecCallCost - ScalarCallCost;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
assert(E->isAltShuffle() &&		assert(E->isAltShuffle() &&
((Instruction::isBinaryOp(E->getOpcode()) &&		((Instruction::isBinaryOp(E->getOpcode()) &&
Instruction::isBinaryOp(E->getAltOpcode())) ||		Instruction::isBinaryOp(E->getAltOpcode())) ||
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode()))) &&		Instruction::isCast(E->getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");
InstructionCost ScalarCost = 0;		InstructionCost ScalarCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
for (unsigned Idx : E->ReuseShuffleIndices) {		for (unsigned Idx : E->ReuseShuffleIndices) {
Instruction *I = cast<Instruction>(VL[Idx]);		if (Idx >= VL.size() || isa<UndefValue>(VL[Idx]))
		continue;
		auto *I = cast<Instruction>(VL[Idx]);
ReuseShuffleCost -= TTI->getInstructionCost(I, CostKind);		ReuseShuffleCost -= TTI->getInstructionCost(I, CostKind);
}		}
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
ReuseShuffleCost += TTI->getInstructionCost(I, CostKind);		ReuseShuffleCost += TTI->getInstructionCost(I, CostKind);
}		}
}		}
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
		RKSimon [Done]: InstructionsOnly ?
Instruction *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
assert(E->isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");
ScalarCost += TTI->getInstructionCost(I, CostKind);		ScalarCost += TTI->getInstructionCost(I, CostKind);
}		}
// VecCost is equal to sum of the cost of creating 2 vectors		// VecCost is equal to sum of the cost of creating 2 vectors
// and the cost of creating shuffle.		// and the cost of creating shuffle.
InstructionCost VecCost = 0;		InstructionCost VecCost = 0;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
VecCost = TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind);		VecCost = TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind);
VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,		VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,
CostKind);		CostKind);
} else {		} else {
Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();		Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();
Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();		Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();
auto *Src0Ty = FixedVectorType::get(Src0SclTy, VL.size());		auto *Src0Ty = FixedVectorType::get(Src0SclTy, SelfVF);
auto *Src1Ty = FixedVectorType::get(Src1SclTy, VL.size());		auto *Src1Ty = FixedVectorType::get(Src1SclTy, SelfVF);
VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,		VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,
TTI::CastContextHint::None, CostKind);		TTI::CastContextHint::None, CostKind);
VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,		VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,
TTI::CastContextHint::None, CostKind);		TTI::CastContextHint::None, CostKind);
}		}

SmallVector<int> Mask(E->Scalars.size());		SmallVector<int> Mask(E->Scalars.size());
for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {		for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {
		if (isa<UndefValue>(E->Scalars[I])) {
		Mask[I] = UndefMaskElem;
		continue;
		}
auto *OpInst = cast<Instruction>(E->Scalars[I]);		auto *OpInst = cast<Instruction>(E->Scalars[I]);
assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");
Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);		Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);
}		}
VecCost +=		VecCost +=
TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask, 0);		TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask, 0);
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
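
The select mask built in the loop above can be illustrated outside of LLVM. This is a minimal sketch, not the patch's code: `LaneKind`, `kUndefLane` and `buildAltSelectMask` are hypothetical names, and `-1` is assumed as the stand-in for `UndefMaskElem`. Lane `I` selects element `I` of the main-opcode vector, or element `I + N` when the scalar uses the alternate opcode; undef lanes stay undefined.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for LLVM's UndefMaskElem.
constexpr int kUndefLane = -1;

// Per-lane tag: which of the two opcodes the scalar uses, or undef.
enum class LaneKind { Main, Alt, Undef };

// Builds a select-style mask over two concatenated N-wide vectors.
// Index I refers to the main-opcode vector, index I + N to the
// alternate-opcode vector.
std::vector<int> buildAltSelectMask(const std::vector<LaneKind> &Lanes) {
  const int End = static_cast<int>(Lanes.size());
  std::vector<int> Mask(Lanes.size(), kUndefLane);
  for (int I = 0; I < End; ++I) {
    if (Lanes[I] == LaneKind::Undef)
      continue; // Leave the lane undefined.
    Mask[I] = I + (Lanes[I] == LaneKind::Alt ? End : 0);
  }
  return Mask;
}
```

For four lanes tagged main, alt, undef, alt, the sketch yields the mask `{0, 5, -1, 7}`.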
[10 lines not shown]	bool BoUpSLP::isFullyVectorizableTinyTree() const {
// We only handle trees of heights 1 and 2.		// We only handle trees of heights 1 and 2.
if (VectorizableTree.size() == 1 &&		if (VectorizableTree.size() == 1 &&
VectorizableTree[0]->State == TreeEntry::Vectorize)		VectorizableTree[0]->State == TreeEntry::Vectorize)
return true;		return true;

if (VectorizableTree.size() != 2)		if (VectorizableTree.size() != 2)
return false;		return false;

// Handle splat and all-constants stores.		// Handle splat, all-constants stores and extractelement stores.
if (VectorizableTree[0]->State == TreeEntry::Vectorize &&		if (VectorizableTree[0]->State == TreeEntry::Vectorize &&
(allConstant(VectorizableTree[1]->Scalars) ||		(allConstant(VectorizableTree[1]->Scalars) ||
isSplat(VectorizableTree[1]->Scalars)))		isSplat(VectorizableTree[1]->Scalars) ||
		all_of(VectorizableTree[1]->Scalars, [](Value *V) {
		return isa<UndefValue>(V) || isa<ExtractElementInst>(V);
		})))
return true;		return true;

// Gathering cost would be too much for tiny trees.		// Gathering cost would be too much for tiny trees.
if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||		if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||
VectorizableTree[1]->State == TreeEntry::NeedToGather)		VectorizableTree[1]->State == TreeEntry::NeedToGather)
return false;		return false;

return true;		return true;
[157 lines not shown]	InstructionCost BoUpSLP::getSpillCost() const {
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getTreeCost() {		InstructionCost BoUpSLP::getTreeCost() {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "		LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
<< VectorizableTree.size() << ".\n");		<< VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0]->Scalars.size();

for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {		for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
TreeEntry &TE = *VectorizableTree[I].get();		TreeEntry &TE = *VectorizableTree[I].get();

// We create duplicate tree entries for gather sequences that have multiple		// We create duplicate tree entries for gather sequences that have multiple
// uses. However, we should not compute the cost of duplicate sequences.		// uses. However, we should not compute the cost of duplicate sequences.
// For example, if we have a build vector (i.e., insertelement sequence)		// For example, if we have a build vector (i.e., insertelement sequence)
// that is used by more than one vector instruction, we only need to		// that is used by more than one vector instruction, we only need to
// compute the cost of the insertelement instructions once. The redundant		// compute the cost of the insertelement instructions once. The redundant
// instructions will be eliminated by CSE.		// instructions will be eliminated by CSE.
//		//
// We should consider not creating duplicate tree entries for gather		// We should consider not creating duplicate tree entries for gather
// sequences, and instead add additional edges to the tree representing		// sequences, and instead add additional edges to the tree representing
// their uses. Since such an approach results in fewer total entries,		// their uses. Since such an approach results in fewer total entries,
// existing heuristics based on tree size may yield different results.		// existing heuristics based on tree size may yield different results.
//		//
		// Also, exclude the cost of gather nodes created for gathered loads.
		// These loads are already gathered, so there is no need to count them
		// again if we were unable to vectorize them.
if (TE.State == TreeEntry::NeedToGather &&		if (TE.State == TreeEntry::NeedToGather &&
std::any_of(std::next(VectorizableTree.begin(), I + 1),		std::any_of(std::next(VectorizableTree.begin(), I + 1),
VectorizableTree.end(),		GatheredLoadsEntriesFirst >= 0
		? std::next(VectorizableTree.begin(),
		GatheredLoadsEntriesFirst)
		: VectorizableTree.end(),
[TE](const std::unique_ptr<TreeEntry> &EntryPtr) {		[TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
return EntryPtr->State == TreeEntry::NeedToGather &&		return EntryPtr->State == TreeEntry::NeedToGather &&
EntryPtr->isSame(TE.Scalars);		EntryPtr->isSame(TE.Scalars);
}))		}))
continue;		continue;
		// Exclude cost of gather loads nodes which are not used.
		if (GatheredLoadsEntriesFirst >= 0 &&
		I >= static_cast<unsigned>(GatheredLoadsEntriesFirst) &&
		TE.State == TreeEntry::NeedToGather) {
		assert(all_of(TE.Scalars,
		[this](Value *V) {
		return (isa<LoadInst>(V) && MustGather.contains(V)) ||
		isa<Constant>(V) ||
		V->getType()->isPtrOrPtrVectorTy();
		}) &&
		"Expected loads, pointers or constants only.");
		continue;
		}

InstructionCost C = getEntryCost(&TE);		InstructionCost C = getEntryCost(&TE);
Cost += C;		Cost += C;
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for bundle that starts with " << *TE.Scalars[0]		<< " for bundle that starts with " << *TE.Scalars[0]
<< ".\n"		<< ".\n"
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
}		}

SmallPtrSet<Value *, 16> ExtractCostCalculated;		SmallPtrSet<Value *, 16> ExtractCostCalculated;
InstructionCost ExtractCost = 0;		InstructionCost ExtractCost = 0;
for (ExternalUser &EU : ExternalUses) {		for (ExternalUser &EU : ExternalUses) {
// We only add extract cost once for the same scalar.		// We only add extract cost once for the same scalar.
if (!ExtractCostCalculated.insert(EU.Scalar).second)		if (!ExtractCostCalculated.insert(EU.Scalar).second)
continue;		continue;

// Uses by ephemeral values are free (because the ephemeral value will be		// Uses by ephemeral values are free (because the ephemeral value will be
// removed prior to code generation, and so the extraction will be		// removed prior to code generation, and so the extraction will be
// removed as well).		// removed as well).
if (EphValues.count(EU.User))		if (EphValues.count(EU.User))
continue;		continue;

		// BundleWidth varies in the tree, so we need to get the VF for each tree node.
		const TreeEntry *TE = getTreeEntry(EU.Scalar);
		SmallSet<unsigned int, 4> UserVFs;
		unsigned BundleWidth = getEntryVF(TE, UserVFs, TE);
		if (!TE->ReuseShuffleIndices.empty()) {
		int Limit = TE->ReuseShuffleIndices.size();
		BundleWidth = std::max<unsigned>(
		BundleWidth,
		PowerOf2Ceil(std::distance(
		TE->ReuseShuffleIndices.begin(),
		find_if(reverse(TE->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}

// If we plan to rewrite the tree in a smaller type, we will need to sign		// If we plan to rewrite the tree in a smaller type, we will need to sign
// extend the extracted value back to the original type. Here, we account		// extend the extracted value back to the original type. Here, we account
// for the extract and the added cost of the sign extend if needed.		// for the extract and the added cost of the sign extend if needed.
auto *VecTy = FixedVectorType::get(EU.Scalar->getType(), BundleWidth);		auto *VecTy = FixedVectorType::get(EU.Scalar->getType(), BundleWidth);
auto *ScalarRoot = VectorizableTree[0]->Scalars[0];		auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);		auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
auto Extend =		auto Extend =
[22 lines not shown]	#ifndef NDEBUG
if (ViewSLPTree)		if (ViewSLPTree)
ViewGraph(this, "SLP" + F->getName(), false, Str);		ViewGraph(this, "SLP" + F->getName(), false, Str);
#endif		#endif

return Cost;		return Cost;
}		}
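
The duplicate-gather rule described in the comment inside `getTreeCost` can be sketched in isolation. This is an assumption-laden illustration: `Entry` is a hypothetical, simplified stand-in for `TreeEntry`, integers stand in for `Value *`, and plain vector equality is assumed in place of `isSame`. A gather entry is skipped when an identical gather entry appears later in the tree, so each repeated build vector is costed once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified tree entry; ints stand in for Value*.
struct Entry {
  bool NeedToGather;
  std::vector<int> Scalars;
  int Cost;
};

// Sums entry costs, counting each repeated gather sequence only once:
// a gather entry is skipped if an identical gather entry follows it.
int sumCostsSkippingDuplicateGathers(const std::vector<Entry> &Tree) {
  int Total = 0;
  for (std::size_t I = 0; I < Tree.size(); ++I) {
    const Entry &TE = Tree[I];
    bool DuplicatedLater = false;
    if (TE.NeedToGather) {
      for (std::size_t J = I + 1; J < Tree.size(); ++J) {
        if (Tree[J].NeedToGather && Tree[J].Scalars == TE.Scalars) {
          DuplicatedLater = true;
          break;
        }
      }
    }
    if (!DuplicatedLater)
      Total += TE.Cost;
  }
  return Total;
}
```

With one vectorized entry of cost -3 followed by two identical gather entries of cost 2 each, only the last gather is counted and the total is -1.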

InstructionCost		InstructionCost
BoUpSLP::getGatherCost(FixedVectorType *Ty,		BoUpSLP::getGatherCost(FixedVectorType *Ty,
		RKSimon [Done]: IgnoredIndices might be cheaper as a SparseBitVector ?
const DenseSet<unsigned> &ShuffledIndices) const {		const DenseSet<unsigned> &ShuffledIndices,
		bool NeedShuffleCost) const {
unsigned NumElts = Ty->getNumElements();		unsigned NumElts = Ty->getNumElements();
APInt DemandedElts = APInt::getNullValue(NumElts);		APInt DemandedElts = APInt::getNullValue(NumElts);
for (unsigned I = 0; I < NumElts; ++I)		for (unsigned I = 0; I < NumElts; ++I)
		RKSimon [Not Done]: trivial style refactor - pull out of patch?
if (!ShuffledIndices.count(I))		if (!ShuffledIndices.count(I))
DemandedElts.setBit(I);		DemandedElts.setBit(I);
InstructionCost Cost =		InstructionCost Cost =
TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,		TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,
/*Extract*/ false);		/*Extract*/ false);
if (!ShuffledIndices.empty())		if (NeedShuffleCost)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty);		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty);
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL) const {		InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
		unsigned VF) const {
// Find the type of the operands in VL.		// Find the type of the operands in VL.
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
// Find the cost of inserting/extracting values from the vector.		// Find the cost of inserting/extracting values from the vector.
// Check if the same elements are inserted several times and count them as		// Check if the same elements are inserted several times and count them as
// shuffle candidates.		// shuffle candidates.
DenseSet<unsigned> ShuffledElements;		DenseSet<unsigned> ShuffledElements;
DenseSet<Value *> UniqueElements;		DenseSet<Value *> UniqueElements;
// Iterate in reverse order to consider insert elements with the high cost.		// Iterate in reverse order to consider insert elements with the high cost.
for (unsigned I = VL.size(); I > 0; --I) {		bool NeedShuffleCost = false;
		for (int I = VF; I > 0; --I) {
unsigned Idx = I - 1;		unsigned Idx = I - 1;
if (!UniqueElements.insert(VL[Idx]).second)		if (isa<Constant>(VL[Idx])) {
		// Ignore constant data elements.
		ShuffledElements.insert(Idx);
		continue;
		}
		if (!UniqueElements.insert(VL[Idx]).second) {
ShuffledElements.insert(Idx);		ShuffledElements.insert(Idx);
		NeedShuffleCost = true;
		}
}		}
return getGatherCost(VecTy, ShuffledElements);		return getGatherCost(VecTy, ShuffledElements, NeedShuffleCost);
}		}
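
The lane classification in the new `getGatherCost(VL, VF)` can be sketched as follows. This is an illustration only: `Lane`, `GatherPlan` and `planGather` are hypothetical names, and an integer id plus a flag stand in for `Value *` and `isa<Constant>`. Walking the lanes in reverse, constants are excluded from the demanded elements without implying a shuffle, while repeated non-constant values are excluded and do imply one.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a scalar: an id plus an "is constant" flag.
struct Lane {
  int Id;
  bool IsConstant;
};

struct GatherPlan {
  std::vector<bool> Demanded; // Lanes that still need an insertelement.
  bool NeedShuffle = false;   // True if a repeated value was found.
};

GatherPlan planGather(const std::vector<Lane> &VL) {
  GatherPlan Plan;
  Plan.Demanded.assign(VL.size(), true);
  std::set<int> Seen;
  // Iterate in reverse so the highest lane of a repeated value is the one
  // that keeps its insertelement.
  for (std::size_t I = VL.size(); I > 0; --I) {
    const std::size_t Idx = I - 1;
    if (VL[Idx].IsConstant) {
      Plan.Demanded[Idx] = false; // Constants are ignored.
      continue;
    }
    if (!Seen.insert(VL[Idx].Id).second) {
      Plan.Demanded[Idx] = false; // Duplicate: covered by a shuffle.
      Plan.NeedShuffle = true;
    }
  }
  return Plan;
}
```

For the lanes `a, a, <constant>, b`, lanes 1 and 3 remain demanded and a shuffle is required for the repeated `a`.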

// Perform operand reordering on the instructions in VL and return the reordered		// Perform operand reordering on the instructions in VL and return the reordered
// operands in Left and Right.		// operands in Left and Right.
void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void BoUpSLP::reorderInputsAccordingToOpcode(
SmallVectorImpl<Value *> &Left,		Instruction &VL0, ArrayRef<Value *> VL, SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right,		SmallVectorImpl<Value *> &Right, const DataLayout &DL, ScalarEvolution &SE,
const DataLayout &DL,
ScalarEvolution &SE,
const BoUpSLP &R) {		const BoUpSLP &R) {
if (VL.empty())		if (VL.empty())
return;		return;
VLOperands Ops(VL, DL, SE, R);		VLOperands Ops(VL0, VL, DL, SE, R);
// Reorder the operands in place.		// Reorder the operands in place.
Ops.reorder();		Ops.reorder();
Left = Ops.getVL(0);		Left = Ops.getVL(0);
Right = Ops.getVL(1);		Right = Ops.getVL(1);
}		}

void BoUpSLP::setInsertPointAfterBundle(TreeEntry *E) {		void BoUpSLP::setInsertPointAfterBundle(TreeEntry *E) {
		auto InstructionsOnly = make_filter_range(E->Scalars, Instruction::classof);
		if (llvm::empty(InstructionsOnly))
		return;
// Get the basic block this bundle is in. All instructions in the bundle		// Get the basic block this bundle is in. All instructions in the bundle
// should be in this block.		// should be in this block.
auto *Front = E->getMainOp();		auto *Front = E->getMainOp();
auto *BB = Front->getParent();		auto *BB = Front->getParent();
assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {		assert(llvm::all_of(InstructionsOnly, [=](Value *V) -> bool {
auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
return !E->isOpcodeOrAlt(I) || I->getParent() == BB;		return !E->isOpcodeOrAlt(I) || I->getParent() == BB;
}));		}));

// The last instruction in the bundle in program order.		// The last instruction in the bundle in program order.
Instruction *LastInst = nullptr;		Instruction *LastInst = nullptr;

// Find the last instruction. The common case should be that BB has been		// Find the last instruction. The common case should be that BB has been
// scheduled, and the last instruction is VL.back(). So we start with		// scheduled, and the last instruction is VL.back(). So we start with
// VL.back() and iterate over schedule data until we reach the end of the		// VL.back() and iterate over schedule data until we reach the end of the
// bundle. The end of the bundle is marked by null ScheduleData.		// bundle. The end of the bundle is marked by null ScheduleData.
if (BlocksSchedules.count(BB)) {		if (BlocksSchedules.count(BB)) {
auto *Bundle =		auto *Bundle = BlocksSchedules[BB]->getScheduleData(
BlocksSchedules[BB]->getScheduleData(E->isOneOf(E->Scalars.back()));		E->isOneOf(*llvm::reverse(InstructionsOnly).begin()));
if (Bundle && Bundle->isPartOfBundle())		if (Bundle && Bundle->isPartOfBundle())
for (; Bundle; Bundle = Bundle->NextInBundle)		for (; Bundle; Bundle = Bundle->NextInBundle)
if (Bundle->OpValue == Bundle->Inst)		if (Bundle->OpValue == Bundle->Inst)
LastInst = Bundle->Inst;		LastInst = Bundle->Inst;
}		}

// LastInst can still be null at this point if there's either not an entry		// LastInst can still be null at this point if there's either not an entry
// for BB in BlocksSchedules or there's no ScheduleData available for		// for BB in BlocksSchedules or there's no ScheduleData available for
[9 lines not shown]	void BoUpSLP::setInsertPointAfterBundle(TreeEntry *E) {
// will visit all the remaining instructions in the block.		// will visit all the remaining instructions in the block.
//		//
// One of the reasons we exit early from buildTree_rec is to place an upper		// One of the reasons we exit early from buildTree_rec is to place an upper
// bound on compile-time. Thus, taking an additional compile-time hit here is		// bound on compile-time. Thus, taking an additional compile-time hit here is
// not ideal. However, this should be exceedingly rare since it requires that		// not ideal. However, this should be exceedingly rare since it requires that
// we both exit early from buildTree_rec and that the bundle be out-of-order		// we both exit early from buildTree_rec and that the bundle be out-of-order
// (causing us to iterate all the way to the end of the block).		// (causing us to iterate all the way to the end of the block).
if (!LastInst) {		if (!LastInst) {
SmallPtrSet<Value *, 16> Bundle(E->Scalars.begin(), E->Scalars.end());		SmallPtrSet<Value *, 16> Bundle(InstructionsOnly.begin(),
		InstructionsOnly.end());
for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {		for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {
if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))		if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))
LastInst = &I;		LastInst = &I;
if (Bundle.empty())		if (Bundle.empty())
break;		break;
}		}
}		}
assert(LastInst && "Failed to find last instruction in bundle");		assert(LastInst && "Failed to find last instruction in bundle");

// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
Builder.SetInsertPoint(BB, ++LastInst->getIterator());		Builder.SetInsertPoint(BB, ++LastInst->getIterator());
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}
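
The fallback scan in `setInsertPointAfterBundle` (used when no schedule data identifies the last instruction) can be sketched without LLVM types. This is a simplified illustration: integers stand in for instructions, `findLastBundleMember` is a hypothetical name, and the `isOpcodeOrAlt` check from the real code is omitted. The scan starts at the front instruction, removes each bundle member it meets from a working set, and stops once the set is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Returns the index in Block of the last bundle member found when scanning
// forward from Front, or -1 if no member is found. Bundle is taken by value
// because the scan consumes it.
int findLastBundleMember(const std::vector<int> &Block, std::size_t Front,
                         std::set<int> Bundle) {
  int Last = -1;
  for (std::size_t I = Front; I < Block.size(); ++I) {
    if (Bundle.erase(Block[I]))
      Last = static_cast<int>(I);
    if (Bundle.empty())
      break; // Every bundle member has been seen.
  }
  return Last;
}
```

With the block `{10, 11, 12, 13, 14}`, a scan starting at index 1 and the bundle `{11, 13}`, the last member is found at index 3 and the scan never visits index 4.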

Value *BoUpSLP::gather(ArrayRef<Value *> VL) {		Value *BoUpSLP::gather(ArrayRef<Value *> VL) {
		spatel [Not Done]: I did some clean-ups while trying to understand the behavior of this code, so this patch will need a (hopefully reduced diff) update: rG7451bf0b0b6d, rG062276c69109. This one may also require rebase: rGa44238cb443f
Value *Val0 =		// List of instructions/lanes from current block and/or the blocks which are
isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];		// part of the current loop. These instructions will be inserted at the end to
FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());		// make it possible to optimize loops and hoist invariant instructions out of
Value *Vec = PoisonValue::get(VecTy);		// the loops body with better chances for success.
unsigned InsIndex = 0;		SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;
for (Value *Val : VL) {		SmallSet<int, 4> PostponedIndices;
Vec = Builder.CreateInsertElement(Vec, Val, Builder.getInt32(InsIndex++));		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (auto *Inst = dyn_cast<Instruction>(VL[I]))
		if (Inst->getParent() == Builder.GetInsertBlock() &&
		PostponedIndices.insert(I).second)
		PostponedInsts.emplace_back(Inst, I);
		}
		if (Loop *L = LI->getLoopFor(Builder.GetInsertBlock())) {
		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (auto *Inst = dyn_cast<Instruction>(VL[I]))
		if (L->contains(Inst) && PostponedIndices.insert(I).second)
		PostponedInsts.emplace_back(Inst, I);
		}
		}

		auto &&CreateInsertElement = [this](Value *Vec, Value *V, unsigned Pos) {
		// No need to insert undefs elements - exit.
		if (isa<UndefValue>(V))
		return Vec;
		Vec = Builder.CreateInsertElement(Vec, V, Builder.getInt32(Pos));
auto *InsElt = dyn_cast<InsertElementInst>(Vec);		auto *InsElt = dyn_cast<InsertElementInst>(Vec);
if (!InsElt)		if (!InsElt)
continue;		return Vec;
GatherSeq.insert(InsElt);		GatherSeq.insert(InsElt);
CSEBlocks.insert(InsElt->getParent());		CSEBlocks.insert(InsElt->getParent());
// Add to our 'need-to-extract' list.		// Add to our 'need-to-extract' list.
if (TreeEntry *Entry = getTreeEntry(Val)) {		if (TreeEntry *Entry = getTreeEntry(V)) {
// Find which lane we need to extract.		// Find which lane we need to extract.
unsigned FoundLane = std::distance(Entry->Scalars.begin(),		unsigned FoundLane =
find(Entry->Scalars, Val));		std::distance(Entry->Scalars.begin(), find(Entry->Scalars, V));
assert(FoundLane < Entry->Scalars.size() && "Couldn't find extract lane");		assert(FoundLane < Entry->Scalars.size() && "Couldn't find extract lane");
if (!Entry->ReuseShuffleIndices.empty()) {		if (!Entry->ReuseShuffleIndices.empty()) {
FoundLane = std::distance(Entry->ReuseShuffleIndices.begin(),		FoundLane = std::distance(Entry->ReuseShuffleIndices.begin(),
find(Entry->ReuseShuffleIndices, FoundLane));		find(Entry->ReuseShuffleIndices, FoundLane));
}		}
ExternalUses.push_back(ExternalUser(Val, InsElt, FoundLane));		ExternalUses.emplace_back(V, InsElt, FoundLane);
}
}		}

return Vec;		return Vec;
}		};

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {
InstructionsState S = getSameOpcode(VL);
if (S.getOpcode()) {
if (TreeEntry *E = getTreeEntry(S.OpValue)) {
if (E->isSame(VL)) {
Value *V = vectorizeTree(E);
if (VL.size() == E->Scalars.size() && !E->ReuseShuffleIndices.empty()) {
// Reshuffle to get only unique values.
// If some of the scalars are duplicated in the vectorization tree
// entry, we do not vectorize them but instead generate a mask for the
// reuses. But if there are several users of the same entry, they may
// have different vectorization factors. This is especially important
// for PHI nodes. In this case, we need to adapt the resulting
// instruction for the user vectorization factor and have to reshuffle
// it again to take only unique elements of the vector. Without this
// code the function incorrectly returns reduced vector instruction
// with the same elements, not with the unique ones.
// block:
// %phi = phi <2 x > { .., %entry} {%shuffle, %block}
// %2 = shuffle <2 x > %phi, %poison, <4 x > <0, 0, 1, 1>
// ... (use %2)
// %shuffle = shuffle <2 x> %2, poison, <2 x> {0, 2}
// br %block
SmallVector<int, 4> UniqueIdxs;
SmallSet<int, 4> UsedIdxs;
int Pos = 0;
for (int Idx : E->ReuseShuffleIndices) {
if (UsedIdxs.insert(Idx).second)
UniqueIdxs.emplace_back(Pos);
++Pos;
}
V = Builder.CreateShuffleVector(V, UniqueIdxs, "shrink.shuffle");
}
return V;
}
}
}

// Check that every instruction appears once in this bundle.		Value *Val0 =
SmallVector<int, 4> ReuseShuffleIndicies;		isa<StoreInst>(VL[0]) ? cast<StoreInst>(VL[0])->getValueOperand() : VL[0];
SmallVector<Value *, 4> UniqueValues;		FixedVectorType *VecTy = FixedVectorType::get(Val0->getType(), VL.size());
if (VL.size() > 2) {		Value *Vec = PoisonValue::get(VecTy);
DenseMap<Value *, unsigned> UniquePositions;		for (int I = 0, E = VL.size(); I < E; ++I) {
for (Value *V : VL) {		if (PostponedIndices.contains(I))
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		continue;
ReuseShuffleIndicies.emplace_back(Res.first->second);		Vec = CreateInsertElement(Vec, VL[I], I);
if (Res.second || isa<Constant>(V))
UniqueValues.emplace_back(V);
}
// Do not shuffle single element or if number of unique values is not power
// of 2.
if (UniqueValues.size() == VL.size() || UniqueValues.size() <= 1 ||
!llvm::isPowerOf2_32(UniqueValues.size()))
ReuseShuffleIndicies.clear();
else
VL = UniqueValues;
}		}
		// Append instructions, which are/may be part of the loop, in the end to make
		// it possible to hoist non-loop-based instructions.
		for (const std::pair<Value *, unsigned> &Pair : PostponedInsts)
		Vec = CreateInsertElement(Vec, Pair.first, Pair.second);

Value *Vec = gather(VL);
if (!ReuseShuffleIndicies.empty()) {
Vec = Builder.CreateShuffleVector(Vec, ReuseShuffleIndicies, "shuffle");
if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherSeq.insert(I);
CSEBlocks.insert(I->getParent());
}
}
return Vec;		return Vec;
}		}
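
The emission order used by the new `gather` can be sketched as below. This is a simplification under stated assumptions: `GatherLane` and `orderInsertions` are hypothetical names, and the two postponement passes of the patch (current block first, then enclosing loop) are merged into a single flag. Undef lanes are never inserted, lanes that are not postponed are inserted first, and postponed lanes are appended afterwards while keeping their original lane positions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical description of one gathered lane.
struct GatherLane {
  int Value;                 // Stand-in for Value*.
  bool InCurrentBlockOrLoop; // True if the insertion should be postponed.
  bool IsUndef;              // Undef lanes are skipped entirely.
};

// Returns (value, lane position) pairs in the order the insertelement
// instructions would be emitted.
std::vector<std::pair<int, unsigned>>
orderInsertions(const std::vector<GatherLane> &VL) {
  std::vector<std::pair<int, unsigned>> Order, Postponed;
  for (unsigned I = 0; I < VL.size(); ++I) {
    if (VL[I].IsUndef)
      continue;
    if (VL[I].InCurrentBlockOrLoop)
      Postponed.emplace_back(VL[I].Value, I);
    else
      Order.emplace_back(VL[I].Value, I);
  }
  // Postponed insertions go last, so the leading part of the chain does not
  // depend on values defined in the block or loop.
  Order.insert(Order.end(), Postponed.begin(), Postponed.end());
  return Order;
}
```

Emitting the postponed lanes last is what gives the leading insertions a chance to be hoisted, which is the motivation stated in the patch comments.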

namespace {		namespace {
/// Merges shuffle masks and emits final shuffle instruction, if required.		/// Merges shuffle masks and emits final shuffle instruction, if required.
class ShuffleInstructionBuilder {		class ShuffleInstructionBuilder {
IRBuilderBase &Builder;		IRBuilderBase &Builder;
		unsigned VF = 0;
bool IsFinalized = false;		bool IsFinalized = false;
SmallVector<int, 4> Mask;		SmallVector<int, 4> Mask;

public:		public:
ShuffleInstructionBuilder(IRBuilderBase &Builder) : Builder(Builder) {}		ShuffleInstructionBuilder(IRBuilderBase &Builder, unsigned VF)
		: Builder(Builder), VF(VF) {}

/// Adds a mask, inverting it before applying.		/// Adds a mask, inverting it before applying.
void addInversedMask(ArrayRef<unsigned> SubMask) {		void addInversedMask(ArrayRef<unsigned> SubMask) {
if (SubMask.empty())		if (SubMask.empty())
return;		return;
SmallVector<int, 4> NewMask;		SmallVector<int, 4> NewMask;
inversePermutation(SubMask, NewMask);		inversePermutation(SubMask, NewMask);
addMask(NewMask);		addMask(NewMask);
[10 lines not shown]	if (SubMask.empty())
return;		return;
if (Mask.empty()) {		if (Mask.empty()) {
Mask.append(SubMask.begin(), SubMask.end());		Mask.append(SubMask.begin(), SubMask.end());
return;		return;
}		}
SmallVector<int, 4> NewMask(SubMask.size(), SubMask.size());		SmallVector<int, 4> NewMask(SubMask.size(), SubMask.size());
int TermValue = std::min(Mask.size(), SubMask.size());		int TermValue = std::min(Mask.size(), SubMask.size());
for (int I = 0, E = SubMask.size(); I < E; ++I) {		for (int I = 0, E = SubMask.size(); I < E; ++I) {
if (SubMask[I] >= TermValue || Mask[SubMask[I]] >= TermValue) {		if (SubMask[I] >= TermValue || SubMask[I] == UndefMaskElem ||
NewMask[I] = E;		Mask[SubMask[I]] >= TermValue) {
		NewMask[I] = UndefMaskElem;
continue;		continue;
}		}
NewMask[I] = Mask[SubMask[I]];		NewMask[I] = Mask[SubMask[I]];
}		}
Mask.swap(NewMask);		Mask.swap(NewMask);
}		}

Value *finalize(Value *V) {		Value *finalize(Value *V) {
IsFinalized = true;		IsFinalized = true;
if (Mask.empty())		if (VF == cast<FixedVectorType>(V->getType())->getNumElements() &&
		Mask.empty())
		return V;
		SmallVector<int, 4> NormalizedMask(VF, UndefMaskElem);
		std::iota(NormalizedMask.begin(), NormalizedMask.end(), 0);
		addMask(NormalizedMask);

		if (VF == cast<FixedVectorType>(V->getType())->getNumElements() &&
		isUniform(Mask))
return V;		return V;
return Builder.CreateShuffleVector(V, Mask, "shuffle");		return Builder.CreateShuffleVector(V, Mask, "shuffle");
}		}

~ShuffleInstructionBuilder() {		~ShuffleInstructionBuilder() {
assert((IsFinalized || Mask.empty()) &&		assert((IsFinalized || Mask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};
} // namespace		} // namespace
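
The mask composition performed by `addMask` in the new version can be sketched in standalone form. This is an illustration, not the class itself: `composeMasks` and `kUndef` are hypothetical names, and `-1` is assumed for `UndefMaskElem`. Each lane of the sub-mask is looked up through the accumulated mask; undefined or out-of-range lanes become undefined in the result. The undef test is performed before indexing so the lookup never uses a negative index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LLVM's UndefMaskElem.
constexpr int kUndef = -1;

// Composes an accumulated shuffle mask with a further sub-mask, so that
// applying the result once is equivalent to applying Mask, then SubMask.
std::vector<int> composeMasks(const std::vector<int> &Mask,
                              const std::vector<int> &SubMask) {
  if (SubMask.empty())
    return Mask;
  if (Mask.empty())
    return SubMask;
  const int Term = static_cast<int>(std::min(Mask.size(), SubMask.size()));
  std::vector<int> NewMask(SubMask.size(), kUndef);
  for (std::size_t I = 0; I < SubMask.size(); ++I) {
    const int S = SubMask[I];
    // Undefined or out-of-range lanes stay undefined.
    if (S == kUndef || S >= Term || Mask[S] >= Term)
      continue;
    NewMask[I] = Mask[S];
  }
  return NewMask;
}
```

Composing `{1, 0, 3, 2}` with the sub-mask `{0, 0, -1, 3}` gives `{1, 1, -1, 2}`.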

		Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, unsigned VF) {
		InstructionsState S = getSameOpcode(VL);
		if (S.getOpcode()) {
		if (TreeEntry *E = getTreeEntry(S.OpValue))
		if (VL.size() == E->Scalars.size() && E->isSame(VL)) {
		Value *V = vectorizeTree(E);
		if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {
		if (!E->ReuseShuffleIndices.empty()) {
		// Reshuffle to get only unique values.
		// If some of the scalars are duplicated in the vectorization tree
		// entry, we do not vectorize them but instead generate a mask for
		// the reuses. But if there are several users of the same entry,
		// they may have different vectorization factors. This is especially
		// important for PHI nodes. In this case, we need to adapt the
		// resulting instruction for the user vectorization factor and have
		// to reshuffle it again to take only unique elements of the vector.
		// Without this code the function incorrectly returns reduced vector
		// instruction with the same elements, not with the unique ones.

		// block:
		// %phi = phi <2 x > { .., %entry} {%shuffle, %block}
		// %2 = shuffle <2 x > %phi, %poison, <4 x > <0, 0, 1, 1>
		// ... (use %2)
		// %shuffle = shuffle <2 x> %2, poison, <2 x> {0, 2}
		// br %block
		SmallVector<int, 4> UniqueIdxs;
		SmallSet<int, 4> UsedIdxs;
		int Pos = 0;
		int Sz = VL.size();
		for (int Idx : E->ReuseShuffleIndices) {
		if (Idx != Sz && UsedIdxs.insert(Idx).second)
		UniqueIdxs.emplace_back(Pos);
		++Pos;
		}
		assert(VF >= UsedIdxs.size() && "Expected vectorization factor "
		"less than original vector size.");
		UniqueIdxs.append(VF - UsedIdxs.size(), UndefMaskElem);
		V = Builder.CreateShuffleVector(V, UniqueIdxs, "shrink.shuffle");
		} else {
		assert(VF < cast<FixedVectorType>(V->getType())->getNumElements() &&
		"Expected vectorization factor less "
		"than original vector size.");
		SmallVector<int, 4> UniformMask(VF, 0);
		std::iota(UniformMask.begin(), UniformMask.end(), 0);
		V = Builder.CreateShuffleVector(V, UniformMask, "shrink.shuffle");
		}
		}
		return V;
		}
		}
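
The reshuffle described in the comment above can be tried outside LLVM. The following is a minimal sketch, not the patch's code: `buildShrinkMask` and `kUndefLane` are hypothetical names, `kUndefLane` stands in for `UndefMaskElem`, and the reading of an index equal to the scalar count as a padding lane is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Stand-in for LLVM's UndefMaskElem: -1 marks an undefined shuffle lane.
constexpr int kUndefLane = -1;

// Hypothetical helper mirroring the loop in the patch: for every distinct
// scalar index in the reuse mask, record the first lane where it appears,
// then pad the result with undef lanes up to the user's vectorization factor.
std::vector<int> buildShrinkMask(const std::vector<int> &reuseMask,
                                 int numScalars, std::size_t vf) {
  std::vector<int> uniqueLanes;
  std::set<int> seen;
  int pos = 0;
  for (int idx : reuseMask) {
    // The patch skips indices equal to the scalar count; this sketch assumes
    // those mark padding lanes.
    if (idx != numScalars && seen.insert(idx).second)
      uniqueLanes.push_back(pos);
    ++pos;
  }
  assert(vf >= seen.size() && "user VF must cover every unique scalar");
  uniqueLanes.insert(uniqueLanes.end(), vf - seen.size(), kUndefLane);
  return uniqueLanes;
}
```

For the reuse mask `<0, 0, 1, 1>` from the comment, with two scalars and a user VF of 2, this yields `{0, 2}`, which matches the `%shuffle` mask in the IR example.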

		// Check that every instruction appears once in this bundle.
		SmallVector<int, 4> ReuseShuffleIndicies;
		SmallVector<Value *, 4> UniqueValues;
		if (VL.size() > 2) {
		DenseMap<Value *, unsigned> UniquePositions;
		unsigned NumValues =
		std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {
		return !isa<UndefValue>(V);
		}).base());
		VF = std::max<unsigned>(VF, PowerOf2Ceil(NumValues));
		int UniqueVals = 0;
		bool HasUndefs = false;
		for (Value *V : VL.drop_back(VL.size() - VF)) {
		if (isa<UndefValue>(V)) {
		ReuseShuffleIndicies.emplace_back(UndefMaskElem);
		HasUndefs = true;
		continue;
		}
		if (isa<Constant>(V)) {
		ReuseShuffleIndicies.emplace_back(UniqueValues.size());
		UniqueValues.emplace_back(V);
		continue;
		}
		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
		ReuseShuffleIndicies.emplace_back(Res.first->second);
		if (Res.second) {
		UniqueValues.emplace_back(V);
		++UniqueVals;
		}
		}
		if (HasUndefs && UniqueVals == 1 && UniqueValues.size() == 1) {
		// Emit pure splat vector.
		ReuseShuffleIndicies.assign(VF, 0);
		} else if (UniqueValues.size() >= VF - 1 || UniqueValues.size() <= 1) {
		ReuseShuffleIndicies.clear();
		UniqueValues.clear();
		UniqueValues.append(VL.begin(), std::next(VL.begin(), NumValues));
		}
		UniqueValues.append(VF - UniqueValues.size(),
		UndefValue::get(VL[0]->getType()));
		VL = UniqueValues;
		}

		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF);
		Value *Vec = gather(VL);
		if (!ReuseShuffleIndicies.empty()) {
		ShuffleBuilder.addMask(ReuseShuffleIndicies);
		Vec = ShuffleBuilder.finalize(Vec);
		if (auto *I = dyn_cast<Instruction>(Vec)) {
		GatherSeq.insert(I);
		CSEBlocks.insert(I->getParent());
		}
		}
		return Vec;
		}
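
The core of the deduplication step above is recording each value's first position and emitting a reuse mask that points back at it. A simplified sketch under stated assumptions: scalars are modelled as strings, the literal `"undef"` plays the role of `UndefValue`, and `GatherPlan` / `planGather` are hypothetical names. The patch's special handling of constants, the splat case, and the fallback when deduplication does not pay off are all omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct GatherPlan {
  std::vector<std::string> uniqueValues; // values that are actually gathered
  std::vector<int> reuseMask;            // lane -> index into uniqueValues, -1 = undef
};

// Hypothetical simplification of the loop over VL: undef lanes get -1,
// every other lane gets the index of the first occurrence of its value.
GatherPlan planGather(const std::vector<std::string> &scalars) {
  GatherPlan plan;
  std::unordered_map<std::string, int> firstPos;
  for (const std::string &s : scalars) {
    if (s == "undef") {
      plan.reuseMask.push_back(-1);
      continue;
    }
    auto res = firstPos.emplace(s, static_cast<int>(plan.uniqueValues.size()));
    plan.reuseMask.push_back(res.first->second);
    if (res.second) // first time this value is seen
      plan.uniqueValues.push_back(s);
  }
  assert(plan.reuseMask.size() == scalars.size());
  return plan;
}
```

With the input `{"a", "b", "a", "undef"}` the plan gathers `{"a", "b"}` and uses the reuse mask `{0, 1, 0, -1}`.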

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

		Instruction *VL0 = E->getMainOp();
if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

ShuffleInstructionBuilder ShuffleBuilder(Builder);		SmallSet<unsigned, 4> UserVFs;
		unsigned SelfVF = getEntryVF(E, UserVFs, E);
		unsigned ShuffleVF = SelfVF;
		if (!E->ReuseShuffleIndices.empty()) {
		int Limit = E->Scalars.size();
		ShuffleVF = std::max<unsigned>(
		SelfVF, PowerOf2Ceil(std::distance(
		E->ReuseShuffleIndices.begin(),
		find_if(reverse(E->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}
		ShuffleInstructionBuilder ShuffleBuilder(Builder, ShuffleVF);
		Type *ScalarTy = VL0->getType();
		if (auto *Store = dyn_cast<StoreInst>(VL0))
		ScalarTy = Store->getValueOperand()->getType();
		auto *VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		if (isa<UndefValue>(VL0))
		return UndefValue::get(VecTy);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
Value *Vec = gather(E->Scalars);		Value *Vec = gather(makeArrayRef(E->Scalars).slice(0, SelfVF));
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
Vec = ShuffleBuilder.finalize(Vec);		Vec = ShuffleBuilder.finalize(Vec);
if (auto *I = dyn_cast<Instruction>(Vec)) {		if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherSeq.insert(I);		GatherSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
}		}
E->VectorizedValue = Vec;		E->VectorizedValue = Vec;
return Vec;		return Vec;
}		}

assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
		auto InstructionsOnly = make_filter_range(E->Scalars, Instruction::classof);
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
Instruction *VL0 = E->getMainOp();
Type *ScalarTy = VL0->getType();
if (auto *Store = dyn_cast<StoreInst>(VL0))
ScalarTy = Store->getValueOperand()->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);
Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());		Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());		PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());
Value *V = NewPhi;		Value *V = NewPhi;
if (NeedToShuffleReuses)		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = Builder.CreateShuffleVector(V, E->ReuseShuffleIndices, "shuffle");		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;

// PHINodes may have multiple entries from the same block. We want to		// PHINodes may have multiple entries from the same block. We want to
// visit every block once.		// visit every block once.
SmallPtrSet<BasicBlock*, 4> VisitedBBs;		SmallPtrSet<BasicBlock*, 4> VisitedBBs;

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
BasicBlock *IBB = PH->getIncomingBlock(i);		BasicBlock *IBB = PH->getIncomingBlock(i);

if (!VisitedBBs.insert(IBB).second) {		if (!VisitedBBs.insert(IBB).second) {
NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);		NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);
continue;		continue;
}		}

Builder.SetInsertPoint(IBB->getTerminator());		Builder.SetInsertPoint(IBB->getTerminator());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
Value *Vec = vectorizeTree(E->getOperand(i));		Value *Vec = vectorizeTree(E->getOperand(i), SelfVF);
NewPhi->addIncoming(Vec, IBB);		NewPhi->addIncoming(Vec, IBB);
}		}

assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&		assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&
"Invalid number of incoming values");		"Invalid number of incoming values");
return V;		return V;
}		}

case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
Value *V = E->getSingleOperand(0);		Value *V = E->getSingleOperand(0);
Builder.SetInsertPoint(VL0);		Builder.SetInsertPoint(VL0);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
case Instruction::ExtractValue: {		case Instruction::ExtractValue: {
auto *LI = cast<LoadInst>(E->getSingleOperand(0));		auto *LI = cast<LoadInst>(VL0->getOperand(0));
Builder.SetInsertPoint(LI);		Builder.SetInsertPoint(LI);
auto *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());		auto *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());
Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);		Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);
LoadInst *V = Builder.CreateAlignedLoad(VecTy, Ptr, LI->getAlign());		LoadInst *V = Builder.CreateAlignedLoad(VecTy, Ptr, LI->getAlign());
Value *NewV = propagateMetadata(V, E->Scalars);		Value *NewV = propagateMetadata(V, to_vector<4>(InstructionsOnly));
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
NewV = ShuffleBuilder.finalize(NewV);		NewV = ShuffleBuilder.finalize(NewV);
E->VectorizedValue = NewV;		E->VectorizedValue = NewV;
return NewV;		return NewV;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *InVec = vectorizeTree(E->getOperand(0));		Value *InVec = vectorizeTree(E->getOperand(0), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

auto *CI = cast<CastInst>(VL0);		auto *CI = cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *L = vectorizeTree(E->getOperand(0));		Value *L = vectorizeTree(E->getOperand(0), SelfVF);
Value *R = vectorizeTree(E->getOperand(1));		Value *R = vectorizeTree(E->getOperand(1), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V = Builder.CreateCmp(P0, L, R);		Value *V = Builder.CreateCmp(P0, L, R);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Cond = vectorizeTree(E->getOperand(0));		Value *Cond = vectorizeTree(E->getOperand(0), SelfVF);
Value *True = vectorizeTree(E->getOperand(1));		Value *True = vectorizeTree(E->getOperand(1), SelfVF);
Value *False = vectorizeTree(E->getOperand(2));		Value *False = vectorizeTree(E->getOperand(2), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateSelect(Cond, True, False);		Value *V = Builder.CreateSelect(Cond, True, False);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op = vectorizeTree(E->getOperand(0));		Value *Op = vectorizeTree(E->getOperand(0), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateUnOp(		Value *V = Builder.CreateUnOp(
static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);		static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);
(24 lines not shown)	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *LHS = vectorizeTree(E->getOperand(0));		Value *LHS = vectorizeTree(E->getOperand(0), SelfVF);
Value *RHS = vectorizeTree(E->getOperand(1));		Value *RHS = vectorizeTree(E->getOperand(1), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateBinOp(		Value *V = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,
RHS);		RHS);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));

ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Loads are inserted at the head of the tree because we don't want to		// Loads are inserted at the head of the tree because we don't want to
// sink them all the way down past store instructions.		// sink them all the way down past store instructions.
bool IsReorder = E->updateStateIfReorder();		bool IsReorder = E->updateStateIfReorder();
if (IsReorder)		if (IsReorder)
VL0 = E->getMainOp();		VL0 = E->getMainOp();
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

LoadInst *LI = cast<LoadInst>(VL0);		LoadInst *LI = cast<LoadInst>(VL0);
Instruction *NewLI;
unsigned AS = LI->getPointerAddressSpace();		unsigned AS = LI->getPointerAddressSpace();
Value *PO = LI->getPointerOperand();		Value *PO = LI->getPointerOperand();
		unsigned MinIdx;
		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(E->Scalars.begin(),
		find_if(E->Scalars, Instruction::classof));
		MaxIdx =
		std::distance(
		E->Scalars.begin(),
		find_if(reverse(E->Scalars), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		unsigned NumOfInstructions = MaxIdx - MinIdx + 1;
		Value *VecPtr;
		Instruction *VecLI;
		Value *V;
		Align CommonAlignment = LI->getAlign();
if (E->State == TreeEntry::Vectorize) {		if (E->State == TreeEntry::Vectorize) {
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
Value *VecPtr = Builder.CreateBitCast(PO, VecTy->getPointerTo(AS));		unsigned AlignedNumOfInstructions =
		std::min(PowerOf2Ceil(NumOfInstructions),
		alignTo(NumOfInstructions * Sz, CommonAlignment) / Sz);
		if (isPowerOf2_32(AlignedNumOfInstructions)) {
		CommonAlignment =
		spatel (Not Done): Please add code comment/example to explain what the difference is between these 2 clauses.
		ABataev (Author, Done): Fixed it, thanks.
		commonAlignment(CommonAlignment, CommonAlignment.value() -
		(AlignedNumOfInstructions -
		NumOfInstructions));
		auto *LoadVecTy =
		FixedVectorType::get(ScalarTy, AlignedNumOfInstructions);
		VecPtr = Builder.CreateBitCast(PO, LoadVecTy->getPointerTo(AS));
		VecLI = Builder.CreateAlignedLoad(LoadVecTy, VecPtr, CommonAlignment);
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
		} else {
		VecPtr = Builder.CreateBitCast(PO, VecTy->getPointerTo(AS));
		SmallVector<Constant *, 4> Mask;
		Mask.reserve(SelfVF);
		Mask.append(NumOfInstructions, Builder.getInt1(/*V=*/true));
		Mask.append(SelfVF - NumOfInstructions, Builder.getInt1(/*V=*/false));
		VecLI = Builder.CreateMaskedLoad(VecPtr, CommonAlignment,
		ConstantVector::get(Mask));
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
		}
// The pointer operand uses an in-tree scalar so we add the new BitCast		// The pointer operand uses an in-tree scalar so we add the new BitCast
// to ExternalUses list to make sure that an extract will be generated		// to ExternalUses list to make sure that an extract will be generated
// in the future.		// in the future.
if (getTreeEntry(PO))		if (getTreeEntry(PO))
ExternalUses.emplace_back(PO, cast<User>(VecPtr), 0);		ExternalUses.emplace_back(PO, cast<User>(VecPtr), 0);

NewLI = Builder.CreateAlignedLoad(VecTy, VecPtr, LI->getAlign());
} else {		} else {
		anton-afanasyev (Not Done): `emplace_back()`
assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");		assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");
Value *VecPtr = vectorizeTree(E->getOperand(0));		for (Value *V : InstructionsOnly)
// Use the minimum alignment of the gathered loads.
Align CommonAlignment = LI->getAlign();
for (Value *V : E->Scalars)
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
NewLI = Builder.CreateMaskedGather(VecPtr, CommonAlignment);		unsigned NormalizedSz = llvm::PowerOf2Ceil(NumOfInstructions);
		Value *VecPtr = vectorizeTree(E->getOperand(0), SelfVF);
		if (NormalizedSz != SelfVF) {
		spatel (Not Done): Is Passthrough a full vector of undef elements? If so, it should be created/named that way (or directly in the call to CreateMaskedLoad()) rather than in the loop.
		ABataev (Author, Done): Fixed
		// Reduce the original vector to optimize masked gather.
		SmallVector<int, 4> RedMask(NormalizedSz, 0);
		std::iota(RedMask.begin(), RedMask.end(), 0);
		VecPtr = Builder.CreateShuffleVector(VecPtr, RedMask);
		}
		SmallVector<Constant *, 4> Mask;
		Mask.reserve(SelfVF);
		RKSimon (Done): Isn't UndefValue a type of Constant? Maybe add a comment explaining what you're doing here, as it's not clear, at least to me.
		Mask.append(NumOfInstructions, Builder.getInt1(/*V=*/true));
		Mask.append(NormalizedSz - NumOfInstructions,
		Builder.getInt1(/*V=*/false));
		VecLI = Builder.CreateMaskedGather(VecPtr, CommonAlignment,
		ConstantVector::get(Mask));
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
}		}
Value *V = propagateMetadata(NewLI, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
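
On the question of how the two clauses in the `Vectorize` load path differ: the choice depends only on the arithmetic for `AlignedNumOfInstructions`. The sketch below reproduces that arithmetic with local stand-ins for `llvm::PowerOf2Ceil`, `llvm::alignTo` and `llvm::isPowerOf2_32`; the function names `alignedElementCount` and `canUsePlainLoad` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Local stand-in for llvm::PowerOf2Ceil.
std::uint64_t powerOf2Ceil(std::uint64_t n) {
  if (n == 0)
    return 0;
  std::uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Local stand-in for llvm::alignTo: round size up to a multiple of align.
std::uint64_t alignTo(std::uint64_t size, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "alignment must be a power of two");
  return (size + align - 1) / align * align;
}

bool isPowerOf2(std::uint64_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Element count of the widened load, as computed in the patch:
// min(PowerOf2Ceil(N), alignTo(N * Sz, Alignment) / Sz).
std::uint64_t alignedElementCount(std::uint64_t numLoads,
                                  std::uint64_t elemBytes,
                                  std::uint64_t alignBytes) {
  return std::min(powerOf2Ceil(numLoads),
                  alignTo(numLoads * elemBytes, alignBytes) / elemBytes);
}

// First clause (plain aligned load) if the count is a power of two,
// otherwise the second clause (masked load).
bool canUsePlainLoad(std::uint64_t numLoads, std::uint64_t elemBytes,
                     std::uint64_t alignBytes) {
  return isPowerOf2(alignedElementCount(numLoads, elemBytes, alignBytes));
}
```

For three `i32` loads with 16-byte alignment the count is `min(4, 16 / 4) = 4`, so the first clause emits a plain 4-element load. With 4-byte alignment the count is `min(4, 12 / 4) = 3`, which is not a power of two, so the masked-load clause is taken.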
case Instruction::Store: {		case Instruction::Store: {
bool IsReorder = !E->ReorderIndices.empty();		bool IsReorder = E->updateStateIfReorder();
auto *SI = cast<StoreInst>(		if (IsReorder)
IsReorder ? E->Scalars[E->ReorderIndices.front()] : VL0);		VL0 = E->getMainOp();
		auto *SI = cast<StoreInst>(VL0);
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *VecValue = vectorizeTree(E->getOperand(0));		Value *VecValue =
		vectorizeTree(E->getOperand(0),
		PowerOf2Ceil(std::distance(InstructionsOnly.begin(),
		InstructionsOnly.end())));
ShuffleBuilder.addMask(E->ReorderIndices);		ShuffleBuilder.addMask(E->ReorderIndices);
VecValue = ShuffleBuilder.finalize(VecValue);		VecValue = ShuffleBuilder.finalize(VecValue);

Value *ScalarPtr = SI->getPointerOperand();		Value *ScalarPtr = SI->getPointerOperand();
Value *VecPtr = Builder.CreateBitCast(
ScalarPtr, VecValue->getType()->getPointerTo(AS));		Align Alignment = SI->getAlign();
StoreInst *ST = Builder.CreateAlignedStore(VecValue, VecPtr,		unsigned MinIdx;
SI->getAlign());		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(E->Scalars.begin(),
		find_if(E->Scalars, Instruction::classof));
		MaxIdx =
		std::distance(
		E->Scalars.begin(),
		find_if(reverse(E->Scalars), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		Value *VecPtr;
		Instruction *VecSI;
		if (std::distance(InstructionsOnly.begin(), InstructionsOnly.end()) ==
		SelfVF) {
		VecPtr = Builder.CreateBitCast(
		ScalarPtr,
		FixedVectorType::get(ScalarTy, SelfVF)->getPointerTo(AS));
		spatel (Not Done): Similar to above (so can we add a helper function to avoid duplicating the code?): Please add code comment/example to explain what the difference is between these 2 clauses.
		ABataev (Author, Done): Fixed, thanks!
		VecSI = Builder.CreateAlignedStore(VecValue, VecPtr, Alignment);
		} else {
		VecPtr = Builder.CreateBitCast(ScalarPtr,
		VecValue->getType()->getPointerTo(AS));
		SmallVector<Constant *, 4> Mask(SelfVF, Builder.getInt1(/*V=*/false));
		for (unsigned I = 0; I < SelfVF; ++I) {
		if (E->ReorderIndices[I] != SelfVF)
		Mask[I] = Builder.getInt1(/*V=*/true);
		}
		}
		VecSI = Builder.CreateMaskedStore(VecValue, VecPtr, Alignment,
		ConstantVector::get(Mask));
		}

// The pointer operand uses an in-tree scalar, so add the new BitCast to		// The pointer operand uses an in-tree scalar, so add the new BitCast to
// ExternalUses to make sure that an extract will be generated in the		// ExternalUses to make sure that an extract will be generated in the
// future.		// future.
if (getTreeEntry(ScalarPtr))		if (getTreeEntry(ScalarPtr))
ExternalUses.push_back(ExternalUser(ScalarPtr, cast<User>(VecPtr), 0));		ExternalUses.emplace_back(ScalarPtr, cast<User>(VecPtr), 0);

Value *V = propagateMetadata(ST, E->Scalars);		Value *V = propagateMetadata(VecSI, llvm::to_vector<4>(InstructionsOnly));

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
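
The mask for the masked-store clause above is derived lane by lane from `ReorderIndices`. A minimal sketch, assuming (as the patch's comparison suggests) that an index equal to `SelfVF` marks a lane with no real store; `buildStoreMask` is a hypothetical name and `std::vector<bool>` stands in for the `i1` constant vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a lane is enabled unless its reorder index equals
// selfVF, which this sketch treats as the "no store in this lane" sentinel.
std::vector<bool> buildStoreMask(const std::vector<unsigned> &reorderIndices,
                                 unsigned selfVF) {
  assert(reorderIndices.size() >= selfVF &&
         "need one reorder index per lane");
  std::vector<bool> mask(selfVF, false);
  for (unsigned i = 0; i < selfVF; ++i)
    if (reorderIndices[i] != selfVF)
      mask[i] = true;
  return mask;
}
```

For `reorderIndices = {0, 1, 2, 4}` and `selfVF = 4` the last lane is disabled. The sketch asserts that there is one index per lane; the masked-store clause as shown indexes `E->ReorderIndices[I]` without a visible guard for an empty `ReorderIndices`, which may be worth checking.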
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op0 = vectorizeTree(E->getOperand(0));		Value *Op0 = vectorizeTree(E->getOperand(0), SelfVF);

std::vector<Value *> OpVecs;		std::vector<Value *> OpVecs;
for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;		for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;
++j) {		++j) {
ValueList &VL = E->getOperand(j);		ValueList &VL = E->getOperand(j);
// Need to cast all elements to the same type before vectorization to		// Need to cast all elements to the same type before vectorization to
// avoid crash.		// avoid crash.
Type *VL0Ty = VL0->getOperand(j)->getType();		Type *VL0Ty = VL0->getOperand(j)->getType();
Type *Ty = llvm::all_of(		Type *Ty = llvm::all_of(
		dtemirbulatov (Not Done): hmm, it might be unsafe to try to obtain type here since any element of VL could be Undef?
		dtemirbulatov (Not Done):
		target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
		target triple = "x86_64-unknown-linux-gnu"

		@e = dso_local local_unnamed_addr global i32 0, align 4
		@f = dso_local local_unnamed_addr global i32 0, align 4

		; Function Attrs: nofree norecurse nounwind uwtable
		define dso_local i32 @g() local_unnamed_addr #0 {
		entry:
		  %0 = load i32, i32* @e, align 4
		  %tobool.not19 = icmp eq i32 %0, 0
		  br i1 %tobool.not19, label %while.end, label %while.body

		while.body: ; preds = %entry, %while.body.backedge
		  %c.022 = phi i32* [ %c.022.be, %while.body.backedge ], [ undef, %entry ]
		  %b.021 = phi i32* [ %b.021.be, %while.body.backedge ], [ undef, %entry ]
		  %a.020 = phi i32* [ %a.020.be, %while.body.backedge ], [ undef, %entry ]
		  %incdec.ptr = getelementptr inbounds i32, i32* %c.022, i64 1
		  %1 = ptrtoint i32* %c.022 to i64
		  %2 = trunc i64 %1 to i32
		  %incdec.ptr1 = getelementptr inbounds i32, i32* %a.020, i64 1
		  %incdec.ptr2 = getelementptr inbounds i32, i32* %b.021, i64 1
		  switch i32 %2, label %while.body.backedge [
		    i32 2, label %sw.bb
		    i32 4, label %sw.bb6
		  ]

		sw.bb: ; preds = %while.body
		  %incdec.ptr3 = getelementptr inbounds i32, i32* %b.021, i64 2
		  %3 = ptrtoint i32* %incdec.ptr2 to i64
		  %4 = trunc i64 %3 to i32
		  %incdec.ptr4 = getelementptr inbounds i32, i32* %a.020, i64 2
		  store i32 %4, i32* %incdec.ptr1, align 4
		  %incdec.ptr5 = getelementptr inbounds i32, i32* %c.022, i64 2
		  br label %while.body.backedge

		sw.bb6: ; preds = %while.body
		  %incdec.ptr7 = getelementptr inbounds i32, i32* %a.020, i64 2
		  %incdec.ptr8 = getelementptr inbounds i32, i32* %c.022, i64 2
		  %5 = ptrtoint i32* %incdec.ptr to i64
		  %6 = trunc i64 %5 to i32
		  %incdec.ptr9 = getelementptr inbounds i32, i32* %b.021, i64 2
		  store i32 %6, i32* %incdec.ptr2, align 4
		  br label %while.body.backedge

		while.body.backedge: ; preds = %sw.bb6, %while.body, %sw.bb
		  %c.022.be = phi i32* [ %incdec.ptr, %while.body ], [ %incdec.ptr8, %sw.bb6 ], [ %incdec.ptr5, %sw.bb ]
		  %b.021.be = phi i32* [ %incdec.ptr2, %while.body ], [ %incdec.ptr9, %sw.bb6 ], [ %incdec.ptr3, %sw.bb ]
		  %a.020.be = phi i32* [ %incdec.ptr1, %while.body ], [ %incdec.ptr7, %sw.bb6 ], [ %incdec.ptr4, %sw.bb ]
		  br label %while.body

		while.end: ; preds = %entry
		  ret i32 undef
		}

		attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+avx,+avx2,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
		ABataev (Author, Done): UndefValue also has an associated type, so it should be fine. Your reproducer crashes because of different reasons.
VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })		VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })
? VL0Ty		? VL0Ty
: DL->getIndexType(cast<GetElementPtrInst>(VL0)		: DL->getIndexType(cast<GetElementPtrInst>(VL0)
->getPointerOperandType()		->getPointerOperandType()
->getScalarType());		->getScalarType());
for (Value *&V : VL) {		for (Value *&V : VL) {
		if (isa<UndefValue>(V))
		continue;
auto *CI = cast<ConstantInt>(V);		auto *CI = cast<ConstantInt>(V);
V = ConstantExpr::getIntegerCast(CI, Ty,		V = ConstantExpr::getIntegerCast(CI, Ty,
CI->getValue().isSignBitSet());		CI->getValue().isSignBitSet());
}		}
Value *OpVec = vectorizeTree(VL);		Value *OpVec = vectorizeTree(VL, SelfVF);
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Value *V = Builder.CreateGEP(		Value *V = Builder.CreateGEP(
cast<GetElementPtrInst>(VL0)->getSourceElementType(), Op0, OpVecs);		cast<GetElementPtrInst>(VL0)->getSourceElementType(), Op0, OpVecs);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));

ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
(20 lines not shown)	case Instruction::Call: {
// vectorized.		// vectorized.
if (UseIntrinsic && hasVectorInstrinsicScalarOpd(IID, j)) {		if (UseIntrinsic && hasVectorInstrinsicScalarOpd(IID, j)) {
CallInst *CEI = cast<CallInst>(VL0);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
continue;		continue;
}		}

Value *OpVec = vectorizeTree(E->getOperand(j));		Value *OpVec = vectorizeTree(E->getOperand(j), SelfVF);
LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");		LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Function *CF;		Function *CF;
if (!UseIntrinsic) {		if (!UseIntrinsic) {
VFShape Shape =		VFShape Shape = VFShape::get(*CI, ElementCount::getFixed(SelfVF),
VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(
VecTy->getNumElements())),
false /*HasGlobalPred*/);		false /*HasGlobalPred*/);
CF = VFDatabase(*CI).getVectorizedFunction(Shape);		CF = VFDatabase(*CI).getVectorizedFunction(Shape);
} else {		} else {
Type *Tys[] = {FixedVectorType::get(CI->getType(), E->Scalars.size())};		Type *Tys[] = {FixedVectorType::get(CI->getType(), SelfVF)};
CF = Intrinsic::getDeclaration(F->getParent(), ID, Tys);		CF = Intrinsic::getDeclaration(F->getParent(), ID, Tys);
}		}

SmallVector<OperandBundleDef, 1> OpBundles;		SmallVector<OperandBundleDef, 1> OpBundles;
CI->getOperandBundlesAsDefs(OpBundles);		CI->getOperandBundlesAsDefs(OpBundles);
Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);		Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);

// The scalar argument uses an in-tree scalar so we add the new vectorized		// The scalar argument uses an in-tree scalar so we add the new vectorized
// call to ExternalUses list to make sure that an extract will be		// call to ExternalUses list to make sure that an extract will be
// generated in the future.		// generated in the future.
if (ScalarArg && getTreeEntry(ScalarArg))		if (ScalarArg && getTreeEntry(ScalarArg))
ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));		ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));

propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, to_vector<4>(InstructionsOnly), VL0);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
assert(E->isAltShuffle() &&		assert(E->isAltShuffle() &&
((Instruction::isBinaryOp(E->getOpcode()) &&		((Instruction::isBinaryOp(E->getOpcode()) &&
Instruction::isBinaryOp(E->getAltOpcode())) ||		Instruction::isBinaryOp(E->getAltOpcode())) ||
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode()))) &&		Instruction::isCast(E->getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");

Value *LHS = nullptr, *RHS = nullptr;		Value *LHS = nullptr, *RHS = nullptr;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), SelfVF);
RHS = vectorizeTree(E->getOperand(1));		RHS = vectorizeTree(E->getOperand(1), SelfVF);
} else {		} else {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), SelfVF);
}		}

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V0, *V1;		Value *V0, *V1;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
V0 = Builder.CreateBinOp(		V0 = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS, RHS);
V1 = Builder.CreateBinOp(		V1 = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getAltOpcode()), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->getAltOpcode()), LHS, RHS);
} else {		} else {
V0 = Builder.CreateCast(		V0 = Builder.CreateCast(
static_cast<Instruction::CastOps>(E->getOpcode()), LHS, VecTy);		static_cast<Instruction::CastOps>(E->getOpcode()), LHS, VecTy);
V1 = Builder.CreateCast(		V1 = Builder.CreateCast(
static_cast<Instruction::CastOps>(E->getAltOpcode()), LHS, VecTy);		static_cast<Instruction::CastOps>(E->getAltOpcode()), LHS, VecTy);
}		}

// Create shuffle to take alternate operations from the vector.		// Create shuffle to take alternate operations from the vector.
// Also, gather up main and alt scalar ops to propagate IR flags to		// Also, gather up main and alt scalar ops to propagate IR flags to
// each vector operation.		// each vector operation.
ValueList OpScalars, AltScalars;		ValueList OpScalars, AltScalars;
unsigned e = E->Scalars.size();		SmallVector<int, 8> Mask(SelfVF);
SmallVector<int, 8> Mask(e);		for (unsigned i = 0; i < SelfVF; ++i) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
for (unsigned i = 0; i < e; ++i) {		if (isa<UndefValue>(E->Scalars[i])) {
		Mask[i] = i;
		OpScalars.push_back(E->Scalars[i]);
		continue;
		}
auto *OpInst = cast<Instruction>(E->Scalars[i]);		auto *OpInst = cast<Instruction>(E->Scalars[i]);
assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");
if (OpInst->getOpcode() == E->getAltOpcode()) {		if (OpInst->getOpcode() == E->getAltOpcode()) {
Mask[i] = e + i;		Mask[i] = SelfVF + i;
AltScalars.push_back(E->Scalars[i]);		AltScalars.push_back(E->Scalars[i]);
} else {		} else {
Mask[i] = i;		Mask[i] = i;
OpScalars.push_back(E->Scalars[i]);		OpScalars.push_back(E->Scalars[i]);
}		}
}		}

propagateIRFlags(V0, OpScalars);		propagateIRFlags(V0, OpScalars);
propagateIRFlags(V1, AltScalars);		propagateIRFlags(V1, AltScalars);

Value *V = Builder.CreateShuffleVector(V0, V1, Mask);		Value *V = Builder.CreateShuffleVector(V0, V1, Mask);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
}		}
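The mask construction in the alternate-opcode case above can be illustrated in isolation. This is a minimal sketch, not the patch's code: `LaneKind` and `buildAltShuffleMask` are hypothetical names, and the lanes are assumed to be pre-classified as main opcode, alternate opcode, or undef. As in the right-hand side of the diff, an undef lane is treated like a main-opcode lane.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-lane classification done in the loop above.
enum class LaneKind { Main, Alt, Undef };

// Builds a two-source shuffle mask: index I selects lane I of the first
// vector (main opcode), index VF + I selects lane I of the second vector
// (alternate opcode). Undef lanes take the main-opcode path.
std::vector<int> buildAltShuffleMask(const std::vector<LaneKind> &Lanes) {
  const int VF = static_cast<int>(Lanes.size());
  std::vector<int> Mask(Lanes.size());
  for (int I = 0; I < VF; ++I)
    Mask[I] = (Lanes[I] == LaneKind::Alt) ? VF + I : I;
  return Mask;
}
```

For four lanes alternating main and alternate opcodes, the sketch yields the mask 0, 5, 2, 7.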
[61 lines not shown]	for (const auto &ExternalUse : ExternalUses) {
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
continue;		continue;
TreeEntry *E = getTreeEntry(Scalar);		TreeEntry *E = getTreeEntry(Scalar);
assert(E && "Invalid scalar");		assert(E && "Invalid scalar");
assert(E->State != TreeEntry::NeedToGather &&		assert(E->State != TreeEntry::NeedToGather &&
"Extracting from a gather list");		"Extracting from a gather list");

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
		if (!Vec && E->getOpcode() == Instruction::Load &&
		E->UserTreeIndices.empty() && E != VectorizableTree[0].get())
		Vec = vectorizeTree(E);
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
// ExternallyUsedValues.		// ExternallyUsedValues.
if (!User) {		if (!User) {
assert(ExternallyUsedValues.count(Scalar) &&		assert(ExternallyUsedValues.count(Scalar) &&
[62 lines not shown]	for (auto &TEPtr : VectorizableTree) {
if (Entry->State == TreeEntry::NeedToGather)		if (Entry->State == TreeEntry::NeedToGather)
continue;		continue;

assert(Entry->VectorizedValue && "Can't find vectorizable value");		assert(Entry->VectorizedValue && "Can't find vectorizable value");

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];
		if (isa<UndefValue>(Scalar))
		continue;

#ifndef NDEBUG		#ifndef NDEBUG
Type *Ty = Scalar->getType();		Type *Ty = Scalar->getType();
if (!Ty->isVoidTy()) {		if (!Ty->isVoidTy()) {
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");

// It is legal to delete users in the ignorelist.		// It is legal to delete users in the ignorelist.
[143 lines not shown]	auto &&TryScheduleBundle = [this, OldScheduleEnd, SLP](bool ReSchedule,
while (((!Bundle && ReSchedule) || (Bundle && !Bundle->isReady())) &&		while (((!Bundle && ReSchedule) || (Bundle && !Bundle->isReady())) &&
!ReadyInsts.empty()) {		!ReadyInsts.empty()) {
ScheduleData *Picked = ReadyInsts.pop_back_val();		ScheduleData *Picked = ReadyInsts.pop_back_val();
if (Picked->isSchedulingEntity() && Picked->isReady())		if (Picked->isSchedulingEntity() && Picked->isReady())
schedule(Picked, ReadyInsts);		schedule(Picked, ReadyInsts);
}		}
};		};

		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
// Make sure that the scheduling region contains all		// Make sure that the scheduling region contains all
// instructions of the bundle.		// instructions of the bundle.
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (!extendSchedulingRegion(V, S)) {		if (!extendSchedulingRegion(V, S)) {
// If the scheduling region got new instructions at the lower end (or it		// If the scheduling region got new instructions at the lower end (or it
// is a new region for the first bundle). This makes it necessary to		// is a new region for the first bundle). This makes it necessary to
// recalculate all dependencies.		// recalculate all dependencies.
// Otherwise the compiler may crash trying to incorrectly calculate		// Otherwise the compiler may crash trying to incorrectly calculate
// dependencies and emit instruction in the wrong order at the actual		// dependencies and emit instruction in the wrong order at the actual
// scheduling.		// scheduling.
TryScheduleBundle(/*ReSchedule=*/false, nullptr);		TryScheduleBundle(/*ReSchedule=*/false, nullptr);
return None;		return None;
}		}
}		}

for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
ScheduleData *BundleMember = getScheduleData(V);		ScheduleData *BundleMember = getScheduleData(V);
assert(BundleMember &&		assert(BundleMember &&
"no ScheduleData for bundle member (maybe not in same basic block)");		"no ScheduleData for bundle member (maybe not in same basic block)");
if (BundleMember->IsScheduled) {		if (BundleMember->IsScheduled) {
// A bundle member was scheduled as single instruction before and now		// A bundle member was scheduled as single instruction before and now
// needs to be scheduled as part of the bundle. We just get rid of the		// needs to be scheduled as part of the bundle. We just get rid of the
// existing schedule.		// existing schedule.
LLVM_DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember		LLVM_DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember
[27 lines not shown]	void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,
Value *OpValue) {		Value *OpValue) {
if (isa<PHINode>(OpValue))		if (isa<PHINode>(OpValue))
return;		return;

ScheduleData *Bundle = getScheduleData(OpValue);		ScheduleData *Bundle = getScheduleData(OpValue);
LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");		LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");
assert(!Bundle->IsScheduled &&		assert(!Bundle->IsScheduled &&
"Can't cancel bundle which is already scheduled");		"Can't cancel bundle which is already scheduled");
assert(Bundle->isSchedulingEntity() && Bundle->isPartOfBundle() &&		assert(Bundle->isSchedulingEntity() &&
		(Bundle->isPartOfBundle() ||
		llvm::count_if(VL, Instruction::classof) == 1) &&
"tried to unbundle something which is not a bundle");		"tried to unbundle something which is not a bundle");

// Un-bundle: make single instructions out of the bundle.		// Un-bundle: make single instructions out of the bundle.
ScheduleData *BundleMember = Bundle;		ScheduleData *BundleMember = Bundle;
while (BundleMember) {		while (BundleMember) {
assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");		assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");
BundleMember->FirstInBundle = BundleMember;		BundleMember->FirstInBundle = BundleMember;
ScheduleData *Next = BundleMember->NextInBundle;		ScheduleData *Next = BundleMember->NextInBundle;
[289 lines not shown]	void BoUpSLP::scheduleBlock(BlockScheduling *BS) {
// Ensure that all dependency data is updated and fill the ready-list with		// Ensure that all dependency data is updated and fill the ready-list with
// initial instructions.		// initial instructions.
int Idx = 0;		int Idx = 0;
int NumToSchedule = 0;		int NumToSchedule = 0;
for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;		for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;
I = I->getNextNode()) {		I = I->getNextNode()) {
BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {		BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {
assert(SD->isPartOfBundle() ==		assert(SD->isPartOfBundle() ==
(getTreeEntry(SD->Inst) != nullptr) &&		(getTreeEntry(SD->Inst) != nullptr &&
		llvm::count_if(getTreeEntry(SD->Inst)->Scalars,
		Instruction::classof) > 1) &&
"scheduler and vectorizer bundle mismatch");		"scheduler and vectorizer bundle mismatch");
SD->FirstInBundle->SchedulingPriority = Idx++;		SD->FirstInBundle->SchedulingPriority = Idx++;
if (SD->isSchedulingEntity()) {		if (SD->isSchedulingEntity()) {
BS->calculateDependencies(SD, false, this);		BS->calculateDependencies(SD, false, this);
NumToSchedule++;		NumToSchedule++;
}		}
});		});
}		}
[88 lines not shown]	while (!Worklist.empty() && !FoundUnknownInst) {
else		else
FoundUnknownInst = true;		FoundUnknownInst = true;
}		}

int Width = MaxWidth;		int Width = MaxWidth;
// If we didn't encounter a memory access in the expression tree, or if we		// If we didn't encounter a memory access in the expression tree, or if we
// gave up for some reason, just return the width of V. Otherwise, return the		// gave up for some reason, just return the width of V. Otherwise, return the
// maximum width we found.		// maximum width we found.
if (!MaxWidth || FoundUnknownInst)		if (!MaxWidth || FoundUnknownInst) {
Width = DL->getTypeSizeInBits(V->getType());		// For cmp instructions use the size of its operands, not the size of i1
		// type.
		if (auto *CI = dyn_cast<CmpInst>(V))
		V = CI->getOperand(0);
		Width = std::max<int>(Width, DL->getTypeSizeInBits(V->getType()));
		}

for (Instruction *I : Visited)		for (Instruction *I : Visited)
InstrElementSize[I] = Width;		InstrElementSize[I] = Width;

return Width;		return Width;
}		}

// Determine if a value V in a vectorizable expression Expr can be demoted to a		// Determine if a value V in a vectorizable expression Expr can be demoted to a
[345 lines not shown]	bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,

if (Changed) {		if (Changed) {
R.optimizeGatherSequence();		R.optimizeGatherSequence();
LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");		LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");
}		}
return Changed;		return Changed;
}		}

		/// Order may have elements assigned special value (size) which is out of
		/// bounds. Such indices only appear on places which correspond to undef values
		/// (see canReuseExtract for details) and used in order to avoid undef values
		/// have effect on operands ordering.
		/// The first loop below simply finds all unused indices and then the next loop
		/// nest assigns these indecies for undef values positions.
		anton-afanasyev (Not Done): typo: "indeces"
		/// As an example below Order has two undef positions and they have assigned
		/// values 3 and 7 respectively:
		/// before: 6 9 5 4 9 2 1 0
		/// after: 6 3 5 4 7 2 1 0
		static void fixupOrderingIndicies(SmallVectorImpl<unsigned> &Order) {
		const unsigned Sz = Order.size();
		SmallBitVector UsedIndices(Sz);
		const unsigned BoundVal = Sz;
		for (unsigned I : Order)
		if (I != BoundVal)
		UsedIndices[I] = true;
		unsigned Idx = 0;
		for (unsigned &I : Order) {
		if (I == BoundVal) {
		// Find first non-used index.
		for (; Idx != Sz; ++Idx)
		if (!UsedIndices[Idx])
		break;
		// Set correct index.
		I = Idx;
		++Idx;
		}
		}
		}
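The index fix-up described in the comment above can be tried outside LLVM. The following is a minimal sketch using `std::vector` in place of `SmallVectorImpl` and `SmallBitVector`; `fixupOrder` is a hypothetical name. One deliberate difference, labeled as an assumption: the function in the diff compares against `Order.size()` exactly, whereas this sketch treats any out-of-range value as a marker, which also lets it accept the `9` used in the comment's example for an order of size 8.

```cpp
#include <cassert>
#include <vector>

// Replaces every out-of-range entry of Order with the smallest index that is
// not yet used, scanning left to right. Assumes the in-range entries are
// distinct, as they would be in a valid (partial) permutation.
std::vector<unsigned> fixupOrder(std::vector<unsigned> Order) {
  const unsigned Sz = static_cast<unsigned>(Order.size());
  std::vector<bool> Used(Sz, false);
  // First pass: record which indices already appear.
  for (unsigned I : Order)
    if (I < Sz)
      Used[I] = true;
  // Second pass: hand out the unused indices in increasing order.
  unsigned Next = 0;
  for (unsigned &I : Order) {
    if (I < Sz)
      continue;
    while (Next < Sz && Used[Next])
      ++Next;
    I = Next++;
  }
  return Order;
}
```

With the input from the comment, `6 9 5 4 9 2 1 0`, the two marked positions receive 3 and 7, giving `6 3 5 4 7 2 1 0`.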

bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,		bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
unsigned Idx) {		unsigned Idx) {
LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()		LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()
<< "\n");		<< "\n");
const unsigned Sz = R.getVectorElementSize(Chain[0]);		const unsigned Sz = R.getVectorElementSize(Chain[0]);
const unsigned MinVF = R.getMinVecRegSize() / Sz;
unsigned VF = Chain.size();		unsigned VF = Chain.size();

if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF)		if (!isPowerOf2_32(Sz) || VF < 2)
		vdmitrie (Not Done): PowerOf2Ceil(VF) < MinVF
return false;		return false;

		const unsigned MinVF = R.getMinVecRegSize() / Sz;
		SmallVector<Value *, 8> FixedChain;
		unsigned NewSize = PowerOf2Ceil(std::max(VF, MinVF));
		if (NewSize != VF) {
		FixedChain.reserve(NewSize);
		FixedChain.append(Chain.begin(), Chain.end());
		FixedChain.append(NewSize - Chain.size(),
		UndefValue::get(Chain[0]->getType()));
		Chain = FixedChain;
		VF = NewSize;
		}
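The padding step on the right-hand side (grow the chain to the next power of two, but never below MinVF, and fill the new lanes with undef) can be sketched with plain integers. This is an illustration under stated assumptions: `kUndefLane`, `nextPow2`, and `padChain` are hypothetical names, `-1` stands in for `UndefValue`, and `nextPow2` is only meant for small inputs of at least 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker standing in for UndefValue in this sketch.
constexpr int kUndefLane = -1;

// Smallest power of two that is >= N, for N >= 1 and small enough not to
// overflow.
std::size_t nextPow2(std::size_t N) {
  std::size_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Pads Chain with kUndefLane up to nextPow2(max(Chain.size(), MinVF)).
std::vector<int> padChain(std::vector<int> Chain, std::size_t MinVF) {
  const std::size_t NewSize = nextPow2(std::max(Chain.size(), MinVF));
  Chain.resize(NewSize, kUndefLane);
  return Chain;
}
```

A three-element chain with MinVF 2 becomes four lanes with one undef lane; the same chain with MinVF 8 becomes eight lanes.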
LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx
<< "\n");		<< "\n");

R.buildTree(Chain);		R.buildTree(Chain);
Optional<ArrayRef<unsigned>> Order = R.bestOrder();		Optional<ArrayRef<unsigned>> Order = R.bestOrder();
// TODO: Handle orders of size less than number of elements in the vector.		// TODO: Handle orders of size less than number of elements in the vector.
if (Order && Order->size() == Chain.size()) {		if (Order && Order->size() == Chain.size()) {
		SmallVector<unsigned, 4> NewOrder(Order->begin(), Order->end());
		fixupOrderingIndicies(NewOrder);
		RKSimon (Done): Would SmallBitVector be cheaper for UsedIndices?
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
SmallVector<Value *, 4> ReorderedOps(Chain.rbegin(), Chain.rend());		SmallVector<Value *, 4> ReorderedOps(Chain.rbegin(), Chain.rend());
llvm::transform(*Order, ReorderedOps.begin(),		transform(NewOrder, ReorderedOps.begin(),
[Chain](const unsigned Idx) { return Chain[Idx]; });		[Chain](const unsigned Idx) { return Chain[Idx]; });
R.buildTree(ReorderedOps);		R.buildTree(ReorderedOps);
}		}
if (R.isTreeTinyAndNotFullyVectorizable())		if (R.isTreeTinyAndNotFullyVectorizable())
return false;		return false;
if (R.isLoadCombineCandidate())		if (R.isLoadCombineCandidate())
return false;		return false;

R.computeMinimumValueSizes();		R.computeMinimumValueSizes();
[23 lines not shown]	bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
BoUpSLP &R) {		BoUpSLP &R) {
// We may run into multiple chains that merge into a single chain. We mark the		// We may run into multiple chains that merge into a single chain. We mark the
// stores that we vectorized so that we don't visit the same store twice.		// stores that we vectorized so that we don't visit the same store twice.
BoUpSLP::ValueSet VectorizedStores;		BoUpSLP::ValueSet VectorizedStores;
bool Changed = false;		bool Changed = false;

int E = Stores.size();		int E = Stores.size();
SmallBitVector Tails(E, false);		SmallBitVector Tails(E, false);
SmallVector<int, 16> ConsecutiveChain(E, E + 1);
int MaxIter = MaxStoreLookup.getValue();		int MaxIter = MaxStoreLookup.getValue();
		// If a vector register can't hold 1 element, we are done.
		unsigned MaxVecRegSize = R.getMaxVecRegSize();
		unsigned EltSize = R.getVectorElementSize(Stores.front());
		if (MaxVecRegSize % EltSize != 0)
		return false;

		int MaxElts = PowerOf2Floor(MaxVecRegSize / EltSize);
		SmallVector<std::pair<int, int>, 16> ConsecutiveChain(
		E, std::make_pair(E, INT_MAX));
		SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false));
int IterCnt;		int IterCnt;
auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,		auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,
		&CheckedPairs,
&ConsecutiveChain](int K, int Idx) {		&ConsecutiveChain](int K, int Idx) {
if (IterCnt >= MaxIter)		if (IterCnt >= MaxIter)
return true;		return true;
		if (CheckedPairs[Idx].test(K))
		return ConsecutiveChain[K].second == 1 &&
		ConsecutiveChain[K].first == Idx;
++IterCnt;		++IterCnt;
if (!isConsecutiveAccess(Stores[K], Stores[Idx], DL, SE))		CheckedPairs[Idx].set(K);
		CheckedPairs[K].set(Idx);
		Optional<int> Diff =
		getPointersDiff(Stores[K]->getPointerOperand(),
		Stores[Idx]->getPointerOperand(), DL, SE);
		if (!Diff || *Diff == 0)
		return false;
		int Val = *Diff;
		if (Val < 0) {
		if (ConsecutiveChain[Idx].second > -Val) {
		Tails.set(K);
		ConsecutiveChain[Idx] = std::make_pair(K, -Val);
		}
		return false;
		}
		if (ConsecutiveChain[K].second <= Val)
return false;		return false;

Tails.set(Idx);		Tails.set(Idx);
ConsecutiveChain[K] = Idx;		ConsecutiveChain[K] = std::make_pair(Idx, Val);
return true;		return Val == 1;
};		};
// Do a quadratic search on all of the given stores in reverse order and find		// Do a quadratic search on all of the given stores in reverse order and find
// all of the pairs of stores that follow each other.		// all of the pairs of stores that follow each other.
for (int Idx = E - 1; Idx >= 0; --Idx) {		for (int Idx = E - 1; Idx >= 0; --Idx) {
// If a store has multiple consecutive store candidates, search according		// If a store has multiple consecutive store candidates, search according
// to the sequence: Idx-1, Idx+1, Idx-2, Idx+2, ...		// to the sequence: Idx-1, Idx+1, Idx-2, Idx+2, ...
// This is because usually pairing with immediate succeeding or preceding		// This is because usually pairing with immediate succeeding or preceding
// candidate create the best chance to find slp vectorization opportunity.		// candidate create the best chance to find slp vectorization opportunity.
const int MaxLookDepth = std::max(E - Idx, Idx + 1);		const int MaxLookDepth = std::max(E - Idx, Idx + 1);
IterCnt = 0;		IterCnt = 0;
for (int Offset = 1, F = MaxLookDepth; Offset < F; ++Offset)		for (int Offset = 1, F = MaxLookDepth; Offset < F; ++Offset)
if ((Idx >= Offset && FindConsecutiveAccess(Idx - Offset, Idx)) ||		if ((Idx >= Offset && FindConsecutiveAccess(Idx - Offset, Idx)) ||
(Idx + Offset < E && FindConsecutiveAccess(Idx + Offset, Idx)))		(Idx + Offset < E && FindConsecutiveAccess(Idx + Offset, Idx)))
break;		break;
}		}
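The probe sequence named in the comment above (Idx-1, Idx+1, Idx-2, Idx+2, ...) can be listed explicitly. This is a minimal sketch with a hypothetical helper, `probeOrder`; unlike the loop in the diff, it does not stop early when a candidate is accepted and does not model the MaxIter limit, so it shows the order in which candidates would be considered rather than how many are actually tested.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the candidate indices K that would be paired with Idx, in the order
// the search visits them, for a list of E stores. Out-of-range candidates are
// skipped, mirroring the bounds checks in the loop above.
std::vector<int> probeOrder(int Idx, int E) {
  std::vector<int> Order;
  const int MaxLookDepth = std::max(E - Idx, Idx + 1);
  for (int Offset = 1; Offset < MaxLookDepth; ++Offset) {
    if (Idx >= Offset)
      Order.push_back(Idx - Offset); // Preceding candidate first.
    if (Idx + Offset < E)
      Order.push_back(Idx + Offset); // Then the succeeding one.
  }
  return Order;
}
```

For five stores and Idx 2, the candidates are visited as 1, 3, 0, 4.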

		// Check if we allow masked stores.
		int MinVF = PowerOf2Ceil(R.getMinVecRegSize() / EltSize);
		SmallBitVector MaskedStoresSupported(std::max<int>(MaxElts, MinVF) + 1,
		false);
		for (int I = MinVF; I <= MaxElts; I *= 2) {
		if (TTI->isLegalMaskedStore(
		FixedVectorType::get(Stores.front()->getValueOperand()->getType(),
		I),
		cast<StoreInst>(Stores.front())->getAlign()))
		MaskedStoresSupported.set(I);
		}

// For stores that start but don't end a link in the chain:		// For stores that start but don't end a link in the chain:
for (int Cnt = E; Cnt > 0; --Cnt) {		for (int Cnt = E; Cnt > 0; --Cnt) {
int I = Cnt - 1;		int I = Cnt - 1;
if (ConsecutiveChain[I] == E + 1 || Tails.test(I))		if (ConsecutiveChain[I].first == E || Tails.test(I))
continue;		continue;
// We found a store instr that starts a chain. Now follow the chain and try		// We found a store instr that starts a chain. Now follow the chain and try
// to vectorize it.		// to vectorize it.
BoUpSLP::ValueList Operands;		BoUpSLP::ValueList Operands;
// Collect the chain into a list.		// Collect the chain into a list.
while (I != E + 1 && !VectorizedStores.count(Stores[I])) {		while (I != E && !VectorizedStores.count(Stores[I])) {
Operands.push_back(Stores[I]);		Operands.push_back(Stores[I]);
		Tails.set(I);
		int VF = std::min(MaxElts,
		std::max<int>(MinVF, PowerOf2Ceil(Operands.size())));
		if (((!MaskedStoresSupported.test(VF) ||
		static_cast<int>(Operands.size()) <
		MinNonPow2StoresSize.getValue()) &&
		ConsecutiveChain[I].second != 1) ||
		ConsecutiveChain[I].second >= MaxElts) {
		// Mark the new end in the chain and go back, if required. It might be
		// required if the original stores comes in reversed order, for example.
		if (ConsecutiveChain[I].first != E &&
		Tails.test(ConsecutiveChain[I].first)) {
		Tails.reset(ConsecutiveChain[I].first);
		if (Cnt < ConsecutiveChain[I].first + 2)
		Cnt = ConsecutiveChain[I].first + 2;
		}
		break;
		}
// Move to the next value in the chain.		// Move to the next value in the chain.
I = ConsecutiveChain[I];		I = ConsecutiveChain[I].first;
}		}

// If a vector register can't hold 1 element, we are done.
unsigned MaxVecRegSize = R.getMaxVecRegSize();
unsigned EltSize = R.getVectorElementSize(Operands[0]);
if (MaxVecRegSize % EltSize != 0)
continue;

unsigned MaxElts = MaxVecRegSize / EltSize;
// FIXME: Is division-by-2 the correct step? Should we assert that the		// FIXME: Is division-by-2 the correct step? Should we assert that the
// register size is a power-of-2?		// register size is a power-of-2?
unsigned StartIdx = 0;		int StartIdx = 0;
for (unsigned Size = llvm::PowerOf2Ceil(MaxElts); Size >= 2; Size /= 2) {		int E = Operands.size();
for (unsigned Cnt = StartIdx, E = Operands.size(); Cnt + Size <= E;) {		int StartSize = std::min(MaxElts, std::max<int>(MinVF, PowerOf2Ceil(E)));
ArrayRef<Value *> Slice = makeArrayRef(Operands).slice(Cnt, Size);		for (int Size = StartSize; Size >= 2; Size /= 2) {
		bool IsLegalMaskedStores =
		MaskedStoresSupported.test(std::max(MinVF, Size));
		if (!IsLegalMaskedStores && Size < MinVF)
		continue;
		for (int Cnt = StartIdx; Cnt + 1 + Size / 2 <= E;) {
		int NumStores = std::min(Size, E - Cnt);
		// Try vectorization only if it is legal.
		if ((IsLegalMaskedStores &&
		NumStores >= MinNonPow2ValuesSize.getValue()) ||
		(NumStores >= MinVF && isPowerOf2_32(NumStores))) {
		ArrayRef<Value *> Slice =
		makeArrayRef(Operands).slice(Cnt, NumStores);
if (!VectorizedStores.count(Slice.front()) &&		if (!VectorizedStores.count(Slice.front()) &&
!VectorizedStores.count(Slice.back()) &&		!VectorizedStores.count(Slice.back()) &&
vectorizeStoreChain(Slice, R, Cnt)) {		vectorizeStoreChain(Slice, R, Cnt)) {
// Mark the vectorized stores so that we don't vectorize them again.		// Mark the vectorized stores so that we don't vectorize them again.
VectorizedStores.insert(Slice.begin(), Slice.end());		VectorizedStores.insert(Slice.begin(), Slice.end());
Changed = true;		Changed = true;
// If we vectorized initial block, no need to try to vectorize it		// If we vectorized initial block, no need to try to vectorize it
// again.		// again.
if (Cnt == StartIdx)		if (Cnt == StartIdx)
StartIdx += Size;		StartIdx += Size;
Cnt += Size;		Cnt += Size;
continue;		continue;
}		}
		}
++Cnt;		++Cnt;
}		}
// Check if the whole array was vectorized already - exit.		// Check if the whole array was vectorized already - exit.
if (StartIdx >= Operands.size())		if (StartIdx >= E)
break;		break;
}		}
}		}

return Changed;		return Changed;
}		}

void SLPVectorizerPass::collectSeedInstructions(BasicBlock *BB) {		void SLPVectorizerPass::collectSeedInstructions(BasicBlock *BB) {
[51 lines not shown]	bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
// we permit an alternate opcode via InstructionsState.		// we permit an alternate opcode via InstructionsState.
InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (!S.getOpcode())		if (!S.getOpcode())
return false;		return false;

Instruction *I0 = cast<Instruction>(S.OpValue);		Instruction *I0 = cast<Instruction>(S.OpValue);
// Make sure invalid types (including vector type) are rejected before		// Make sure invalid types (including vector type) are rejected before
// determining vectorization factor for scalar instructions.		// determining vectorization factor for scalar instructions.
for (Value *V : VL) {		for (Value *V : VL) {
		vdmitrie (Not Done): The VL size is already checked at 6187.
Type *Ty = V->getType();		Type *Ty = V->getType();
if (!isValidElementType(Ty)) {		if (!isValidElementType(Ty)) {
// NOTE: the following will give user internal llvm type name, which may		// NOTE: the following will give user internal llvm type name, which may
// not be useful.		// not be useful.
R.getORE()->emit([&]() {		R.getORE()->emit([&]() {
std::string type_str;		std::string type_str;
llvm::raw_string_ostream rso(type_str);		llvm::raw_string_ostream rso(type_str);
Ty->print(rso);		Ty->print(rso);
return OptimizationRemarkMissed(SV_NAME, "UnsupportedType", I0)		return OptimizationRemarkMissed(SV_NAME, "UnsupportedType", I0)
<< "Cannot SLP vectorize list: type "		<< "Cannot SLP vectorize list: type "
<< rso.str() + " is unsupported by vectorizer";		<< rso.str() + " is unsupported by vectorizer";
});		});
return false;		return false;
}		}
}		}

		int NumElts = VL.size();
unsigned Sz = R.getVectorElementSize(I0);		unsigned Sz = R.getVectorElementSize(I0);
unsigned MinVF = std::max(2U, R.getMinVecRegSize() / Sz);		unsigned MinVF = std::max(2U, R.getMinVecRegSize() / Sz);
unsigned MaxVF = std::max<unsigned>(PowerOf2Floor(VL.size()), MinVF);		unsigned MaxVF = std::max<unsigned>(NumElts >= MinNonPow2ValuesSize.getValue()
		? PowerOf2Ceil(NumElts)
		: PowerOf2Floor(NumElts),
		MinVF);
MaxVF = std::min(R.getMaximumVF(Sz, S.getOpcode()), MaxVF);		MaxVF = std::min(R.getMaximumVF(Sz, S.getOpcode()), MaxVF);
if (MaxVF < 2) {		if (MaxVF < 2) {
R.getORE()->emit([&]() {		R.getORE()->emit([&]() {
return OptimizationRemarkMissed(SV_NAME, "SmallVF", I0)		return OptimizationRemarkMissed(SV_NAME, "SmallVF", I0)
<< "Cannot SLP vectorize list: vectorization factor "		<< "Cannot SLP vectorize list: vectorization factor "
<< "less than 2 is not supported";		<< "less than 2 is not supported";
});		});
return false;		return false;
}		}
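The new MaxVF selection on the right-hand side rounds the element count up to a power of two once it reaches a threshold and rounds it down otherwise, then clamps the result. The sketch below restates that arithmetic with plain parameters. All names are hypothetical: `MinNonPow2Size` stands in for the `MinNonPow2ValuesSize` option (whose default is not shown in this diff) and `TargetMaxVF` for the result of `R.getMaximumVF(...)`. The helpers assume inputs of at least 1.

```cpp
#include <algorithm>
#include <cassert>

// Smallest power of two >= N, for N >= 1.
unsigned pow2Ceil(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Largest power of two <= N, for N >= 1.
unsigned pow2Floor(unsigned N) {
  unsigned P = 1;
  while (P * 2 <= N)
    P *= 2;
  return P;
}

// Round up when non-power-of-2 bundles are allowed for this size, round down
// otherwise; never go below MinVF or above the target limit.
unsigned initialMaxVF(unsigned NumElts, unsigned MinVF,
                      unsigned MinNonPow2Size, unsigned TargetMaxVF) {
  const unsigned Rounded =
      NumElts >= MinNonPow2Size ? pow2Ceil(NumElts) : pow2Floor(NumElts);
  return std::min(TargetMaxVF, std::max(Rounded, MinVF));
}
```

With three elements, MinVF 2 and a target limit of 16, a threshold of 3 gives a starting VF of 4, while a threshold of 4 gives 2.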

bool Changed = false;		bool Changed = false;
bool CandidateFound = false;		bool CandidateFound = false;
InstructionCost MinCost = SLPCostThreshold.getValue();		InstructionCost MinCost = SLPCostThreshold.getValue();

bool CompensateUseCost =		bool CompensateUseCost =
!InsertUses.empty() && llvm::all_of(InsertUses, [](const Value *V) {		!InsertUses.empty() && llvm::all_of(InsertUses, [](const Value *V) {
return V && isa<InsertElementInst>(V);		return isa_and_nonnull<InsertElementInst>(V);
});		});
		SmallVector<Value *, 4> NormalizedVL;
		if (!CompensateUseCost && MaxVF > VL.size()) {
		NormalizedVL.append(VL.begin(), VL.end());
		NormalizedVL.append(MaxVF - VL.size(), UndefValue::get(I0->getType()));
		VL = NormalizedVL;
		}

assert((!CompensateUseCost || InsertUses.size() == VL.size()) &&		assert((!CompensateUseCost || InsertUses.size() == VL.size()) &&
"Each scalar expected to have an associated InsertElement user.");		"Each scalar expected to have an associated InsertElement user.");

unsigned NextInst = 0, MaxInst = VL.size();		unsigned NextInst = 0, MaxInst = NumElts;
		bool Width3Tried = MaxVF < 4;
for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {		for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {
// No actual vectorization should happen, if number of parts is the same as		// No actual vectorization should happen, if number of parts is the same as
// provided vectorization factor (i.e. the scalar type is used for vector		// provided vectorization factor (i.e. the scalar type is used for vector
// code during codegen).		// code during codegen).
auto *VecTy = FixedVectorType::get(VL[0]->getType(), VF);		auto *VecTy = FixedVectorType::get(VL[0]->getType(), VF);
if (TTI->getNumberOfParts(VecTy) == VF)		if (TTI->getNumberOfParts(VecTy) == VF)
continue;		continue;
		int Width = VF;
		// Try the vectorization factor 4 once again if tried VF 4 already, but try
		// to vectorize bundles of 3 elements. Try VF 2 after bundles size 3.
		if (VF == 2 && !Width3Tried) {
		VF = 4;
		Width = 3;
		Width3Tried = true;
		}
for (unsigned I = NextInst; I < MaxInst; ++I) {		for (unsigned I = NextInst; I < MaxInst; ++I) {
unsigned OpsWidth = 0;		unsigned OpsWidth = 0;

if (I + VF > MaxInst)		if (I + Width > MaxInst)
OpsWidth = MaxInst - I;		OpsWidth = MaxInst - I;
else		else
OpsWidth = VF;		OpsWidth = Width;

if (!isPowerOf2_32(OpsWidth) || OpsWidth < 2)		if ((Width == 3 && OpsWidth != 3) || (VF > MinVF && OpsWidth <= VF / 2) ||
		(VF == MinVF && OpsWidth < 2))
break;		break;

ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);		ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);
// Check that a previous iteration of this loop did not delete the Value.		// Check that a previous iteration of this loop did not delete the Value.
if (llvm::any_of(Ops, [&R](Value *V) {		if (llvm::any_of(Ops, [&R](Value *V) {
auto *I = dyn_cast<Instruction>(V);		auto *I = dyn_cast<Instruction>(V);
return I && R.isDeleted(I);		return I && R.isDeleted(I);
}))		}))
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "
<< "\n");		<< "\n");
		SmallVector<Value *, 8> FixedChain;
		if (OpsWidth != VF) {
		unsigned NewSize = VF;
		FixedChain.reserve(NewSize);
		FixedChain.append(Ops.begin(), Ops.end());
		FixedChain.append(NewSize - Ops.size(),
		UndefValue::get(Ops[0]->getType()));
		Ops = FixedChain;
		}
		assert(Ops.size() == VF &&
		"Operations must have same size as vectorization factor.");

R.buildTree(Ops);		R.buildTree(Ops);
		if (AllowReorder) {
Optional<ArrayRef<unsigned>> Order = R.bestOrder();		Optional<ArrayRef<unsigned>> Order = R.bestOrder();
// TODO: check if we can allow reordering for more cases.		if (Order) {
		vdmitrie (inline comment, not done): 1) The if-else bodies are exactly the same. 2) With OpsWidth != VF there is still a possibility to bypass it, depending on the UserCost and AllowReorder values. It should be either an assertion to ensure it never happens, or a "break".
if (AllowReorder && Order) {
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
// Conceptually, there is nothing actually preventing us from trying to		SmallVector<unsigned, 4> NewOrder(Order->begin(), Order->end());
// reorder a larger list. In fact, we do exactly this when vectorizing		fixupOrderingIndicies(NewOrder);
// reductions. However, at this point, we only expect to get here when		SmallVector<Value *, 4> ReorderedOps(Ops.size());
// there are exactly two operations.		transform(NewOrder, ReorderedOps.begin(),
assert(Ops.size() == 2);		[Ops](const unsigned Idx) { return Ops[Idx]; });
Value *ReorderedOps[] = {Ops[1], Ops[0]};		R.buildTree(ReorderedOps);
R.buildTree(ReorderedOps, None);		}
}		}
if (R.isTreeTinyAndNotFullyVectorizable())		if (R.isTreeTinyAndNotFullyVectorizable())
continue;		continue;

R.computeMinimumValueSizes();		R.computeMinimumValueSizes();
InstructionCost Cost = R.getTreeCost();		InstructionCost Cost = R.getTreeCost();
CandidateFound = true;		CandidateFound = true;
if (CompensateUseCost) {		if (CompensateUseCost) {
[538 lines not shown; enclosing context: while (!Stack.empty()) {]
}		}
// I is an extra argument for TreeN (its parent operation).		// I is an extra argument for TreeN (its parent operation).
markExtraArg(Stack.back(), EdgeInst);		markExtraArg(Stack.back(), EdgeInst);
}		}
return true;		return true;
}		}

/// Attempt to vectorize the tree found by matchAssociativeReduction.		/// Attempt to vectorize the tree found by matchAssociativeReduction.
bool tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {		bool tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI, const DataLayout &DL) {
// If there are a sufficient number of reduction values, reduce		// If there are a sufficient number of reduction values, extend
// to a nearby power-of-2. We can safely generate oversized		// to a nearby power-of-2. We can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.		// vectors and rely on the backend to split them to legal sizes.
unsigned NumReducedVals = ReducedVals.size();		unsigned NumReducedVals = ReducedVals.size();
if (NumReducedVals < 4)		if (NumReducedVals < 3)
return false;		return false;

// Intersect the fast-math-flags from all reduction operations.		// Intersect the fast-math-flags from all reduction operations.
FastMathFlags RdxFMF;		FastMathFlags RdxFMF;
RdxFMF.set();		RdxFMF.set();
for (ReductionOpsType &RdxOp : ReductionOps) {		for (ReductionOpsType &RdxOp : ReductionOps) {
for (Value *RdxVal : RdxOp) {		for (Value *RdxVal : RdxOp) {
if (auto *FPMO = dyn_cast<FPMathOperator>(RdxVal))		if (auto *FPMO = dyn_cast<FPMathOperator>(RdxVal))
[21 lines not shown; enclosing context: auto getCmpForMinMaxReduction = [](Instruction *RdxRootInst) {]
assert(isa<Instruction>(ScalarCond) &&		assert(isa<Instruction>(ScalarCond) &&
"Expected min/max reduction to have compare condition");		"Expected min/max reduction to have compare condition");
return cast<Instruction>(ScalarCond);		return cast<Instruction>(ScalarCond);
};		};

// The reduction root is used as the insertion point for new instructions,		// The reduction root is used as the insertion point for new instructions,
// so set it as externally used to prevent it from being deleted.		// so set it as externally used to prevent it from being deleted.
ExternallyUsedValues[ReductionRoot];		ExternallyUsedValues[ReductionRoot];
SmallVector<Value *, 16> IgnoreList;		SmallVector<Value *, 16> PostoponedIndicies;
for (ReductionOpsType &RdxOp : ReductionOps)		for (ReductionOpsType &RdxOp : ReductionOps)
IgnoreList.append(RdxOp.begin(), RdxOp.end());		PostoponedIndicies.append(RdxOp.begin(), RdxOp.end());

unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);		unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);
if (NumReducedVals > ReduxWidth) {		if (NumReducedVals > ReduxWidth) {
// In the loop below, we are building a tree based on a window of		// In the loop below, we are building a tree based on a window of
// 'ReduxWidth' values.		// 'ReduxWidth' values.
// If the operands of those values have common traits (compare predicate,		// If the operands of those values have common traits (compare predicate,
// constant operand, etc), then we want to group those together to		// constant operand, etc), then we want to group those together to
// minimize the cost of the reduction.		// minimize the cost of the reduction.
[16 lines not shown; enclosing context: if (NumReducedVals > ReduxWidth) {]
return PredCountMap[PredA] > PredCountMap[PredB];		return PredCountMap[PredA] > PredCountMap[PredB];
}		}
return false;		return false;
});		});
}		}

Value *VectorizedTree = nullptr;		Value *VectorizedTree = nullptr;
unsigned i = 0;		unsigned i = 0;
while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {		ReduxWidth = PowerOf2Ceil(NumReducedVals);
ArrayRef<Value *> VL(&ReducedVals[i], ReduxWidth);		// Try the non-power-of-2 vectorization once and, only if it is unsuccessful,
V.buildTree(VL, ExternallyUsedValues, IgnoreList);		// try to split it into smaller power-of-2 chunks.
		while (
		(ReduxWidth > NumReducedVals || i < NumReducedVals - ReduxWidth + 1) &&
		ReduxWidth > 2) {
		ArrayRef<Value *> VL;
		SmallVector<Value *, 4> NormalizedVL;
		// Still need to normalize to power-of-2 size.
		if (ReduxWidth > NumReducedVals) {
		NormalizedVL.append(&ReducedVals[i],
		&ReducedVals[i] + ReducedVals.size() - i);
		NormalizedVL.append(ReduxWidth - NormalizedVL.size(),
		UndefValue::get(ReducedVals[i]->getType()));
		VL = NormalizedVL;
		} else {
		VL = makeArrayRef(&ReducedVals[i], ReduxWidth);
		}
		V.buildTree(VL, ExternallyUsedValues, PostoponedIndicies);
Optional<ArrayRef<unsigned>> Order = V.bestOrder();		Optional<ArrayRef<unsigned>> Order = V.bestOrder();
if (Order) {		if (Order) {
assert(Order->size() == VL.size() &&		assert(Order->size() == VL.size() &&
"Order size must be the same as number of vectorized "		"Order size must be the same as number of vectorized "
		RKSimon (inline comment, done): SmallBitVector?
"instructions.");		"instructions.");
		SmallVector<unsigned, 4> NewOrder(Order->begin(), Order->end());
		fixupOrderingIndicies(NewOrder);
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
SmallVector<Value *, 4> ReorderedOps(VL.size());		SmallVector<Value *, 4> ReorderedOps(VL.size());
llvm::transform(*Order, ReorderedOps.begin(),		llvm::transform(NewOrder, ReorderedOps.begin(),
[VL](const unsigned Idx) { return VL[Idx]; });		[VL](const unsigned Idx) { return VL[Idx]; });
V.buildTree(ReorderedOps, ExternallyUsedValues, IgnoreList);		V.buildTree(ReorderedOps, ExternallyUsedValues, PostoponedIndicies);
		}
		if (V.isTreeTinyAndNotFullyVectorizable() ||
		V.isLoadCombineReductionCandidate(RdxKind)) {
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
}		}
if (V.isTreeTinyAndNotFullyVectorizable())
break;
if (V.isLoadCombineReductionCandidate(RdxKind))
break;		break;
		}

V.computeMinimumValueSizes();		V.computeMinimumValueSizes();

// Estimate cost.		// Estimate cost.
InstructionCost TreeCost = V.getTreeCost();		InstructionCost TreeCost = V.getTreeCost();
InstructionCost ReductionCost =		InstructionCost ReductionCost = getReductionCost(
getReductionCost(TTI, ReducedVals[i], ReduxWidth);		TTI, ReducedVals[i],
		ReduxWidth > NumReducedVals ? NumReducedVals : VL.size(), ReduxWidth);
InstructionCost Cost = TreeCost + ReductionCost;		InstructionCost Cost = TreeCost + ReductionCost;
if (!Cost.isValid()) {		if (!Cost.isValid()) {
LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");		LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
		}
return false;		return false;
}		}
if (Cost >= -SLPCostThreshold) {		if (Cost >= -SLPCostThreshold) {
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",		return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
<< "Vectorizing horizontal reduction is possible"		<< "Vectorizing horizontal reduction is possible"
<< "but not beneficial with cost " << ore::NV("Cost", Cost)		<< "but not beneficial with cost " << ore::NV("Cost", Cost)
<< " and threshold "		<< " and threshold "
<< ore::NV("Threshold", -SLPCostThreshold);		<< ore::NV("Threshold", -SLPCostThreshold);
});		});
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
		}
break;		break;
}		}

LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"		LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"
<< Cost << ". (HorRdx)\n");		<< Cost << ". (HorRdx)\n");
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",		return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
[9 lines not shown; enclosing context: while (]
// Emit a reduction. If the root is a select (min/max idiom), the insert		// Emit a reduction. If the root is a select (min/max idiom), the insert
// point is the compare condition of that select.		// point is the compare condition of that select.
Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);		Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);
if (isa<SelectInst>(RdxRootInst))		if (isa<SelectInst>(RdxRootInst))
Builder.SetInsertPoint(getCmpForMinMaxReduction(RdxRootInst));		Builder.SetInsertPoint(getCmpForMinMaxReduction(RdxRootInst));
else		else
Builder.SetInsertPoint(RdxRootInst);		Builder.SetInsertPoint(RdxRootInst);

		// Check if we reduced non-power-2 number of elements and need to extend
		// the scalars with elements that do not affect the result (0 for
		// add, or, xor, 1 for mul, ~0 for and, min for max and max for min).
		if (ReduxWidth > NumReducedVals) {
		Value *ShuffleOp = nullptr;
		Type *ScalarTy = ReducedVals[i]->getType();
		switch (RdxKind) {
		case RecurKind::Add:
		case RecurKind::Or:
		case RecurKind::FAdd:
		case RecurKind::Xor:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		Constant::getNullValue(ScalarTy));
		break;
		case RecurKind::And:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		Constant::getAllOnesValue(ScalarTy));
		break;
		case RecurKind::Mul:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, 1));
		break;
		case RecurKind::FMul:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy, 1.0));
		break;
		case RecurKind::UMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getMinValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::SMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getSignedMinValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::UMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getMaxValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::SMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getSignedMaxValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::FMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy,
		APFloat::getLargest(ScalarTy->getFltSemantics(),
		/*Negative=*/true)));
		break;
		case RecurKind::FMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy,
		APFloat::getLargest(ScalarTy->getFltSemantics(),
		/*Negative=*/false)));
		break;
		default:
		llvm_unreachable(
		"Expected arithmetic or min/max reduction operation");
		}
		SmallVector<int, 4> Mask(ReduxWidth);
		std::iota(Mask.begin(), Mask.begin() + NumReducedVals, 0);
		std::iota(Mask.begin() + NumReducedVals, Mask.end(), ReduxWidth);
		VectorizedRoot = Builder.CreateShuffleVector(
		VectorizedRoot, ShuffleOp, Mask, "reduction.normalization");
		}

Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);

if (!VectorizedTree) {		if (!VectorizedTree) {
// Initialize the final value in the reduction.		// Initialize the final value in the reduction.
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
} else {		} else {
// Update the final value in the reduction.		// Update the final value in the reduction.
Builder.SetCurrentDebugLocation(Loc);		Builder.SetCurrentDebugLocation(Loc);
VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,		VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
ReducedSubTree, "op.rdx", ReductionOps);		ReducedSubTree, "op.rdx", ReductionOps);
}		}
i += ReduxWidth;		i += ReduxWidth;
		if (ReduxWidth > NumReducedVals)
		ReduxWidth /= 2;
		else
ReduxWidth = PowerOf2Floor(NumReducedVals - i);		ReduxWidth = PowerOf2Floor(NumReducedVals - i);
}		}

if (VectorizedTree) {		if (VectorizedTree) {
// Finish the reduction.		// Finish the reduction.
for (; i < NumReducedVals; ++i) {		for (; i < NumReducedVals; ++i) {
auto *I = cast<Instruction>(ReducedVals[i]);		auto *I = cast<Instruction>(ReducedVals[i]);
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree =		VectorizedTree =
[18 lines not shown; enclosing context: if (VectorizedTree) {]
getCmpForMinMaxReduction(cast<Instruction>(ReductionRoot));		getCmpForMinMaxReduction(cast<Instruction>(ReductionRoot));
ScalarCmp->replaceAllUsesWith(VecSelect->getCondition());		ScalarCmp->replaceAllUsesWith(VecSelect->getCondition());
}		}
}		}
ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);

// Mark all scalar reduction ops for deletion, they are replaced by the		// Mark all scalar reduction ops for deletion, they are replaced by the
// vector reductions.		// vector reductions.
V.eraseInstructions(IgnoreList);		V.eraseInstructions(PostoponedIndicies);
}		}
return VectorizedTree != nullptr;		return VectorizedTree != nullptr;
}		}

		/// Extracts extra argument values to the vector to try to use them as
		/// the vectorization roots.
		SmallVector<Value *, 4> getCopyOfExtraArgValues() const {
		SmallVector<Value *, 4> Args(ExtraArgs.size());
		transform(
		spatel (inline comment, not done): Is it necessary to copy these? If so, it would be better to name this function something like "getCopyOfExtraArgValues" to make that explicit. If not, we can just make this a standard 'get' method: const MapVector<Instruction *, Value *> &getExtraArgs() const { return ExtraArgs; } and then access the 'second' data in the user code?
		ABataev (author, inline comment, done): We don't need to expose the `first` element of the `MapVector` here; it is not good from the general design point of view. I'll rename the member function.
		ExtraArgs, Args.begin(),
		[](const std::pair<Instruction *, Value *> &P) { return P.second; });
		return Args;
		}
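
The trade-off discussed in the inline comments on getCopyOfExtraArgValues (return a copy of the mapped values versus hand out a const reference to the whole map) can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `ExtraArgMap` is a hypothetical insertion-ordered stand-in for `MapVector`, and `Reduction`, `addExtraArg`, and the `std::string`/`int` element types are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MapVector: insertion-ordered key/value pairs.
using ExtraArgMap = std::vector<std::pair<std::string, int>>;

class Reduction {
  ExtraArgMap ExtraArgs;

public:
  void addExtraArg(std::string Key, int Val) {
    ExtraArgs.emplace_back(std::move(Key), Val);
  }

  // Accessor variant: no copy, but callers also see the keys ('first').
  const ExtraArgMap &getExtraArgs() const { return ExtraArgs; }

  // Copy variant: only the mapped values ('second') leave the class, and
  // the name makes the copy explicit.
  std::vector<int> getCopyOfExtraArgValues() const {
    std::vector<int> Args(ExtraArgs.size());
    std::transform(
        ExtraArgs.begin(), ExtraArgs.end(), Args.begin(),
        [](const std::pair<std::string, int> &P) { return P.second; });
    return Args;
  }
};
```

The copy variant costs one allocation per call but keeps the key type out of the public interface, which is the design concern raised in the reply.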

unsigned numReductionValues() const { return ReducedVals.size(); }		unsigned numReductionValues() const { return ReducedVals.size(); }

private:		private:
/// Calculate the cost of a reduction.		/// Calculate the cost of a reduction.
InstructionCost getReductionCost(TargetTransformInfo *TTI,		InstructionCost getReductionCost(TargetTransformInfo *TTI,
Value *FirstReducedVal,		Value *FirstReducedVal,
unsigned ReduxWidth) {		unsigned NumOfScalars, unsigned ReduxWidth) {
Type *ScalarTy = FirstReducedVal->getType();		Type *ScalarTy = FirstReducedVal->getType();
FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);		FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);
InstructionCost VectorCost, ScalarCost;		InstructionCost VectorCost, ScalarCost;
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::Add:		case RecurKind::Add:
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::Or:		case RecurKind::Or:
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::FAdd:		case RecurKind::FAdd:
case RecurKind::FMul: {		case RecurKind::FMul: {
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(RdxKind);		unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(RdxKind);
		if ((RdxKind == RecurKind::Or || RdxKind == RecurKind::And) &&
		ScalarTy == IntegerType::getInt1Ty(FirstReducedVal->getContext())) {
		// Or reduction for i1 is represented as:
		// %val = bitcast <ReduxWidth x i1> to iReduxWidth
		// %res = cmp ne iReduxWidth %val, 0
		// And reduction for i1 is represented as:
		// %val = bitcast <ReduxWidth x i1> to iReduxWidth
		// %res = cmp eq iReduxWidth %val, 11111
		Type *ValTy =
		IntegerType::get(FirstReducedVal->getContext(), ReduxWidth);
		VectorCost = TTI->getCastInstrCost(Instruction::BitCast, ValTy,
		VectorTy, TTI::CastContextHint::None,
		TTI::TCK_RecipThroughput) +
		TTI->getCmpSelInstrCost(Instruction::ICmp, ValTy,
		CmpInst::makeCmpResultType(ValTy));
		} else {
VectorCost = TTI->getArithmeticReductionCost(RdxOpcode, VectorTy,		VectorCost = TTI->getArithmeticReductionCost(RdxOpcode, VectorTy,
/*IsPairwiseForm=*/false);		/*IsPairwiseForm=*/false);
		}
ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy);		ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy);
break;		break;
}		}
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin: {		case RecurKind::FMin: {
auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));		auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
VectorCost =		VectorCost =
TTI->getMinMaxReductionCost(VectorTy, VecCondTy,		TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
[20 lines not shown; enclosing context: case RecurKind::UMin: {]
CmpInst::makeCmpResultType(ScalarTy));		CmpInst::makeCmpResultType(ScalarTy));
break;		break;
}		}
default:		default:
llvm_unreachable("Expected arithmetic or min/max reduction operation");		llvm_unreachable("Expected arithmetic or min/max reduction operation");
}		}

// Scalar cost is repeated for N-1 elements.		// Scalar cost is repeated for N-1 elements.
ScalarCost *= (ReduxWidth - 1);		ScalarCost *= (NumOfScalars - 1);
		// Need to reshuffle elements to replace undefs with the real constant
		// values.
		if (NumOfScalars != ReduxWidth)
		VectorCost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, VectorTy);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << VectorCost - ScalarCost		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << VectorCost - ScalarCost
<< " for reduction that starts with " << *FirstReducedVal		<< " for reduction that starts with " << *FirstReducedVal
<< " (It is a splitting reduction)\n");		<< " (It is a splitting reduction)\n");
return VectorCost - ScalarCost;		return VectorCost - ScalarCost;
}		}

/// Emit a horizontal reduction of the vectorized value.		/// Emit a horizontal reduction of the vectorized value.
Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,		Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,
[135 lines not shown; enclosing context: if (findBuildAggregate_rec(LastInsertInst, TTI, BuildVectorOpds, InsertElts,]
llvm::erase_value(InsertElts, nullptr);		llvm::erase_value(InsertElts, nullptr);
if (BuildVectorOpds.size() >= 2)		if (BuildVectorOpds.size() >= 2)
return true;		return true;
}		}

return false;		return false;
}		}

static bool PhiTypeSorterFunc(Value *V, Value *V2) {
return V->getType() < V2->getType();
}

/// Try and get a reduction value from a phi node.		/// Try and get a reduction value from a phi node.
///		///
/// Given a phi node \p P in a block \p ParentBB, consider possible reductions		/// Given a phi node \p P in a block \p ParentBB, consider possible reductions
/// if they come from either \p ParentBB or a containing loop latch.		/// if they come from either \p ParentBB or a containing loop latch.
///		///
/// \returns A candidate reduction value if possible, or \code nullptr \endcode		/// \returns A candidate reduction value if possible, or \code nullptr \endcode
/// if not possible.		/// if not possible.
static Value *getReductionValue(const DominatorTree *DT, PHINode *P,		static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
[58 lines not shown]
/// attempted.		/// attempted.
/// \returns true if a horizontal reduction was matched and reduced or operands		/// \returns true if a horizontal reduction was matched and reduced or operands
/// of one of the binary instruction were vectorized.		/// of one of the binary instruction were vectorized.
/// \returns false if a horizontal reduction was not matched (or not possible)		/// \returns false if a horizontal reduction was not matched (or not possible)
/// or no vectorization of any binary operation feeding \a Root instruction was		/// or no vectorization of any binary operation feeding \a Root instruction was
/// performed.		/// performed.
static bool tryToVectorizeHorReductionOrInstOperands(		static bool tryToVectorizeHorReductionOrInstOperands(
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,		PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI,		TargetTransformInfo *TTI, const DataLayout &DL,
const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {		const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {
if (!ShouldVectorizeHor)		if (!ShouldVectorizeHor)
return false;		return false;

if (!Root)		if (!Root)
return false;		return false;

if (Root->getParent() != BB || isa<PHINode>(Root))		if (Root->getParent() != BB || isa<PHINode>(Root))
[15 lines not shown; enclosing context: while (!Stack.empty()) {]
unsigned Level;		unsigned Level;
std::tie(Inst, Level) = Stack.pop_back_val();		std::tie(Inst, Level) = Stack.pop_back_val();
Value *B0, *B1;		Value *B0, *B1;
bool IsBinop = matchRdxBop(Inst, B0, B1);		bool IsBinop = matchRdxBop(Inst, B0, B1);
bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));		bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));
if (IsBinop || IsSelect) {		if (IsBinop || IsSelect) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, Inst)) {		if (HorRdx.matchAssociativeReduction(P, Inst)) {
if (HorRdx.tryToReduce(R, TTI)) {		if (HorRdx.tryToReduce(R, TTI, DL)) {
Res = true;		Res = true;
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
// matchAssociativeReduction function unless this is the root node.		// matchAssociativeReduction function unless this is the root node.
P = nullptr;		P = nullptr;
		// Try to vectorize ExtraArgs.
		// Continue analysis for the instruction from the same basic block
		// only to save compile time.
		if (++Level < RecursionMaxDepth)
		for (auto *Op : HorRdx.getCopyOfExtraArgValues())
		if (VisitedInstrs.insert(Op).second)
		if (auto *I = dyn_cast<Instruction>(Op))
		if (!isa<PHINode>(I) && !R.isDeleted(I) &&
		I->getParent() == BB)
		Stack.emplace_back(I, Level);
continue;		continue;
}		}
}		}
if (P && IsBinop) {		if (P && IsBinop) {
Inst = dyn_cast<Instruction>(B0);		Inst = dyn_cast<Instruction>(B0);
if (Inst == P)		if (Inst == P)
Inst = dyn_cast<Instruction>(B1);		Inst = dyn_cast<Instruction>(B1);
if (!Inst) {		if (!Inst) {
[33 lines not shown; enclosing context: if (!I)]
return false;		return false;

if (!isa<BinaryOperator>(I))		if (!isa<BinaryOperator>(I))
P = nullptr;		P = nullptr;
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {		auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);		return tryToVectorize(I, R);
};		};
return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,		return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI, *DL,
ExtraVectorization);		ExtraVectorization);
}		}

bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,		bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,
BasicBlock *BB, BoUpSLP &R) {		BasicBlock *BB, BoUpSLP &R) {
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
if (!R.canMapToVector(IVI->getType(), DL))		if (!R.canMapToVector(IVI->getType(), DL))
return false;		return false;
[24 lines not shown; enclosing context: bool SLPVectorizerPass::vectorizeInsertElementInst(InsertElementInst *IEI,]
// Vectorize starting with the build vector operands ignoring the BuildVector		// Vectorize starting with the build vector operands ignoring the BuildVector
// instructions for the purpose of scheduling and user extraction.		// instructions for the purpose of scheduling and user extraction.
return tryToVectorizeList(BuildVectorOpds, R, /*AllowReorder=*/false,		return tryToVectorizeList(BuildVectorOpds, R, /*AllowReorder=*/false,
BuildVectorInsts);		BuildVectorInsts);
}		}

bool SLPVectorizerPass::vectorizeCmpInst(CmpInst *CI, BasicBlock *BB,		bool SLPVectorizerPass::vectorizeCmpInst(CmpInst *CI, BasicBlock *BB,
BoUpSLP &R) {		BoUpSLP &R) {
if (tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R))
return true;

bool OpsChanged = false;		bool OpsChanged = false;
for (int Idx = 0; Idx < 2; ++Idx) {		for (int Idx = 0; Idx < 2; ++Idx) {
OpsChanged |=		OpsChanged |=
vectorizeRootInstruction(nullptr, CI->getOperand(Idx), BB, R, TTI);		vectorizeRootInstruction(nullptr, CI->getOperand(Idx), BB, R, TTI);
}		}
return OpsChanged;		return OpsChanged ||
		tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R);
}		}

bool SLPVectorizerPass::vectorizeSimpleInstructions(		bool SLPVectorizerPass::vectorizeSimpleInstructions(
SmallVectorImpl<Instruction *> &Instructions, BasicBlock *BB, BoUpSLP &R) {		SmallVectorImpl<Instruction *> &Instructions, BasicBlock *BB, BoUpSLP &R,
		bool AtTerminator) {
bool OpsChanged = false;		bool OpsChanged = false;
		SmallVector<Instruction *, 4> PostponedCmps;
for (auto *I : reverse(Instructions)) {		for (auto *I : reverse(Instructions)) {
if (R.isDeleted(I))		if (R.isDeleted(I))
continue;		continue;
if (auto *LastInsertValue = dyn_cast<InsertValueInst>(I))		if (auto *LastInsertValue = dyn_cast<InsertValueInst>(I)) {
OpsChanged |= vectorizeInsertValueInst(LastInsertValue, BB, R);		OpsChanged |= vectorizeInsertValueInst(LastInsertValue, BB, R);
else if (auto *LastInsertElem = dyn_cast<InsertElementInst>(I))		} else if (auto *LastInsertElem = dyn_cast<InsertElementInst>(I)) {
OpsChanged |= vectorizeInsertElementInst(LastInsertElem, BB, R);		OpsChanged |= vectorizeInsertElementInst(LastInsertElem, BB, R);
else if (auto *CI = dyn_cast<CmpInst>(I))		} else if (auto *CI = dyn_cast<CmpInst>(I)) {
		if (!AtTerminator)
		PostponedCmps.push_back(CI);
		else
OpsChanged |= vectorizeCmpInst(CI, BB, R);		OpsChanged |= vectorizeCmpInst(CI, BB, R);
}		}
Instructions.clear();		}
		// Insert in reverse order since the PostponedCmps vector was filled in
		// reverse order.
		Instructions.assign(PostponedCmps.rbegin(), PostponedCmps.rend());
return OpsChanged;		return OpsChanged;
}		}

bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {		bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
bool Changed = false;		bool Changed = false;
SmallVector<Value *, 4> Incoming;		SmallVector<Value *, 4> Incoming;
SmallPtrSet<Value *, 16> VisitedInstrs;		SmallPtrSet<Value *, 16> VisitedInstrs;
		// Maps phi nodes to the non-phi nodes found in the use tree for each phi
		// node.
		DenseMap<Value *, SmallVector<Value *, 4>> PHIToOpcodes;

bool HaveVectorizedPhiNodes = true;		bool HaveVectorizedPhiNodes = true;
while (HaveVectorizedPhiNodes) {		while (HaveVectorizedPhiNodes) {
HaveVectorizedPhiNodes = false;		HaveVectorizedPhiNodes = false;

// Collect the incoming values from the PHIs.		// Collect the incoming values from the PHIs.
Incoming.clear();		Incoming.clear();
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
PHINode *P = dyn_cast<PHINode>(&I);		PHINode *P = dyn_cast<PHINode>(&I);
if (!P)		if (!P)
break;		break;

if (!VisitedInstrs.count(P) && !R.isDeleted(P))		// No need to analyze deleted and/or vectorized nodes.
		if (!VisitedInstrs.count(P) && !R.isDeleted(P) &&
		!P->getType()->isVectorTy())
Incoming.push_back(P);		Incoming.push_back(P);
}		}

// Sort by type.		// Find the corresponding non-phi nodes for better matching when trying to
llvm::stable_sort(Incoming, PhiTypeSorterFunc);		// build the tree.
		for (Value *V : Incoming) {
		SmallVectorImpl<Value *> &Opcodes =
		PHIToOpcodes.try_emplace(V).first->getSecond();
		if (!Opcodes.empty())
		continue;
		SmallVector<Value *, 4> Nodes(1, V);
		SmallPtrSet<Value *, 4> Visited;
		while (!Nodes.empty()) {
		auto *PHI = cast<PHINode>(Nodes.pop_back_val());
		if (!Visited.insert(PHI).second)
		continue;
		for (Value *V : PHI->incoming_values()) {
		if (auto *PHI1 = dyn_cast<PHINode>((V))) {
		Nodes.push_back(PHI1);
		continue;
		}
		Opcodes.emplace_back(V);
		}
		}
		}

		// Sort by type, parent, operands.
		stable_sort(Incoming, [&PHIToOpcodes](Value *V1, Value *V2) {
		if (V1->getType() < V2->getType())
		return true;
		if (V1->getType() > V2->getType())
		return false;
		ArrayRef<Value *> Opcodes1 = PHIToOpcodes[V1];
		ArrayRef<Value *> Opcodes2 = PHIToOpcodes[V2];
		if (Opcodes1.size() < Opcodes2.size())
		return true;
		if (Opcodes1.size() > Opcodes2.size())
		return false;
		for (int I = 0, E = Opcodes1.size(); I < E; ++I) {
		// Undefs are compatible with any other value.
		if (isa<UndefValue>(Opcodes1[I]) || isa<UndefValue>(Opcodes2[I]))
		continue;
		if (auto *I1 = dyn_cast<Instruction>(Opcodes1[I]))
		if (auto *I2 = dyn_cast<Instruction>(Opcodes2[I])) {
		if (I1->getParent() < I2->getParent())
		return true;
		if (I1->getParent() > I2->getParent())
		return false;
		InstructionsState S = getSameOpcode({I1, I2});
		if (S.getOpcode())
		continue;
		return I1->getOpcode() < I2->getOpcode();
		}
		if (isa<Constant>(Opcodes1[I]) && isa<Constant>(Opcodes2[I]))
		continue;
		if (Opcodes1[I]->getValueID() < Opcodes2[I]->getValueID())
		return true;
		if (Opcodes1[I]->getValueID() > Opcodes2[I]->getValueID())
		return false;
		}
		return false;
		});

		auto &&AreCompatiblePHIs = [&PHIToOpcodes](Value *V1, Value *V2) {
		if (V1 == V2)
		return true;
		if (V1->getType() != V2->getType())
		return false;
		ArrayRef<Value *> Opcodes1 = PHIToOpcodes[V1];
		ArrayRef<Value *> Opcodes2 = PHIToOpcodes[V2];
		if (Opcodes1.size() != Opcodes2.size())
		return false;
		for (int I = 0, E = Opcodes1.size(); I < E; ++I) {
		// Undefs are compatible with any other value.
		if (isa<UndefValue>(Opcodes1[I]) || isa<UndefValue>(Opcodes2[I]))
		continue;
		if (auto *I1 = dyn_cast<Instruction>(Opcodes1[I]))
		if (auto *I2 = dyn_cast<Instruction>(Opcodes2[I])) {
		if (I1->getParent() != I2->getParent())
		return false;
		InstructionsState S = getSameOpcode({I1, I2});
		if (!S.getOpcode())
		return false;
		continue;
		}
		if (isa<Constant>(Opcodes1[I]) && isa<Constant>(Opcodes2[I]))
		continue;
		if (Opcodes1[I]->getValueID() != Opcodes2[I]->getValueID())
		return false;
		}
		return true;
		};

// Try to vectorize elements base on their type.		// Try to vectorize elements base on their type.
		SmallVector<Value *, 4> Candidates;
for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),		for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),
E = Incoming.end();		E = Incoming.end();
IncIt != E;) {		IncIt != E;) {

// Look for the next elements with the same type.		// Look for the next elements with the same type, parent and operand
		// kinds.
SmallVector<Value *, 4>::iterator SameTypeIt = IncIt;		SmallVector<Value *, 4>::iterator SameTypeIt = IncIt;
while (SameTypeIt != E &&		while (SameTypeIt != E && AreCompatiblePHIs(*SameTypeIt, *IncIt)) {
(*SameTypeIt)->getType() == (*IncIt)->getType()) {
VisitedInstrs.insert(*SameTypeIt);		VisitedInstrs.insert(*SameTypeIt);
++SameTypeIt;		++SameTypeIt;
}		}

// Try to vectorize them.		// Try to vectorize them.
unsigned NumElts = (SameTypeIt - IncIt);		int NumElts = (SameTypeIt - IncIt);
LLVM_DEBUG(dbgs() << "SLP: Trying to vectorize starting at PHIs ("		LLVM_DEBUG(dbgs() << "SLP: Trying to vectorize starting at PHIs ("
<< NumElts << ")\n");		<< NumElts << ")\n");
// The order in which the phi nodes appear in the program does not matter.		if (NumElts > 1 && tryToVectorizeList(makeArrayRef(IncIt, NumElts), R,
// So allow tryToVectorizeList to reorder them if it is beneficial. This		/*AllowReorder=*/true)) {
// is done when there are exactly two elements since tryToVectorizeList		// Success start over because instructions might have been changed.
// asserts that there are only two values when AllowReorder is true.		HaveVectorizedPhiNodes = true;
bool AllowReorder = NumElts == 2;		Changed = true;
if (NumElts > 1 &&		} else if ((NumElts == 1 || NumElts < MinNonPow2ValuesSize.getValue()) &&
tryToVectorizeList(makeArrayRef(IncIt, NumElts), R, AllowReorder)) {		(Candidates.empty() ||
		Candidates.front()->getType() == (*IncIt)->getType())) {
		Candidates.append(IncIt, std::next(IncIt, NumElts));
		}
		// Final attempt to vectorize phis with the same types.
		if (SameTypeIt == E || (*SameTypeIt)->getType() != (*IncIt)->getType()) {
		if (Candidates.size() > 1 &&
		tryToVectorizeList(Candidates, R, /*AllowReorder=*/true)) {
// Success start over because instructions might have been changed.		// Success start over because instructions might have been changed.
HaveVectorizedPhiNodes = true;		HaveVectorizedPhiNodes = true;
Changed = true;		Changed = true;
break;		}
		Candidates.clear();
}		}

// Start over at the next instruction of a different type (or the end).		// Start over at the next instruction of a different type (or the end).
IncIt = SameTypeIt;		IncIt = SameTypeIt;
}		}
}		}

VisitedInstrs.clear();		VisitedInstrs.clear();

SmallVector<Instruction *, 8> PostProcessInstructions;		SmallVector<Instruction *, 8> PostProcessInstructions;
SmallDenseSet<Instruction *, 4> KeyNodes;		SmallDenseSet<Instruction *, 4> KeyNodes;
for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
// Skip instructions with scalable type. The num of elements is unknown at		// Skip instructions with scalable type. The num of elements is unknown at
// compile-time for scalable type.		// compile-time for scalable type.
if (isa<ScalableVectorType>(it->getType()))		if (isa<ScalableVectorType>(it->getType()))
continue;		continue;

// Skip instructions marked for the deletion.		// Skip instructions marked for the deletion.
if (R.isDeleted(&*it))		if (R.isDeleted(&*it))
continue;		continue;
// We may go through BB multiple times so skip the one we have checked.		// We may go through BB multiple times so skip the one we have checked.
if (!VisitedInstrs.insert(&*it).second) {		if (!VisitedInstrs.insert(&*it).second) {
if (it->use_empty() && KeyNodes.contains(&*it) &&		if (it->use_empty() && KeyNodes.contains(&*it) &&
vectorizeSimpleInstructions(PostProcessInstructions, BB, R)) {		vectorizeSimpleInstructions(PostProcessInstructions, BB, R,
		it->isTerminator())) {
// We would like to start over since some instructions are deleted		// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.		// and the iterator may become invalid value.
Changed = true;		Changed = true;
it = BB->begin();		it = BB->begin();
e = BB->end();		e = BB->end();
}		}
continue;		continue;
}		}
(42 lines not shown)	if (it->use_empty() && (it->getType()->isVoidTy() || isa<CallInst>(it) ||
for (auto *V : it->operand_values()) {		for (auto *V : it->operand_values()) {
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
OpsChanged |= vectorizeRootInstruction(nullptr, V, BB, R, TTI);		OpsChanged |= vectorizeRootInstruction(nullptr, V, BB, R, TTI);
}		}
}		}
// Start vectorization of post-process list of instructions from the		// Start vectorization of post-process list of instructions from the
// top-tree instructions to try to vectorize as many instructions as		// top-tree instructions to try to vectorize as many instructions as
// possible.		// possible.
OpsChanged |= vectorizeSimpleInstructions(PostProcessInstructions, BB, R);		OpsChanged |= vectorizeSimpleInstructions(PostProcessInstructions, BB, R,
		it->isTerminator());
if (OpsChanged) {		if (OpsChanged) {
// We would like to start over since some instructions are deleted		// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.		// and the iterator may become invalid value.
Changed = true;		Changed = true;
it = BB->begin();		it = BB->begin();
e = BB->end();		e = BB->end();
continue;		continue;
}		}
(105 lines not shown)	bool SLPVectorizerPass::vectorizeStoreChains(BoUpSLP &R) {
for (StoreListMap::iterator it = Stores.begin(), e = Stores.end(); it != e;		for (StoreListMap::iterator it = Stores.begin(), e = Stores.end(); it != e;
++it) {		++it) {
if (it->second.size() < 2)		if (it->second.size() < 2)
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length "		LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length "
<< it->second.size() << ".\n");		<< it->second.size() << ".\n");

Changed |= vectorizeStores(it->second, R);		// Sort by type, base pointers and values operand. Value operands must be
		// compatible (have the same opcode, same parent), otherwise it is
		// definitely not profitable to try to vectorize them.
		auto &&StoreSorter = [](StoreInst *V, StoreInst *V2) {
		if (V->getPointerOperandType() < V2->getPointerOperandType())
		return true;
		if (V->getPointerOperandType() > V2->getPointerOperandType())
		return false;
		// UndefValues are compatible with all other values.
		if (isa<UndefValue>(V->getValueOperand()) ||
		isa<UndefValue>(V2->getValueOperand()))
		return false;
		if (auto *I1 = dyn_cast<Instruction>(V->getValueOperand()))
		if (auto *I2 = dyn_cast<Instruction>(V2->getValueOperand())) {
		if (I1->getParent() < I2->getParent())
		return true;
		if (I1->getParent() > I2->getParent())
		return false;
		InstructionsState S = getSameOpcode({I1, I2});
		if (S.getOpcode())
		return false;
		return I1->getOpcode() < I2->getOpcode();
		}
		if (isa<Constant>(V->getValueOperand()) &&
		isa<Constant>(V2->getValueOperand()))
		return false;
		return V->getValueOperand()->getValueID() <
		V2->getValueOperand()->getValueID();
		};

		llvm::stable_sort(it->second, StoreSorter);

		auto &&AreCompatibleStores = [](StoreInst *V1, StoreInst *V2) {
		if (V1 == V2)
		return true;
		if (V1->getPointerOperandType() != V2->getPointerOperandType())
		return false;
		// Undefs are compatible with any other value.
		if (isa<UndefValue>(V1->getValueOperand()) ||
		isa<UndefValue>(V2->getValueOperand()))
		return true;
		if (auto *I1 = dyn_cast<Instruction>(V1->getValueOperand()))
		if (auto *I2 = dyn_cast<Instruction>(V2->getValueOperand())) {
		if (I1->getParent() != I2->getParent())
		return false;
		InstructionsState S = getSameOpcode({I1, I2});
		return S.getOpcode() > 0;
		}
		if (isa<Constant>(V1->getValueOperand()) &&
		isa<Constant>(V2->getValueOperand()))
		return true;
		return V1->getValueOperand()->getValueID() ==
		V2->getValueOperand()->getValueID();
		};

		// Try to vectorize elements based on their compatibility.
		for (SmallVector<StoreInst *, 4>::iterator IncIt = it->second.begin(),
		E = it->second.end();
		IncIt != E;) {

		// Look for the next elements with the same type.
		SmallVector<StoreInst *, 4>::iterator SameTypeIt = IncIt;
		Type *EltTy = (*IncIt)->getPointerOperand()->getType();

		while (SameTypeIt != E && AreCompatibleStores(*SameTypeIt, *IncIt))
		++SameTypeIt;

		// Try to vectorize them.
		unsigned NumElts = (SameTypeIt - IncIt);
		LLVM_DEBUG(dbgs() << "SLP: Trying to vectorize starting at stores ("
		<< NumElts << ")\n");
		if (NumElts > 1 && !EltTy->getPointerElementType()->isVectorTy() &&
		vectorizeStores(makeArrayRef(IncIt, NumElts), R)) {
		// Success start over because instructions might have been changed.
		Changed = true;
		}

		// Start over at the next instruction of a different type (or the end).
		IncIt = SameTypeIt;
		}
}		}
return Changed;		return Changed;
}		}

char SLPVectorizer::ID = 0;		char SLPVectorizer::ID = 0;

static const char lv_name[] = "SLP Vectorizer";		static const char lv_name[] = "SLP Vectorizer";

(12 lines not shown)
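
Both the PHI loop and the new store loop above follow the same pattern: stable-sort the candidates so that compatible ones become adjacent, then walk the sorted list and hand each run of mutually compatible elements to the vectorizer. The following is a minimal, standalone sketch of that pattern only. `Candidate`, `sortsBefore`, `areCompatible` and `groupCompatibleRuns` are hypothetical names invented for illustration, and the integer/string keys stand in for the type, parent block and opcode checks that the patch performs on real LLVM values.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PHI or store candidate (not an LLVM type).
struct Candidate {
  int TypeId;         // stands in for the value/pointer type
  int BlockId;        // stands in for the parent basic block
  std::string Opcode; // stands in for the operand's opcode
};

// Strict weak ordering: type first, then parent block, then opcode.
static bool sortsBefore(const Candidate &A, const Candidate &B) {
  if (A.TypeId != B.TypeId)
    return A.TypeId < B.TypeId;
  if (A.BlockId != B.BlockId)
    return A.BlockId < B.BlockId;
  return A.Opcode < B.Opcode;
}

// Two candidates may share a vectorization attempt only if all keys match.
static bool areCompatible(const Candidate &A, const Candidate &B) {
  return A.TypeId == B.TypeId && A.BlockId == B.BlockId &&
         A.Opcode == B.Opcode;
}

// Sorts the candidates, then returns the length of each run of compatible
// neighbours. A real pass would try to vectorize each run of length > 1.
std::vector<std::size_t> groupCompatibleRuns(std::vector<Candidate> &Items) {
  std::stable_sort(Items.begin(), Items.end(), sortsBefore);
  std::vector<std::size_t> RunSizes;
  for (auto It = Items.begin(), E = Items.end(); It != E;) {
    auto RunEnd = It;
    while (RunEnd != E && areCompatible(*RunEnd, *It))
      ++RunEnd;
    RunSizes.push_back(static_cast<std::size_t>(RunEnd - It));
    It = RunEnd; // continue at the first incompatible element
  }
  return RunSizes;
}
```

The sketch omits the patch's special cases (undef operands treated as compatible with anything, constants treated as compatible with each other, and the `Candidates` list that retries leftover same-typed PHIs).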

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s

	define void @f1(<2 x i16> %x, i16* %a) {			define void @f1(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f1(			; CHECK-LABEL: @f1(
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[X:%.*]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[X:%.*]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[X]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: store i16 [[T2]], i16* [[A:%.*]]
	; CHECK-NEXT: store i16 [[TMP1]], i16* [[A:%.*]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR0]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR1]]
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP2]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR2]]
				RKSimon: These "feel" like regressions to me - any idea what's going on?
				ABataev (author): The cost model problem, if I recall it correctly. I investigated it before and found out that the cost model for AArch64 is not defined for long vectors in some cases and we fall back to the generic cost model evaluation which is not quite correct in many cases. Need to tweak the cost model for AArch64.
				RKSimon: Any instruction cost type (extract/shuffle/store?) in particular that needs better costs? It'd be good to at least raise a specific bug report to the aarch64 team
				ABataev (author): Do not remember already, need some time to investigate it again. Hope to do it by the end of this week. PS. There was a question about this test already.
				ABataev (author): Found the reason. It is the cost of shuffle of `TTI::SK_PermuteSingleSrc` kind. Before this patch, the test operated with the vector `<2 x i16>`, which is transformed to `llvm::MVT::v2i32` by type legalization function and the cost of this shuffle is tweaked to be `1` (see llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp, `AArch64TTIImpl::getShuffleCost`). The cost of this operation is 1, per table. With this patch, the original vector type is `<4 x i16>` which is transformed to `llvm::MVT::v4i16` and there is no optimized value for `TTI::SK_PermuteSingleSrc` in the table for this type and the function falls back to the pessimistic cost model and returns `18`. There are several TODOs in the file already about fixing the cost model for different shuffle operations.
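
The fallback described in the last comment can be pictured with a small standalone sketch. This is not the code in `AArch64TTIImpl::getShuffleCost`; every name below (`ShuffleKind`, `CostEntry`, `TunedShuffleCosts`, `lookupTunedCost`, `shuffleCostSketch`) is hypothetical, types are plain strings, and the generic estimate is simply passed in by the caller. Only the two numbers, `1` for the tuned `v2i32` entry and `18` for the generic fallback, are taken from the comment.

```cpp
#include <array>
#include <optional>
#include <string_view>

// Hypothetical, simplified stand-ins; none of these names are LLVM APIs.
enum class ShuffleKind { PermuteSingleSrc, Reverse };

struct CostEntry {
  ShuffleKind Kind;
  std::string_view LegalizedType;
  int Cost;
};

// A single tuned entry, mirroring the <2 x i16> -> v2i32 case in the comment.
constexpr std::array<CostEntry, 1> TunedShuffleCosts = {{
    {ShuffleKind::PermuteSingleSrc, "v2i32", 1},
}};

// Returns the tuned cost if the (kind, type) pair is in the table.
std::optional<int> lookupTunedCost(ShuffleKind Kind, std::string_view Ty) {
  for (const CostEntry &E : TunedShuffleCosts)
    if (E.Kind == Kind && E.LegalizedType == Ty)
      return E.Cost;
  return std::nullopt;
}

// Prefers the tuned entry; otherwise falls back to the caller-supplied
// generic (pessimistic) estimate.
int shuffleCostSketch(ShuffleKind Kind, std::string_view Ty, int GenericCost) {
  if (std::optional<int> Tuned = lookupTunedCost(Kind, Ty))
    return *Tuned;
  return GenericCost;
}
```

Under these assumptions, `shuffleCostSketch(ShuffleKind::PermuteSingleSrc, "v2i32", 18)` hits the table and yields 1, while the same call with `"v4i16"` misses and yields the generic 18, which is the jump the comment identifies as the reason the wider shuffle is rejected as unprofitable.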
	; CHECK-NEXT: ret void			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR3]]
	;			;
	%t2 = extractelement <2 x i16> %x, i32 0			%t2 = extractelement <2 x i16> %x, i32 0
	%t3 = extractelement <2 x i16> %x, i32 1			%t3 = extractelement <2 x i16> %x, i32 1
	%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	%ptr1 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			%ptr1 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	%ptr2 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			%ptr2 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	%ptr3 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			%ptr3 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	store i16 %t2, i16* %a			store i16 %t2, i16* %a
	store i16 %t2, i16* %ptr0			store i16 %t2, i16* %ptr0
	store i16 %t3, i16* %ptr1			store i16 %t3, i16* %ptr1
	store i16 %t3, i16* %ptr2			store i16 %t3, i16* %ptr2
	store i16 %t2, i16* %ptr3			store i16 %t2, i16* %ptr3
	ret void			ret void
	}			}

	define void @f2(<2 x i16> %x, i16* %a) {			define void @f2(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f2(			; CHECK-LABEL: @f2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[CONT:%.*]]			; CHECK-NEXT: br label [[CONT:%.*]]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[XX]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[XX]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: store i16 [[T2]], i16* [[A]]
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR1]]
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR2]]
				; CHECK-NEXT: store i16 [[T2]], i16* [[PTR3]]
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %cont			br label %cont
	(22 lines not shown)

	define void @f3(<2 x i16> %x, i16* %a) {			define void @f3(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f3(			; CHECK-LABEL: @f3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[CONT:%.*]]			; CHECK-NEXT: br label [[CONT:%.*]]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[XX]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[XX]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i16> [[SHUFFLE]], i32 0			; CHECK-NEXT: store i16 [[T3]], i16* [[A]]
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR1]]
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR2]]
				; CHECK-NEXT: store i16 [[T3]], i16* [[PTR3]]
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %cont			br label %cont
	(22 lines not shown)

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

	(24 lines not shown)
	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	(259 lines not shown)
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @expf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	(74 lines not shown)
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @logf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @sinf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @cosf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @expf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @logf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @sinf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @cosf(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	}			}

	; Accelerate does not provide sin() for <2 x float>.			; Accelerate does not provide sin() for <2 x float>.
	define <2 x float> @sin_2x(<2 x float>* %a) {			define <2 x float> @sin_2x(<2 x float>* %a) {
	; CHECK-LABEL: @sin_2x(			; CHECK-LABEL: @sin_2x(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]]) #2			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]]) [[ATTR2:#.*]]
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]]) #2			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]]) [[ATTR2]]
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: ret <2 x float> [[VECINS_1]]			; CHECK-NEXT: ret <2 x float> [[VECINS_1]]
	;			;
	; NOACCELERATE-LABEL: @sin_2x(			; NOACCELERATE-LABEL: @sin_2x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
	; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_2]])			; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
				; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
				; NOACCELERATE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP5]], i32 1
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
				; NOACCELERATE-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP6]], i32 2
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3			; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	[9 lines not shown]
	}			}

	; Accelerate does not provide cos() for <2 x float>.			; Accelerate does not provide cos() for <2 x float>.
	define <2 x float> @cos_2x(<2 x float>* %a) {			define <2 x float> @cos_2x(<2 x float>* %a) {
	; CHECK-LABEL: @cos_2x(			; CHECK-LABEL: @cos_2x(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]]) #3			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]]) [[ATTR3:#.*]]
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]]) #3			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]]) [[ATTR3]]
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: ret <2 x float> [[VECINS_1]]			; CHECK-NEXT: ret <2 x float> [[VECINS_1]]
	;			;
	; NOACCELERATE-LABEL: @cos_2x(			; NOACCELERATE-LABEL: @cos_2x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	[16 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll

	[9 lines not shown]
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, i64* [[P:%.*]], i64 [[S0]]			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, i64* [[P:%.*]], i64 [[S0]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, i64* [[GEP0]]			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, i64* [[GEP0]], align 4
	; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1			; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1
	; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64			; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[S1]]			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[S1]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, i64* [[GEP1]]			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, i64* [[GEP1]], align 4
	; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2			; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2
	; CHECK-NEXT: [[S2:%.*]] = sext i32 [[E2]] to i64
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[S2]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, i64* [[GEP2]]
	; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[E2]], i32 0
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[S3]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[E3]], i32 1
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, i64* [[GEP3]]			; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i32> [[TMP1]] to <2 x i64>
				; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
				; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[TMP4]] to i64
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[TMP5]]
				; CHECK-NEXT: [[LOAD2:%.*]] = load i64, i64* [[GEP2]], align 4
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
				; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[TMP6]] to i64
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[TMP7]]
				; CHECK-NEXT: [[LOAD3:%.*]] = load i64, i64* [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1
	%e0 = extractelement <4 x i32> %sub0, i32 0			%e0 = extractelement <4 x i32> %sub0, i32 0
	[21 lines not shown]
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[A0:%.*]] = add i64 [[S0]], [[C0:%.*]]			; CHECK-NEXT: [[A0:%.*]] = add i64 [[S0]], [[C0:%.*]]
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, i64* [[P:%.*]], i64 [[A0]]			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, i64* [[P:%.*]], i64 [[A0]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, i64* [[GEP0]]			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, i64* [[GEP0]], align 4
	; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1			; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1
	; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64			; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64
	; CHECK-NEXT: [[A1:%.*]] = add i64 [[S1]], [[C1:%.*]]			; CHECK-NEXT: [[A1:%.*]] = add i64 [[S1]], [[C1:%.*]]
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[A1]]			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[A1]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, i64* [[GEP1]]			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, i64* [[GEP1]], align 4
	; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2			; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2
	; CHECK-NEXT: [[S2:%.*]] = sext i32 [[E2]] to i64
	; CHECK-NEXT: [[A2:%.*]] = add i64 [[S2]], [[C2:%.*]]
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[A2]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, i64* [[GEP2]]
	; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[E2]], i32 0
	; CHECK-NEXT: [[A3:%.*]] = add i64 [[S3]], [[C3:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[E3]], i32 1
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[A3]]			; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i32> [[TMP1]] to <2 x i64>
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, i64* [[GEP3]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> poison, i64 [[C2:%.*]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[C3:%.*]], i32 1
				; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[TMP6]]
				; CHECK-NEXT: [[LOAD2:%.*]] = load i64, i64* [[GEP2]], align 4
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 [[TMP7]]
				; CHECK-NEXT: [[LOAD3:%.*]] = load i64, i64* [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1
	%e0 = extractelement <4 x i32> %sub0, i32 0			%e0 = extractelement <4 x i32> %sub0, i32 0
	[22 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT			; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT
	; RUN: opt < %s -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER			; RUN: opt < %s -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER
	; RUN: opt < %s -slp-schedule-budget=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=MAX-COST			; RUN: opt < %s -slp-schedule-budget=0 -slp-threshold=-32 -slp-vectorizer -S | FileCheck %s --check-prefix=MAX-COST

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	@a = common global [80 x i8] zeroinitializer, align 16			@a = common global [80 x i8] zeroinitializer, align 16

	define void @PR28330(i32 %n) {			define void @PR28330(i32 %n) {
	; DEFAULT-LABEL: @PR28330(			; DEFAULT-LABEL: @PR28330(
	[185 lines not shown]
	; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7			; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7
	; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7			; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7
	; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])			; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], -5			; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], -5
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR32038(			; MAX-COST-LABEL: @PR32038(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1			; MAX-COST-NEXT: [[P0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer			; MAX-COST-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2) to <2 x i8>*), align 2
	; MAX-COST-NEXT: [[P4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			; MAX-COST-NEXT: [[TMP1:%.*]] = extractelement <2 x i8> [[TMP0]], i32 1
	; MAX-COST-NEXT: [[P5:%.*]] = icmp eq i8 [[P4]], 0			; MAX-COST-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> poison, i8 [[P0]], i32 0
				; MAX-COST-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP0]], i32 0
				; MAX-COST-NEXT: [[TMP4:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[TMP3]], i32 1
				; MAX-COST-NEXT: [[TMP5:%.*]] = insertelement <4 x i8> [[TMP4]], i8 [[TMP1]], i32 2
				; MAX-COST-NEXT: [[TMP6:%.*]] = icmp eq <4 x i8> [[TMP5]], <i8 0, i8 0, i8 0, i8 poison>
	; MAX-COST-NEXT: [[P6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4			; MAX-COST-NEXT: [[P6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; MAX-COST-NEXT: [[P7:%.*]] = icmp eq i8 [[P6]], 0			; MAX-COST-NEXT: [[P7:%.*]] = icmp eq i8 [[P6]], 0
	; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1			; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0			; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0
	; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2			; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0			; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1			; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0			; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8			; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0			; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0			; MAX-COST-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP6]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 poison>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 poison>
	; MAX-COST-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> poison, i1 [[TMP2]], i32 0			; MAX-COST-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
	; MAX-COST-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1			; MAX-COST-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[REDUCTION_NORMALIZATION]])
	; MAX-COST-NEXT: [[TMP5:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[TMP4]], i32 1			; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP8]], -5
	; MAX-COST-NEXT: [[TMP6:%.*]] = insertelement <4 x i1> [[TMP5]], i1 [[P5]], i32 2			; MAX-COST-NEXT: [[P25:%.*]] = select i1 [[P7]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP7:%.*]] = insertelement <4 x i1> [[TMP6]], i1 [[P7]], i32 3			; MAX-COST-NEXT: [[P26:%.*]] = add i32 [[OP_EXTRA]], [[P25]]
	; MAX-COST-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP7]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80
				; MAX-COST-NEXT: [[P28:%.*]] = add i32 [[P26]], [[P27]]
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])			; MAX-COST-NEXT: [[P30:%.*]] = add i32 [[P28]], [[P29]]
	; MAX-COST-NEXT: [[TMP10:%.*]] = add i32 [[TMP9]], [[P27]]
	; MAX-COST-NEXT: [[TMP11:%.*]] = add i32 [[TMP10]], [[P29]]
	; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP11]], -5
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]			; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[P30]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]			; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%p1 = icmp eq i8 %p0, 0			%p1 = icmp eq i8 %p0, 0
	%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	[35 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-threshold=-6 -S -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -slp-vectorizer -slp-threshold=-5 -S -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s


	; FIXME: The threshold is changed to keep this test case a bit smaller.			; FIXME: The threshold is changed to keep this test case a bit smaller.
	; The AArch64 cost model should not give such high costs to select statements.			; The AArch64 cost model should not give such high costs to select statements.

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux"			target triple = "aarch64--linux"
	[410 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/insertelement-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S 2>%t | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S 2>%t | FileCheck %s
	; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t			; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

	; WARN-NOT: warning			; WARN-NOT: warning

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	define <2 x float> @insertelement-fixed-vector() {			define <2 x float> @insertelement-fixed-vector() {
	; CHECK-LABEL: @insertelement-fixed-vector(			; CHECK-LABEL: @insertelement-fixed-vector(
	; CHECK-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.fabs.v2f32(<2 x float> undef)			; CHECK-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.fabs.v2f32(<2 x float> poison)
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x float> poison, float [[TMP2]], i32 0			; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x float> poison, float [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x float> [[I0]], float [[TMP3]], i32 1			; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x float> [[I0]], float [[TMP3]], i32 1
	; CHECK-NEXT: ret <2 x float> [[I1]]			; CHECK-NEXT: ret <2 x float> [[I1]]
	;			;
	%f0 = tail call fast float @llvm.fabs.f32(float undef)			%f0 = tail call fast float @llvm.fabs.f32(float undef)
	%f1 = tail call fast float @llvm.fabs.f32(float undef)			%f1 = tail call fast float @llvm.fabs.f32(float undef)
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/insertelement.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S 2>%t | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S 2>%t | FileCheck %s
	; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t			; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

	; WARN-NOT: warning			; WARN-NOT: warning

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	define <2 x float> @insertelement-fixed-vector() {			define <2 x float> @insertelement-fixed-vector() {
	; CHECK-LABEL: @insertelement-fixed-vector(			; CHECK-LABEL: @insertelement-fixed-vector(
	; CHECK-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.fabs.v2f32(<2 x float> undef)			; CHECK-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.fabs.v2f32(<2 x float> poison)
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0			; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x float> [[I0]], float [[TMP3]], i32 1			; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x float> [[I0]], float [[TMP3]], i32 1
	; CHECK-NEXT: ret <2 x float> [[I1]]			; CHECK-NEXT: ret <2 x float> [[I1]]
	;			;
	%f0 = tail call fast float @llvm.fabs.f32(float undef)			%f0 = tail call fast float @llvm.fabs.f32(float undef)
	%f1 = tail call fast float @llvm.fabs.f32(float undef)			%f1 = tail call fast float @llvm.fabs.f32(float undef)
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"		target triple = "aarch64--linux-gnu"

define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {		define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {
; CHECK-LABEL: @build_vec_v2i64(		; CHECK-LABEL: @build_vec_v2i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP3]], [[TMP6]]
; CHECK-NEXT: ret <2 x i64> [[TMP9]]		; CHECK-NEXT: ret <2 x i64> [[TMP7]]
;		;
%v0.0 = extractelement <2 x i64> %v0, i32 0		%v0.0 = extractelement <2 x i64> %v0, i32 0
%v0.1 = extractelement <2 x i64> %v0, i32 1		%v0.1 = extractelement <2 x i64> %v0, i32 1
%v1.0 = extractelement <2 x i64> %v1, i32 0		%v1.0 = extractelement <2 x i64> %v1, i32 0
%v1.1 = extractelement <2 x i64> %v1, i32 1		%v1.1 = extractelement <2 x i64> %v1, i32 1
%tmp0.0 = add i64 %v0.0, %v1.0		%tmp0.0 = add i64 %v0.0, %v1.0
%tmp0.1 = add i64 %v0.1, %v1.1		%tmp0.1 = add i64 %v0.1, %v1.1
%tmp1.0 = sub i64 %v0.0, %v1.0		%tmp1.0 = sub i64 %v0.0, %v1.0
[42 lines not shown]
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
store i64 %tmp2.0, i64* %c.0, align 8		store i64 %tmp2.0, i64* %c.0, align 8
store i64 %tmp2.1, i64* %c.1, align 8		store i64 %tmp2.1, i64* %c.1, align 8
ret void		ret void
}		}

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP3]], [[TMP8]]
; CHECK-NEXT: ret <4 x i32> [[TMP9]]		; CHECK-NEXT: ret <4 x i32> [[TMP9]]
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
%v0.2 = extractelement <4 x i32> %v0, i32 2		%v0.2 = extractelement <4 x i32> %v0, i32 2
%v0.3 = extractelement <4 x i32> %v0, i32 3		%v0.3 = extractelement <4 x i32> %v0, i32 3
%v1.0 = extractelement <4 x i32> %v1, i32 0		%v1.0 = extractelement <4 x i32> %v1, i32 0
%v1.1 = extractelement <4 x i32> %v1, i32 1		%v1.1 = extractelement <4 x i32> %v1, i32 1
[15 lines not shown]
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
%tmp1.1 = sub i32 %v0.1, %v1.1		%tmp1.1 = sub i32 %v0.1, %v1.1
%tmp2.0 = add i32 %tmp0.0, %tmp0.1		%tmp2.0 = add i32 %tmp0.0, %tmp0.1
%tmp2.1 = add i32 %tmp1.0, %tmp1.1		%tmp2.1 = add i32 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <4 x i32> poison, i32 %tmp2.0, i32 0		%tmp3.0 = insertelement <4 x i32> poison, i32 %tmp2.0, i32 0
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_1(		; CHECK-LABEL: @build_vec_v4i32_reuse_1(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 1
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 0
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
; CHECK-NEXT: [[TMP0_2:%.*]] = xor i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP0_3:%.*]] = xor i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP1_0:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP1_0:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP1_1:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP1_2:%.*]] = sub i32 [[TMP0_2]], [[TMP0_3]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP1_3:%.*]] = sub i32 [[TMP0_3]], [[TMP0_2]]		; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP5]], [[TMP6]]
		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2_0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1_0]], i32 0		; CHECK-NEXT: [[TMP2_0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1_0]], i32 0
; CHECK-NEXT: [[TMP2_1:%.*]] = insertelement <4 x i32> [[TMP2_0]], i32 [[TMP1_1]], i32 1		; CHECK-NEXT: [[TMP2_1:%.*]] = insertelement <4 x i32> [[TMP2_0]], i32 [[TMP1_1]], i32 1
; CHECK-NEXT: [[TMP2_2:%.*]] = insertelement <4 x i32> [[TMP2_1]], i32 [[TMP1_2]], i32 2		; CHECK-NEXT: [[TMP2_3:%.*]] = shufflevector <4 x i32> [[TMP2_1]], <4 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP2_3:%.*]] = insertelement <4 x i32> [[TMP2_2]], i32 [[TMP1_3]], i32 3
; CHECK-NEXT: ret <4 x i32> [[TMP2_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP2_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
%tmp0.3 = xor i32 %v0.1, %v1.1		%tmp0.3 = xor i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %tmp0.0, %tmp0.1		%tmp1.0 = sub i32 %tmp0.0, %tmp0.1
%tmp1.1 = sub i32 %tmp0.0, %tmp0.1		%tmp1.1 = sub i32 %tmp0.0, %tmp0.1
%tmp1.2 = sub i32 %tmp0.2, %tmp0.3		%tmp1.2 = sub i32 %tmp0.2, %tmp0.3
%tmp1.3 = sub i32 %tmp0.3, %tmp0.2		%tmp1.3 = sub i32 %tmp0.3, %tmp0.2
%tmp2.0 = insertelement <4 x i32> poison, i32 %tmp1.0, i32 0		%tmp2.0 = insertelement <4 x i32> poison, i32 %tmp1.0, i32 0
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP7]], [[TMP8]]
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2_0]], i32 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i32 1
; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]		; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]
; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]		; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])		; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])
; CHECK-NEXT: ret i32 [[TMP15]]		; CHECK-NEXT: ret i32 [[TMP15]]

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"		target triple = "aarch64--linux-gnu"

define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {		define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {
; CHECK-LABEL: @build_vec_v2i64(		; CHECK-LABEL: @build_vec_v2i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP3]], [[TMP6]]
; CHECK-NEXT: ret <2 x i64> [[TMP9]]		; CHECK-NEXT: ret <2 x i64> [[TMP7]]
;		;
%v0.0 = extractelement <2 x i64> %v0, i32 0		%v0.0 = extractelement <2 x i64> %v0, i32 0
%v0.1 = extractelement <2 x i64> %v0, i32 1		%v0.1 = extractelement <2 x i64> %v0, i32 1
%v1.0 = extractelement <2 x i64> %v1, i32 0		%v1.0 = extractelement <2 x i64> %v1, i32 0
%v1.1 = extractelement <2 x i64> %v1, i32 1		%v1.1 = extractelement <2 x i64> %v1, i32 1
%tmp0.0 = add i64 %v0.0, %v1.0		%tmp0.0 = add i64 %v0.0, %v1.0
%tmp0.1 = add i64 %v0.1, %v1.1		%tmp0.1 = add i64 %v0.1, %v1.1
%tmp1.0 = sub i64 %v0.0, %v1.0		%tmp1.0 = sub i64 %v0.0, %v1.0
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
store i64 %tmp2.0, i64* %c.0, align 8		store i64 %tmp2.0, i64* %c.0, align 8
store i64 %tmp2.1, i64* %c.1, align 8		store i64 %tmp2.1, i64* %c.1, align 8
ret void		ret void
}		}

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP3]], [[TMP8]]
; CHECK-NEXT: ret <4 x i32> [[TMP9]]		; CHECK-NEXT: ret <4 x i32> [[TMP9]]
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
%v0.2 = extractelement <4 x i32> %v0, i32 2		%v0.2 = extractelement <4 x i32> %v0, i32 2
%v0.3 = extractelement <4 x i32> %v0, i32 3		%v0.3 = extractelement <4 x i32> %v0, i32 3
%v1.0 = extractelement <4 x i32> %v1, i32 0		%v1.0 = extractelement <4 x i32> %v1, i32 0
%v1.1 = extractelement <4 x i32> %v1, i32 1		%v1.1 = extractelement <4 x i32> %v1, i32 1
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
%tmp1.1 = sub i32 %v0.1, %v1.1		%tmp1.1 = sub i32 %v0.1, %v1.1
%tmp2.0 = add i32 %tmp0.0, %tmp0.1		%tmp2.0 = add i32 %tmp0.0, %tmp0.1
%tmp2.1 = add i32 %tmp1.0, %tmp1.1		%tmp2.1 = add i32 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <4 x i32> undef, i32 %tmp2.0, i32 0		%tmp3.0 = insertelement <4 x i32> undef, i32 %tmp2.0, i32 0
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_1(		; CHECK-LABEL: @build_vec_v4i32_reuse_1(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 1
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 0
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
; CHECK-NEXT: [[TMP0_2:%.*]] = xor i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP0_3:%.*]] = xor i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP1_0:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP1_0:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP1_1:%.*]] = sub i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP1_2:%.*]] = sub i32 [[TMP0_2]], [[TMP0_3]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP1_3:%.*]] = sub i32 [[TMP0_3]], [[TMP0_2]]		; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP5]], [[TMP6]]
		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP1_0]], i32 0		; CHECK-NEXT: [[TMP2_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP1_0]], i32 0
; CHECK-NEXT: [[TMP2_1:%.*]] = insertelement <4 x i32> [[TMP2_0]], i32 [[TMP1_1]], i32 1		; CHECK-NEXT: [[TMP2_1:%.*]] = insertelement <4 x i32> [[TMP2_0]], i32 [[TMP1_1]], i32 1
; CHECK-NEXT: [[TMP2_2:%.*]] = insertelement <4 x i32> [[TMP2_1]], i32 [[TMP1_2]], i32 2		; CHECK-NEXT: [[TMP2_3:%.*]] = shufflevector <4 x i32> [[TMP2_1]], <4 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP2_3:%.*]] = insertelement <4 x i32> [[TMP2_2]], i32 [[TMP1_3]], i32 3
; CHECK-NEXT: ret <4 x i32> [[TMP2_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP2_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
%tmp0.3 = xor i32 %v0.1, %v1.1		%tmp0.3 = xor i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %tmp0.0, %tmp0.1		%tmp1.0 = sub i32 %tmp0.0, %tmp0.1
%tmp1.1 = sub i32 %tmp0.0, %tmp0.1		%tmp1.1 = sub i32 %tmp0.0, %tmp0.1
%tmp1.2 = sub i32 %tmp0.2, %tmp0.3		%tmp1.2 = sub i32 %tmp0.2, %tmp0.3
%tmp1.3 = sub i32 %tmp0.3, %tmp0.2		%tmp1.3 = sub i32 %tmp0.3, %tmp0.2
%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0		%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP7]], [[TMP8]]
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP2_0]], i32 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i32 1
; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]		; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]
; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]		; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])		; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])
; CHECK-NEXT: ret i32 [[TMP15]]		; CHECK-NEXT: ret i32 [[TMP15]]

llvm/test/Transforms/SLPVectorizer/AArch64/trunc-insertion.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S | FileCheck %s
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"
	@d = internal unnamed_addr global i32 5, align 4			@d = internal unnamed_addr global i32 5, align 4

	define dso_local void @l() local_unnamed_addr {			define dso_local void @l() local_unnamed_addr {
	; CHECK-LABEL: @l(			; CHECK-LABEL: @l(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ undef, [[BB:%.*]] ], [ [[TMP11:%.*]], [[BB25:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ poison, [[BB:%.*]] ], [ [[TMP11:%.*]], [[BB25:%.*]] ]
	; CHECK-NEXT: br i1 undef, label [[BB3:%.*]], label [[BB11:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3:%.*]], label [[BB11:%.*]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[I4:%.*]] = zext i1 undef to i32			; CHECK-NEXT: [[I4:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i16> [[TMP0]], undef			; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i16> [[TMP0]], poison
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <2 x i16> [[TMP1]], <i16 8, i16 8>			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <2 x i16> [[TMP1]], <i16 8, i16 8>
	; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>			; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>
	; CHECK-NEXT: br label [[BB25]]			; CHECK-NEXT: br label [[BB25]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: [[I12:%.*]] = zext i1 undef to i32			; CHECK-NEXT: [[I12:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i16> [[TMP0]], undef			; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i16> [[TMP0]], poison
	; CHECK-NEXT: [[TMP5:%.*]] = sext <2 x i16> [[TMP4]] to <2 x i64>			; CHECK-NEXT: [[TMP5:%.*]] = sext <2 x i16> [[TMP4]] to <2 x i64>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ule <2 x i64> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = icmp ule <2 x i64> poison, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = zext <2 x i1> [[TMP6]] to <2 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = zext <2 x i1> [[TMP6]] to <2 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = icmp ult <2 x i32> undef, [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp ult <2 x i32> poison, [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = zext <2 x i1> [[TMP8]] to <2 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = zext <2 x i1> [[TMP8]] to <2 x i32>
	; CHECK-NEXT: br label [[BB25]]			; CHECK-NEXT: br label [[BB25]]
	; CHECK: bb25:			; CHECK: bb25:
	; CHECK-NEXT: [[I28:%.*]] = phi i32 [ [[I12]], [[BB11]] ], [ [[I4]], [[BB3]] ]			; CHECK-NEXT: [[I28:%.*]] = phi i32 [ [[I12]], [[BB11]] ], [ [[I4]], [[BB3]] ]
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x i32> [ [[TMP9]], [[BB11]] ], [ [[TMP3]], [[BB3]] ]			; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x i32> [ [[TMP9]], [[BB11]] ], [ [[TMP3]], [[BB3]] ]
	; CHECK-NEXT: [[TMP11]] = phi <2 x i16> [ [[TMP4]], [[BB11]] ], [ [[TMP1]], [[BB3]] ]			; CHECK-NEXT: [[TMP11]] = phi <2 x i16> [ [[TMP4]], [[BB11]] ], [ [[TMP1]], [[BB3]] ]
	; CHECK-NEXT: [[TMP12:%.*]] = trunc <2 x i32> [[TMP10]] to <2 x i8>			; CHECK-NEXT: [[TMP12:%.*]] = trunc <2 x i32> [[TMP10]] to <2 x i8>
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i8> [[TMP12]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i8> [[TMP12]], i32 0

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

	; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <3 x i16> [[ARG0:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <3 x i16> [[ARG1:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1]], i64 2
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP3:%.*]] = extractelement <2 x i16> [[TMP2]], i32 0			; GFX8-NEXT: [[TMP1:%.*]] = extractelement <2 x i16> [[TMP0]], i32 0
	; GFX8-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[TMP3]], i64 0			; GFX8-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[TMP1]], i64 0
	; GFX8-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP2]], i32 1			; GFX8-NEXT: [[TMP2:%.*]] = extractelement <2 x i16> [[TMP0]], i32 1
	; GFX8-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[TMP4]], i64 1			; GFX8-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[TMP2]], i64 1
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP2:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP2]])
	; GFX8-NEXT: [[INS_3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_3:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_3]]			; GFX8-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

	; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <3 x i16> [[ARG0:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <3 x i16> [[ARG1:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1]], i64 2
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP3:%.*]] = extractelement <2 x i16> [[TMP2]], i32 0			; GFX8-NEXT: [[TMP1:%.*]] = extractelement <2 x i16> [[TMP0]], i32 0
	; GFX8-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[TMP3]], i64 0			; GFX8-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[TMP1]], i64 0
	; GFX8-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP2]], i32 1			; GFX8-NEXT: [[TMP2:%.*]] = extractelement <2 x i16> [[TMP0]], i32 1
	; GFX8-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[TMP4]], i64 1			; GFX8-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[TMP2]], i64 1
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP2:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP2]])
	; GFX8-NEXT: [[INS_3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_3:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_3]]			; GFX8-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s			; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s

	@bar = external global [4 x [4 x i32]], align 4			@bar = external global [4 x [4 x i32]], align 4
	@dct_luma = external global [4 x [4 x i32]], align 4			@dct_luma = external global [4 x [4 x i32]], align 4

	define void @foo() local_unnamed_addr {			define void @foo() local_unnamed_addr {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4
	; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0			; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0
	; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1			; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2), align 4
	; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2			; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 3), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0) to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[ADD277]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP1]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP2]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> undef, [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = ashr <4 x i32> [[TMP7]], <i32 6, i32 6, i32 6, i32 6>			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP0]], i32 3
				; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP6]], i32 3
				; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> poison, [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = ashr <4 x i32> [[TMP8]], <i32 6, i32 6, i32 6, i32 6>
	; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3			; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP10]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%add277 = add nsw i32 undef, undef			%add277 = add nsw i32 undef, undef
	store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4
	%sub355 = add nsw i32 undef, %0			%sub355 = add nsw i32 undef, %0
	%shr.i = ashr i32 %sub355, 6			%shr.i = ashr i32 %sub355, 6

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP14:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[OP_EXTRA26]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_EXTRA26]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP12]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32> [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP14]] = insertelement <2 x i32> [[TMP12]], i32 [[TMP13]], i32 1
	; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 undef, i32 undef>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>			; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 poison, i32 poison>
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])			; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529			; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685			; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP6]], [[TMP7]]
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP8]], [[TMP10]]			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP6]], [[TMP7]]
	; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP8]], [[TMP10]]			; FORCE_REDUCTION-NEXT: [[TMP10]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: [[TMP13]] = shufflevector <2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]			%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]
	%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]			%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
				; SSE-LABEL: @ceil_floor(
				; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
				; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
				; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
				; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; SSE-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
				; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
				; SSE-NEXT: [[AB4:%.*]] = call float @llvm.ceil.f32(float [[A4]])
				; SSE-NEXT: [[AB5:%.*]] = call float @llvm.ceil.f32(float [[A5]])
				; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; SSE-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP2]])
				; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
				; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
				; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[TMP5]], i32 1
				; SSE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
				; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[TMP6]], i32 2
				; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
				; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
				; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
				; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; SSE-NEXT: ret <8 x float> [[R7]]
				;
	; CHECK-LABEL: @ceil_floor(			; CHECK-LABEL: @ceil_floor(
	; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; CHECK-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1
	; CHECK-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2
	; CHECK-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; CHECK-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
	; CHECK-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
	; CHECK-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
	; CHECK-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; CHECK-NEXT: [[AB1:%.*]] = call float @llvm.floor.f32(float [[A1]])			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[AB2:%.*]] = call float @llvm.floor.f32(float [[A2]])			; CHECK-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; CHECK-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[AB4:%.*]] = call float @llvm.ceil.f32(float [[A4]])			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[AB5:%.*]] = call float @llvm.ceil.f32(float [[A5]])			; CHECK-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
	; CHECK-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; CHECK-NEXT: [[TMP7:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP6]])
				; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; CHECK-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1			; CHECK-NEXT: [[R2:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2			; CHECK-NEXT: [[R5:%.*]] = shufflevector <8 x float> [[R2]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
	; CHECK-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3			; CHECK-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; CHECK-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
	; CHECK-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
	; CHECK-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
	; CHECK-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; CHECK-NEXT: ret <8 x float> [[R7]]			; CHECK-NEXT: ret <8 x float> [[R7]]
	;			;
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
				; SSE-LABEL: @ceil_floor(
				; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
				; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
				; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
				; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; SSE-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
				; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
				; SSE-NEXT: [[AB4:%.*]] = call float @llvm.ceil.f32(float [[A4]])
				; SSE-NEXT: [[AB5:%.*]] = call float @llvm.ceil.f32(float [[A5]])
				; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; SSE-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP2]])
				; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
				; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
				; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[TMP5]], i32 1
				; SSE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
				; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[TMP6]], i32 2
				; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
				; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
				; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
				; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; SSE-NEXT: ret <8 x float> [[R7]]
				;
	; CHECK-LABEL: @ceil_floor(			; CHECK-LABEL: @ceil_floor(
	; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; CHECK-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1
	; CHECK-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2
	; CHECK-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; CHECK-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
	; CHECK-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
	; CHECK-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
	; CHECK-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; CHECK-NEXT: [[AB1:%.*]] = call float @llvm.floor.f32(float [[A1]])			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[AB2:%.*]] = call float @llvm.floor.f32(float [[A2]])			; CHECK-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; CHECK-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[AB4:%.*]] = call float @llvm.ceil.f32(float [[A4]])			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[AB5:%.*]] = call float @llvm.ceil.f32(float [[A5]])			; CHECK-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
	; CHECK-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; CHECK-NEXT: [[TMP7:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP6]])
				; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; CHECK-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1			; CHECK-NEXT: [[R2:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2			; CHECK-NEXT: [[R5:%.*]] = shufflevector <8 x float> [[R2]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
	; CHECK-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3			; CHECK-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; CHECK-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
	; CHECK-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
	; CHECK-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
	; CHECK-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; CHECK-NEXT: ret <8 x float> [[R7]]			; CHECK-NEXT: ret <8 x float> [[R7]]
	;			;
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512

define <8 x float> @sitofp_uitofp(<8 x i32> %a) {		define <8 x float> @sitofp_uitofp(<8 x i32> %a) {
; SSE-LABEL: @sitofp_uitofp(		; SSE-LABEL: @sitofp_uitofp(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; SSE-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[SHUFFLE]] to <4 x float>
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i32> [[TMP2]] to <4 x float>
; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4		; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SSE-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float
; SSE-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float
; SSE-NEXT: [[AB2:%.*]] = sitofp i32 [[A2]] to float
; SSE-NEXT: [[AB3:%.*]] = sitofp i32 [[A3]] to float
; SSE-NEXT: [[AB4:%.*]] = uitofp i32 [[A4]] to float
; SSE-NEXT: [[AB5:%.*]] = uitofp i32 [[A5]] to float
; SSE-NEXT: [[AB6:%.*]] = uitofp i32 [[A6]] to float
; SSE-NEXT: [[AB7:%.*]] = uitofp i32 [[A7]] to float
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
; SLM-LABEL: @sitofp_uitofp(		; SLM-LABEL: @sitofp_uitofp(
; SLM-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>		; SLM-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; SLM-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>		; SLM-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x float> [[R7]]		; SLM-NEXT: ret <8 x float> [[R7]]
;		;
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

define <8 x i32> @fptosi_fptoui(<8 x float> %a) {		define <8 x i32> @fptosi_fptoui(<8 x float> %a) {
; SSE-LABEL: @fptosi_fptoui(		; SSE-LABEL: @fptosi_fptoui(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4		; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
; SSE-NEXT: [[AB0:%.*]] = fptosi float [[A0]] to i32		; SSE-NEXT: [[TMP1:%.*]] = fptosi <4 x float> [[SHUFFLE]] to <4 x i32>
; SSE-NEXT: [[AB1:%.*]] = fptosi float [[A1]] to i32
; SSE-NEXT: [[AB2:%.*]] = fptosi float [[A2]] to i32
; SSE-NEXT: [[AB3:%.*]] = fptosi float [[A3]] to i32
; SSE-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32		; SSE-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32
; SSE-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32		; SSE-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32
; SSE-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32		; SSE-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32
; SSE-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32		; SSE-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0		; SSE-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP2]], i32 0
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; SSE-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 1
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP3]], i32 1
		; SSE-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP1]], i32 2
		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP4]], i32 2
		; SSE-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP5]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @fptosi_fptoui(		; SLM-LABEL: @fptosi_fptoui(
; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4		; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5		; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SLM-NEXT: ret <8 x i32> [[R7]]		; SLM-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX-LABEL: @fptosi_fptoui(		; AVX-LABEL: @fptosi_fptoui(
; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1		; AVX-NEXT: [[TMP1:%.*]] = fptosi <4 x float> [[SHUFFLE]] to <4 x i32>
; AVX-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3		; AVX-NEXT: [[TMP3:%.*]] = fptoui <4 x float> [[TMP2]] to <4 x i32>
; AVX-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4		; AVX-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
; AVX-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
; AVX-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
; AVX-NEXT: [[AB0:%.*]] = fptosi float [[A0]] to i32
; AVX-NEXT: [[AB1:%.*]] = fptosi float [[A1]] to i32
; AVX-NEXT: [[AB2:%.*]] = fptosi float [[A2]] to i32
; AVX-NEXT: [[AB3:%.*]] = fptosi float [[A3]] to i32
; AVX-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32
; AVX-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32
; AVX-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32
; AVX-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32
; AVX-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0
; AVX-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; AVX-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; AVX-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; AVX-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; AVX-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; AVX-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX-NEXT: ret <8 x i32> [[R7]]		; AVX-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX512-LABEL: @fptosi_fptoui(		; AVX512-LABEL: @fptosi_fptoui(
; AVX512-NEXT: [[TMP1:%.*]] = fptosi <8 x float> [[A:%.*]] to <8 x i32>		; AVX512-NEXT: [[TMP1:%.*]] = fptosi <8 x float> [[A:%.*]] to <8 x i32>
; AVX512-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[A]] to <8 x i32>		; AVX512-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[A]] to <8 x i32>
; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX512-NEXT: ret <8 x i32> [[R7]]		; AVX512-NEXT: ret <8 x i32> [[R7]]
;		;
[...]
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; SSE-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: [[A1:%.*]] = extractelement <4 x i32> [[A]], i32 1
; SSE-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2
; SSE-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3
; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0		; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0
; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1		; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1
; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0		; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1		; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
; SSE-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float		; SSE-NEXT: [[TMP1:%.*]] = sitofp <2 x i32> [[SHUFFLE]] to <2 x float>
; SSE-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float
; SSE-NEXT: [[AB2:%.*]] = uitofp i32 [[A2]] to float		; SSE-NEXT: [[AB2:%.*]] = uitofp i32 [[A2]] to float
; SSE-NEXT: [[AB3:%.*]] = uitofp i32 [[A3]] to float		; SSE-NEXT: [[AB3:%.*]] = uitofp i32 [[A3]] to float
; SSE-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float		; SSE-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float
; SSE-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float		; SSE-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float
; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float		; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float		; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0		; SSE-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[TMP2]], i32 0
		; SSE-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[TMP3]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
[...]

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512

define <8 x float> @sitofp_uitofp(<8 x i32> %a) {		define <8 x float> @sitofp_uitofp(<8 x i32> %a) {
; SSE-LABEL: @sitofp_uitofp(		; SSE-LABEL: @sitofp_uitofp(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; SSE-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[SHUFFLE]] to <4 x float>
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i32> [[TMP2]] to <4 x float>
; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4		; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SSE-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float
; SSE-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float
; SSE-NEXT: [[AB2:%.*]] = sitofp i32 [[A2]] to float
; SSE-NEXT: [[AB3:%.*]] = sitofp i32 [[A3]] to float
; SSE-NEXT: [[AB4:%.*]] = uitofp i32 [[A4]] to float
; SSE-NEXT: [[AB5:%.*]] = uitofp i32 [[A5]] to float
; SSE-NEXT: [[AB6:%.*]] = uitofp i32 [[A6]] to float
; SSE-NEXT: [[AB7:%.*]] = uitofp i32 [[A7]] to float
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
; SLM-LABEL: @sitofp_uitofp(		; SLM-LABEL: @sitofp_uitofp(
; SLM-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>		; SLM-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; SLM-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>		; SLM-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x float> [[R7]]		; SLM-NEXT: ret <8 x float> [[R7]]
;		;
[...]
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

define <8 x i32> @fptosi_fptoui(<8 x float> %a) {		define <8 x i32> @fptosi_fptoui(<8 x float> %a) {
; SSE-LABEL: @fptosi_fptoui(		; SSE-LABEL: @fptosi_fptoui(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4		; SSE-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
; SSE-NEXT: [[AB0:%.*]] = fptosi float [[A0]] to i32		; SSE-NEXT: [[TMP1:%.*]] = fptosi <4 x float> [[SHUFFLE]] to <4 x i32>
; SSE-NEXT: [[AB1:%.*]] = fptosi float [[A1]] to i32
; SSE-NEXT: [[AB2:%.*]] = fptosi float [[A2]] to i32
; SSE-NEXT: [[AB3:%.*]] = fptosi float [[A3]] to i32
; SSE-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32		; SSE-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32
; SSE-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32		; SSE-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32
; SSE-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32		; SSE-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32
; SSE-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32		; SSE-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0		; SSE-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP2]], i32 0
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; SSE-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 1
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP3]], i32 1
		; SSE-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP1]], i32 2
		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP4]], i32 2
		; SSE-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP5]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @fptosi_fptoui(		; SLM-LABEL: @fptosi_fptoui(
; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
[...]
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4		; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5		; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SLM-NEXT: ret <8 x i32> [[R7]]		; SLM-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX-LABEL: @fptosi_fptoui(		; AVX-LABEL: @fptosi_fptoui(
; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1		; AVX-NEXT: [[TMP1:%.*]] = fptosi <4 x float> [[SHUFFLE]] to <4 x i32>
; AVX-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3		; AVX-NEXT: [[TMP3:%.*]] = fptoui <4 x float> [[TMP2]] to <4 x i32>
; AVX-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4		; AVX-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
; AVX-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
; AVX-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
; AVX-NEXT: [[AB0:%.*]] = fptosi float [[A0]] to i32
; AVX-NEXT: [[AB1:%.*]] = fptosi float [[A1]] to i32
; AVX-NEXT: [[AB2:%.*]] = fptosi float [[A2]] to i32
; AVX-NEXT: [[AB3:%.*]] = fptosi float [[A3]] to i32
; AVX-NEXT: [[AB4:%.*]] = fptoui float [[A4]] to i32
; AVX-NEXT: [[AB5:%.*]] = fptoui float [[A5]] to i32
; AVX-NEXT: [[AB6:%.*]] = fptoui float [[A6]] to i32
; AVX-NEXT: [[AB7:%.*]] = fptoui float [[A7]] to i32
; AVX-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
; AVX-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; AVX-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; AVX-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; AVX-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
; AVX-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
; AVX-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX-NEXT: ret <8 x i32> [[R7]]		; AVX-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX512-LABEL: @fptosi_fptoui(		; AVX512-LABEL: @fptosi_fptoui(
; AVX512-NEXT: [[TMP1:%.*]] = fptosi <8 x float> [[A:%.*]] to <8 x i32>		; AVX512-NEXT: [[TMP1:%.*]] = fptosi <8 x float> [[A:%.*]] to <8 x i32>
; AVX512-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[A]] to <8 x i32>		; AVX512-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[A]] to <8 x i32>
; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX512-NEXT: ret <8 x i32> [[R7]]		; AVX512-NEXT: ret <8 x i32> [[R7]]
;		;
[...]
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; SSE-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: [[A1:%.*]] = extractelement <4 x i32> [[A]], i32 1
; SSE-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2
; SSE-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3
; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0		; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0
; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1		; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1
; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0		; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1		; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
; SSE-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float		; SSE-NEXT: [[TMP1:%.*]] = sitofp <2 x i32> [[SHUFFLE]] to <2 x float>
; SSE-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float
; SSE-NEXT: [[AB2:%.*]] = uitofp i32 [[A2]] to float		; SSE-NEXT: [[AB2:%.*]] = uitofp i32 [[A2]] to float
; SSE-NEXT: [[AB3:%.*]] = uitofp i32 [[A3]] to float		; SSE-NEXT: [[AB3:%.*]] = uitofp i32 [[A3]] to float
; SSE-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float		; SSE-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float
; SSE-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float		; SSE-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float
; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float		; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float		; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0		; SSE-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[TMP2]], i32 0
		; SSE-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[TMP3]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
[...]

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

[...]
	}			}

	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[AB0:%.*]] = fmul float [[A0]], 2.000000e+00			; SLM-NEXT: [[TMP1:%.*]] = fmul <2 x float> [[SHUFFLE]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R0:%.*]] = insertelement <4 x float> poison, float [[AB0]], i32 0			; SLM-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; SLM-NEXT: [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[A1]], i32 1			; SLM-NEXT: [[R0:%.*]] = insertelement <4 x float> poison, float [[TMP2]], i32 0
				; SLM-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
				; SLM-NEXT: [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[TMP3]], i32 1
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R1]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R1]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
[...]

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

[...]
	}			}

	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[AB0:%.*]] = fmul float [[A0]], 2.000000e+00			; SLM-NEXT: [[TMP1:%.*]] = fmul <2 x float> [[SHUFFLE]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R0:%.*]] = insertelement <4 x float> undef, float [[AB0]], i32 0			; SLM-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; SLM-NEXT: [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[A1]], i32 1			; SLM-NEXT: [[R0:%.*]] = insertelement <4 x float> undef, float [[TMP2]], i32 0
				; SLM-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
				; SLM-NEXT: [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[TMP3]], i32 1
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R1]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R1]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
[...]

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

[...]
; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32(		; AVX1-LABEL: @ashr_shl_v8i32(
; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0		; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1		; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
; AVX1-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
; AVX1-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]		; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]		; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
; AVX1-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; AVX1-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; AVX1-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[A]], [[B]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
		; AVX1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX1-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP1]], [[TMP2]]
		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX1-NEXT: [[TMP7:%.*]] = shl <8 x i32> [[A]], [[B]]
; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0		; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0
; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; AVX1-NEXT: [[R3:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; AVX1-NEXT: [[R5:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 10, i32 11, i32 undef, i32 undef>
; AVX1-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX1-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 14, i32 15>
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_shl_v8i32(		; AVX2-LABEL: @ashr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
[...]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32_const(		; AVX1-LABEL: @ashr_shl_v8i32_const(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; AVX1-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_shl_v8i32_const(		; AVX2-LABEL: @ashr_shl_v8i32_const(
; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
... (87 lines not shown) ...
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6		; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6		; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; AVX2-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7		; AVX2-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; AVX2-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; AVX2-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; AVX2-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; AVX2-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; AVX2-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP2]], i32 2
; AVX2-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP6]], i32 0		; AVX2-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP3]], i32 2
; AVX2-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; AVX2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP2]], i32 3
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP7]], i32 1		; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP4]], i32 3
; AVX2-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2		; AVX2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i32 4
; AVX2-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP8]], i32 2		; AVX2-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP5]], i32 4
; AVX2-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3		; AVX2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP2]], i32 5
; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP9]], i32 3		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP6]], i32 5
; AVX2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP5]], i32 4
; AVX2-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP10]], i32 4
; AVX2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP5]], i32 5
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP11]], i32 5
; AVX2-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX2-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX2-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX2-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6		; AVX512-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; AVX512-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; AVX512-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX512-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6		; AVX512-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; AVX512-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7		; AVX512-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; AVX512-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; AVX512-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; AVX512-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; AVX512-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; AVX512-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP2]], i32 2
; AVX512-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP6]], i32 0		; AVX512-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP3]], i32 2
; AVX512-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; AVX512-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP2]], i32 3
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP7]], i32 1		; AVX512-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP4]], i32 3
; AVX512-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2		; AVX512-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i32 4
; AVX512-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP8]], i32 2		; AVX512-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP5]], i32 4
; AVX512-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3		; AVX512-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP2]], i32 5
; AVX512-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP9]], i32 3		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP6]], i32 5
; AVX512-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP5]], i32 4
; AVX512-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP10]], i32 4
; AVX512-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP5]], i32 5
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP11]], i32 5
; AVX512-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX512-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX512-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX512-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX512-NEXT: ret <8 x i32> [[R7]]		; AVX512-NEXT: ret <8 x i32> [[R7]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
... (56 lines not shown) ...
%r4 = insertelement <8 x i32> %r3, i32 %ab4, i32 4		%r4 = insertelement <8 x i32> %r3, i32 %ab4, i32 4
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {		define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {
; CHECK-LABEL: @sdiv_v8i32_undefs(		; SSE-LABEL: @sdiv_v8i32_undefs(
; CHECK-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; CHECK-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; CHECK-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; CHECK-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; CHECK-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; CHECK-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; CHECK-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; SSE-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; CHECK-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8		; SSE-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; CHECK-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16		; SSE-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; CHECK-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; SSE-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; CHECK-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8		; SSE-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; CHECK-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16		; SSE-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[AB2]], i32 2		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; CHECK-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; CHECK-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; CHECK-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; CHECK-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; CHECK-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX1-LABEL: @sdiv_v8i32_undefs(
		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
		; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
		; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
		; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
		; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; AVX1-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 16, i32 16, i32 4, i32 8>
		; AVX1-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
		; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
		; AVX1-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
		; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP3]], i32 3
		; AVX1-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1
		; AVX1-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP4]], i32 4
		; AVX1-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2
		; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP5]], i32 5
		; AVX1-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[TMP6]], i32 6
		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
		; AVX1-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX2-LABEL: @sdiv_v8i32_undefs(
		; AVX2-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
		; AVX2-NEXT: ret <8 x i32> [[TMP1]]
		;
		; AVX512-LABEL: @sdiv_v8i32_undefs(
		; AVX512-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
		; AVX512-NEXT: ret <8 x i32> [[TMP1]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
... (55 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

... (104 lines not shown) ...
; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32(		; AVX1-LABEL: @ashr_shl_v8i32(
; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0		; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1		; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
; AVX1-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
; AVX1-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]		; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]		; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
; AVX1-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; AVX1-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; AVX1-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[A]], [[B]]		; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
		; AVX1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX1-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP1]], [[TMP2]]
		; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX1-NEXT: [[TMP7:%.*]] = shl <8 x i32> [[A]], [[B]]
; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0		; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; AVX1-NEXT: [[R3:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; AVX1-NEXT: [[R5:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 10, i32 11, i32 undef, i32 undef>
; AVX1-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX1-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 14, i32 15>
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_shl_v8i32(		; AVX2-LABEL: @ashr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
... (35 lines not shown) ...
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32_const(		; AVX1-LABEL: @ashr_shl_v8i32_const(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX1-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; AVX1-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; AVX1-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; AVX1-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: [[R7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_shl_v8i32_const(		; AVX2-LABEL: @ashr_shl_v8i32_const(
; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
... (87 lines not shown) ...
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX1-NEXT: ret <8 x i32> [[R7]]		; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6		; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6		; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; AVX2-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7		; AVX2-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX2-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; AVX2-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; AVX2-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; AVX2-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; AVX2-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; AVX2-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP2]], i32 2
; AVX2-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP6]], i32 0		; AVX2-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP3]], i32 2
; AVX2-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; AVX2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP2]], i32 3
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP7]], i32 1		; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP4]], i32 3
; AVX2-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2		; AVX2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i32 4
; AVX2-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP8]], i32 2		; AVX2-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP5]], i32 4
; AVX2-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3		; AVX2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP2]], i32 5
; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP9]], i32 3		; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP6]], i32 5
; AVX2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP5]], i32 4
; AVX2-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP10]], i32 4
; AVX2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP5]], i32 5
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP11]], i32 5
; AVX2-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX2-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX2-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX2-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX2-NEXT: ret <8 x i32> [[R7]]		; AVX2-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6		; AVX512-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
; AVX512-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; AVX512-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX512-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6		; AVX512-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6
; AVX512-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7		; AVX512-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[A]], [[B]]
; AVX512-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]		; AVX512-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; AVX512-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]		; AVX512-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; AVX512-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; AVX512-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP2]], i32 2
; AVX512-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP6]], i32 0		; AVX512-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP3]], i32 2
; AVX512-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; AVX512-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP2]], i32 3
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[TMP7]], i32 1		; AVX512-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP4]], i32 3
; AVX512-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2		; AVX512-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i32 4
; AVX512-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP8]], i32 2		; AVX512-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP5]], i32 4
; AVX512-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3		; AVX512-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP2]], i32 5
; AVX512-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP9]], i32 3		; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP6]], i32 5
; AVX512-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP5]], i32 4
; AVX512-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP10]], i32 4
; AVX512-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP5]], i32 5
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP11]], i32 5
; AVX512-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX512-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; AVX512-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX512-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; AVX512-NEXT: ret <8 x i32> [[R7]]		; AVX512-NEXT: ret <8 x i32> [[R7]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
... (56 lines not shown) ...
%r4 = insertelement <8 x i32> %r3, i32 %ab4, i32 4		%r4 = insertelement <8 x i32> %r3, i32 %ab4, i32 4
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {		define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {
; CHECK-LABEL: @sdiv_v8i32_undefs(		; SSE-LABEL: @sdiv_v8i32_undefs(
; CHECK-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; CHECK-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; CHECK-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; CHECK-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; CHECK-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; CHECK-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; CHECK-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; SSE-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; CHECK-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8		; SSE-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; CHECK-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16		; SSE-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; CHECK-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; SSE-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; CHECK-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8		; SSE-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; CHECK-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16		; SSE-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; CHECK-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 poison>, i32 [[AB1]], i32 1		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 poison>, i32 [[AB1]], i32 1
; CHECK-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; CHECK-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; CHECK-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; CHECK-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; CHECK-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; CHECK-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX1-LABEL: @sdiv_v8i32_undefs(
		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
		; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
		; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
		; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
		; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; AVX1-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 16, i32 16, i32 4, i32 8>
		; AVX1-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
		; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
		; AVX1-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
		; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP3]], i32 3
		; AVX1-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1
		; AVX1-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP4]], i32 4
		; AVX1-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2
		; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP5]], i32 5
		; AVX1-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
		; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[TMP6]], i32 6
		; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
		; AVX1-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX2-LABEL: @sdiv_v8i32_undefs(
		; AVX2-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
		; AVX2-NEXT: ret <8 x i32> [[TMP1]]
		;
		; AVX512-LABEL: @sdiv_v8i32_undefs(
		; AVX512-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
		; AVX512-NEXT: ret <8 x i32> [[TMP1]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1		%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1
%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
ret <4 x i8> %ins4		ret <4 x i8> %ins4
}		}

define i8 @i(<4 x i8> %x, <4 x i8> %y) {		define i8 @i(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @i(		; CHECK-LABEL: @i(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = add i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = add i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = add i8 %1, %2		%3 = add i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @j(<4 x i8> %x, <4 x i8> %y) {		define i8 @j(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @j(		; CHECK-LABEL: @j(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X0X0]], [[X3X3]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: ret i8 [[TMP3]]		; CHECK-NEXT: [[TMP6:%.*]] = sdiv i8 [[TMP2]], [[TMP5]]
		; CHECK-NEXT: ret i8 [[TMP6]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1		%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1
%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
ret <4 x i8> %ins4		ret <4 x i8> %ins4
}		}

define i8 @i(<4 x i8> %x, <4 x i8> %y) {		define i8 @i(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @i(		; CHECK-LABEL: @i(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = add i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = add i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = add i8 %1, %2		%3 = add i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @j(<4 x i8> %x, <4 x i8> %y) {		define i8 @j(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @j(		; CHECK-LABEL: @j(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X0X0]], [[X3X3]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 2
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: ret i8 [[TMP3]]		; CHECK-NEXT: [[TMP6:%.*]] = sdiv i8 [[TMP2]], [[TMP5]]
		; CHECK-NEXT: ret i8 [[TMP6]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

%d3 = insertelement <4 x i1> %d2, i1 %c3, i32 3		%d3 = insertelement <4 x i1> %d2, i1 %c3, i32 3
%r = sext <4 x i1> %d3 to <4 x i32>		%r = sext <4 x i1> %d3 to <4 x i32>
ret <4 x i32> %r		ret <4 x i32> %r
}		}

define <4 x i32> @fcmp_ord_uno_v4i32(<4 x float> %a, float* %b) {		define <4 x i32> @fcmp_ord_uno_v4i32(<4 x float> %a, float* %b) {
; CHECK-LABEL: @fcmp_ord_uno_v4i32(		; CHECK-LABEL: @fcmp_ord_uno_v4i32(
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4		; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
; CHECK-NEXT: [[B1:%.*]] = load float, float* [[P1]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
; CHECK-NEXT: [[B2:%.*]] = load float, float* [[P2]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4		; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]		; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
; CHECK-NEXT: [[C1:%.*]] = fcmp uno float [[B1]], [[A1]]		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[C2:%.*]] = fcmp uno float [[B2]], [[A2]]		; CHECK-NEXT: [[TMP3:%.*]] = fcmp uno <2 x float> [[TMP2]], [[SHRINK_SHUFFLE]]
; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]		; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0		; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0
; CHECK-NEXT: [[D1:%.*]] = insertelement <4 x i1> [[D0]], i1 [[C1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
; CHECK-NEXT: [[D2:%.*]] = insertelement <4 x i1> [[D1]], i1 [[C2]], i32 2		; CHECK-NEXT: [[D1:%.*]] = insertelement <4 x i1> [[D0]], i1 [[TMP4]], i32 1
		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
		; CHECK-NEXT: [[D2:%.*]] = insertelement <4 x i1> [[D1]], i1 [[TMP5]], i32 2
; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D2]], i1 [[C3]], i32 3		; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D2]], i1 [[C3]], i32 3
; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>		; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
; CHECK-NEXT: ret <4 x i32> [[R]]		; CHECK-NEXT: ret <4 x i32> [[R]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3		%a3 = extractelement <4 x float> %a, i32 3

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

%d3 = insertelement <4 x i1> %d2, i1 %c3, i32 3		%d3 = insertelement <4 x i1> %d2, i1 %c3, i32 3
%r = sext <4 x i1> %d3 to <4 x i32>		%r = sext <4 x i1> %d3 to <4 x i32>
ret <4 x i32> %r		ret <4 x i32> %r
}		}

define <4 x i32> @fcmp_ord_uno_v4i32(<4 x float> %a, float* %b) {		define <4 x i32> @fcmp_ord_uno_v4i32(<4 x float> %a, float* %b) {
; CHECK-LABEL: @fcmp_ord_uno_v4i32(		; CHECK-LABEL: @fcmp_ord_uno_v4i32(
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4		; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
; CHECK-NEXT: [[B1:%.*]] = load float, float* [[P1]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
; CHECK-NEXT: [[B2:%.*]] = load float, float* [[P2]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4		; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]		; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
; CHECK-NEXT: [[C1:%.*]] = fcmp uno float [[B1]], [[A1]]		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[C2:%.*]] = fcmp uno float [[B2]], [[A2]]		; CHECK-NEXT: [[TMP3:%.*]] = fcmp uno <2 x float> [[TMP2]], [[SHRINK_SHUFFLE]]
; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]		; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0		; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0
; CHECK-NEXT: [[D1:%.*]] = insertelement <4 x i1> [[D0]], i1 [[C1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
; CHECK-NEXT: [[D2:%.*]] = insertelement <4 x i1> [[D1]], i1 [[C2]], i32 2		; CHECK-NEXT: [[D1:%.*]] = insertelement <4 x i1> [[D0]], i1 [[TMP4]], i32 1
		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
		; CHECK-NEXT: [[D2:%.*]] = insertelement <4 x i1> [[D1]], i1 [[TMP5]], i32 2
; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D2]], i1 [[C3]], i32 3		; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D2]], i1 [[C3]], i32 3
; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>		; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
; CHECK-NEXT: ret <4 x i32> [[R]]		; CHECK-NEXT: ret <4 x i32> [[R]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3		%a3 = extractelement <4 x float> %a, i32 3

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	; SSE-NEXT: store i8 [[TMP14]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 13), align 1			; SSE-NEXT: store i8 [[TMP14]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 13), align 1
	; SSE-NEXT: [[TMP15:%.*]] = xor i8 [[A]], [[C]]			; SSE-NEXT: [[TMP15:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: store i8 [[TMP15]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 14), align 1			; SSE-NEXT: store i8 [[TMP15]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 14), align 1
	; SSE-NEXT: [[TMP16:%.*]] = xor i8 [[A]], [[C]]			; SSE-NEXT: [[TMP16:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: store i8 [[TMP16]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1			; SSE-NEXT: store i8 [[TMP16]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @splat(			; AVX-LABEL: @splat(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[C]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[C]], i32 2			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 [[C]], i32 3			; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 [[C]], i32 4			; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 [[C]], i32 1
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 [[C]], i32 5			; AVX-NEXT: [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 [[C]], i32 2
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <16 x i8> [[TMP6]], i8 [[C]], i32 6			; AVX-NEXT: [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 [[C]], i32 3
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <16 x i8> [[TMP7]], i8 [[C]], i32 7			; AVX-NEXT: [[TMP7:%.*]] = insertelement <16 x i8> [[TMP6]], i8 [[C]], i32 4
	; AVX-NEXT: [[TMP9:%.*]] = insertelement <16 x i8> [[TMP8]], i8 [[C]], i32 8			; AVX-NEXT: [[TMP8:%.*]] = insertelement <16 x i8> [[TMP7]], i8 [[C]], i32 5
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <16 x i8> [[TMP9]], i8 [[C]], i32 9			; AVX-NEXT: [[TMP9:%.*]] = insertelement <16 x i8> [[TMP8]], i8 [[C]], i32 6
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <16 x i8> [[TMP10]], i8 [[C]], i32 10			; AVX-NEXT: [[TMP10:%.*]] = insertelement <16 x i8> [[TMP9]], i8 [[C]], i32 7
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <16 x i8> [[TMP11]], i8 [[C]], i32 11			; AVX-NEXT: [[TMP11:%.*]] = insertelement <16 x i8> [[TMP10]], i8 [[C]], i32 8
	; AVX-NEXT: [[TMP13:%.*]] = insertelement <16 x i8> [[TMP12]], i8 [[C]], i32 12			; AVX-NEXT: [[TMP12:%.*]] = insertelement <16 x i8> [[TMP11]], i8 [[C]], i32 9
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <16 x i8> [[TMP13]], i8 [[C]], i32 13			; AVX-NEXT: [[TMP13:%.*]] = insertelement <16 x i8> [[TMP12]], i8 [[C]], i32 10
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <16 x i8> [[TMP14]], i8 [[C]], i32 14			; AVX-NEXT: [[TMP14:%.*]] = insertelement <16 x i8> [[TMP13]], i8 [[C]], i32 11
	; AVX-NEXT: [[TMP16:%.*]] = insertelement <16 x i8> [[TMP15]], i8 [[C]], i32 15			; AVX-NEXT: [[TMP15:%.*]] = insertelement <16 x i8> [[TMP14]], i8 [[C]], i32 12
	; AVX-NEXT: [[TMP17:%.*]] = insertelement <2 x i8> poison, i8 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP16:%.*]] = insertelement <16 x i8> [[TMP15]], i8 [[C]], i32 13
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <2 x i8> [[TMP17]], i8 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP17:%.*]] = insertelement <16 x i8> [[TMP16]], i8 [[C]], i32 14
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i8> [[TMP18]], <2 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; AVX-NEXT: [[TMP18:%.*]] = insertelement <16 x i8> [[TMP17]], i8 [[C]], i32 15
	; AVX-NEXT: [[TMP19:%.*]] = xor <16 x i8> [[TMP16]], [[SHUFFLE]]			; AVX-NEXT: [[TMP19:%.*]] = xor <16 x i8> [[SHUFFLE]], [[TMP18]]
	; AVX-NEXT: store <16 x i8> [[TMP19]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16			; AVX-NEXT: store <16 x i8> [[TMP19]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = xor i8 %c, %a			%1 = xor i8 %c, %a
	store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16			store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16
	%2 = xor i8 %a, %c			%2 = xor i8 %a, %c
	store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)			store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)
	%3 = xor i8 %a, %c			%3 = xor i8 %a, %c
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[A]], i32 2			; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[A]], i32 2
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[A]], i32 3			; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[A]], i32 3
	; AVX-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]			; AVX-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[C]], i32 2			; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[C]], i32 2
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[A]], i32 3			; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[A]], i32 3
	; AVX-NEXT: [[TMP13:%.*]] = xor <4 x i32> [[TMP9]], [[TMP12]]			; AVX-NEXT: [[TMP13:%.*]] = xor <4 x i32> [[TMP12]], [[TMP9]]
	; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16			; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%add1 = add i32 %c, %a			%add1 = add i32 %c, %a
	%add2 = add i32 %c, %a			%add2 = add i32 %c, %a
	%add3 = add i32 %a, %c			%add3 = add i32 %a, %c
	%add4 = add i32 %c, %a			%add4 = add i32 %c, %a
	%1 = xor i32 %add1, %a			%1 = xor i32 %add1, %a

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

	; other candidates in the reduction because it does not have matching predicate			; other candidates in the reduction because it does not have matching predicate
	; and/or constant operand.			; and/or constant operand.

	define float @merge_anyof_v4f32_wrong_first(<4 x float> %x) {			define float @merge_anyof_v4f32_wrong_first(<4 x float> %x) {
	; CHECK-LABEL: @merge_anyof_v4f32_wrong_first(			; CHECK-LABEL: @merge_anyof_v4f32_wrong_first(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
	; CHECK-NEXT: [[CMP3WRONG:%.*]] = fcmp olt float [[TMP1]], 4.200000e+01			; CHECK-NEXT: [[CMP3WRONG:%.*]] = fcmp olt float [[TMP1]], 4.200000e+01
	; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <4 x float> [[X]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <4 x float> [[X]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP3]], [[CMP3WRONG]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp ne i4 [[TMP3]], 0
	; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP4]], float -1.000000e+00, float 1.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[CMP3WRONG]]
				; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP5]], float -1.000000e+00, float 1.000000e+00
	; CHECK-NEXT: ret float [[R]]			; CHECK-NEXT: ret float [[R]]
	;			;
	%x0 = extractelement <4 x float> %x, i32 0			%x0 = extractelement <4 x float> %x, i32 0
	%x1 = extractelement <4 x float> %x, i32 1			%x1 = extractelement <4 x float> %x, i32 1
	%x2 = extractelement <4 x float> %x, i32 2			%x2 = extractelement <4 x float> %x, i32 2
	%x3 = extractelement <4 x float> %x, i32 3			%x3 = extractelement <4 x float> %x, i32 3
	%cmp3wrong = fcmp olt float %x3, 42.0			%cmp3wrong = fcmp olt float %x3, 42.0
	%cmp0 = fcmp ogt float %x0, 1.0			%cmp0 = fcmp ogt float %x0, 1.0
	%cmp1 = fcmp ogt float %x1, 1.0			%cmp1 = fcmp ogt float %x1, 1.0
	%cmp2 = fcmp ogt float %x2, 1.0			%cmp2 = fcmp ogt float %x2, 1.0
	%cmp3 = fcmp ogt float %x3, 1.0			%cmp3 = fcmp ogt float %x3, 1.0
	%or03 = or i1 %cmp0, %cmp3wrong			%or03 = or i1 %cmp0, %cmp3wrong
	%or031 = or i1 %or03, %cmp1			%or031 = or i1 %or03, %cmp1
	%or0312 = or i1 %or031, %cmp2			%or0312 = or i1 %or031, %cmp2
	%or03123 = or i1 %or0312, %cmp3			%or03123 = or i1 %or0312, %cmp3
	%r = select i1 %or03123, float -1.0, float 1.0			%r = select i1 %or03123, float -1.0, float 1.0
	ret float %r			ret float %r
	}			}

	define float @merge_anyof_v4f32_wrong_last(<4 x float> %x) {			define float @merge_anyof_v4f32_wrong_last(<4 x float> %x) {
	; CHECK-LABEL: @merge_anyof_v4f32_wrong_last(			; CHECK-LABEL: @merge_anyof_v4f32_wrong_last(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[X:%.*]], i32 3
	; CHECK-NEXT: [[CMP3WRONG:%.*]] = fcmp olt float [[TMP1]], 4.200000e+01			; CHECK-NEXT: [[CMP3WRONG:%.*]] = fcmp olt float [[TMP1]], 4.200000e+01
	; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <4 x float> [[X]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <4 x float> [[X]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP3]], [[CMP3WRONG]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp ne i4 [[TMP3]], 0
	; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP4]], float -1.000000e+00, float 1.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[CMP3WRONG]]
				; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP5]], float -1.000000e+00, float 1.000000e+00
	; CHECK-NEXT: ret float [[R]]			; CHECK-NEXT: ret float [[R]]
	;			;
	%x0 = extractelement <4 x float> %x, i32 0			%x0 = extractelement <4 x float> %x, i32 0
	%x1 = extractelement <4 x float> %x, i32 1			%x1 = extractelement <4 x float> %x, i32 1
	%x2 = extractelement <4 x float> %x, i32 2			%x2 = extractelement <4 x float> %x, i32 2
	%x3 = extractelement <4 x float> %x, i32 3			%x3 = extractelement <4 x float> %x, i32 3
	%cmp3wrong = fcmp olt float %x3, 42.0			%cmp3wrong = fcmp olt float %x3, 42.0
	%cmp0 = fcmp ogt float %x0, 1.0			%cmp0 = fcmp ogt float %x0, 1.0
	%cmp1 = fcmp ogt float %x1, 1.0			%cmp1 = fcmp ogt float %x1, 1.0
	%cmp2 = fcmp ogt float %x2, 1.0			%cmp2 = fcmp ogt float %x2, 1.0
	%cmp3 = fcmp ogt float %x3, 1.0			%cmp3 = fcmp ogt float %x3, 1.0
	%or03 = or i1 %cmp0, %cmp3			%or03 = or i1 %cmp0, %cmp3
	%or031 = or i1 %or03, %cmp1			%or031 = or i1 %or03, %cmp1
	%or0312 = or i1 %or031, %cmp2			%or0312 = or i1 %or031, %cmp2
	%or03123 = or i1 %or0312, %cmp3wrong			%or03123 = or i1 %or0312, %cmp3wrong
	%r = select i1 %or03123, float -1.0, float 1.0			%r = select i1 %or03123, float -1.0, float 1.0
	ret float %r			ret float %r
	}			}

	define i32 @merge_anyof_v4i32_wrong_middle(<4 x i32> %x) {			define i32 @merge_anyof_v4i32_wrong_middle(<4 x i32> %x) {
	; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle(			; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3
	; CHECK-NEXT: [[CMP3WRONG:%.*]] = icmp slt i32 [[TMP1]], 42			; CHECK-NEXT: [[CMP3WRONG:%.*]] = icmp slt i32 [[TMP1]], 42
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP3]], [[CMP3WRONG]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp ne i4 [[TMP3]], 0
	; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP4]], i32 -1, i32 1			; CHECK-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[CMP3WRONG]]
				; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP5]], i32 -1, i32 1
	; CHECK-NEXT: ret i32 [[R]]			; CHECK-NEXT: ret i32 [[R]]
	;			;
	%x0 = extractelement <4 x i32> %x, i32 0			%x0 = extractelement <4 x i32> %x, i32 0
	%x1 = extractelement <4 x i32> %x, i32 1			%x1 = extractelement <4 x i32> %x, i32 1
	%x2 = extractelement <4 x i32> %x, i32 2			%x2 = extractelement <4 x i32> %x, i32 2
	%x3 = extractelement <4 x i32> %x, i32 3			%x3 = extractelement <4 x i32> %x, i32 3
	%cmp3wrong = icmp slt i32 %x3, 42			%cmp3wrong = icmp slt i32 %x3, 42
	%cmp0 = icmp sgt i32 %x0, 1			%cmp0 = icmp sgt i32 %x0, 1
	%cmp1 = icmp sgt i32 %x1, 1			%cmp1 = icmp sgt i32 %x1, 1
	%cmp2 = icmp sgt i32 %x2, 1			%cmp2 = icmp sgt i32 %x2, 1
	%cmp3 = icmp sgt i32 %x3, 1			%cmp3 = icmp sgt i32 %x3, 1
	%or03 = or i1 %cmp0, %cmp3			%or03 = or i1 %cmp0, %cmp3
	%or033 = or i1 %or03, %cmp3wrong			%or033 = or i1 %or03, %cmp3wrong
	%or0332 = or i1 %or033, %cmp2			%or0332 = or i1 %or033, %cmp2
	%or03321 = or i1 %or0332, %cmp1			%or03321 = or i1 %or0332, %cmp1
	%r = select i1 %or03321, i32 -1, i32 1			%r = select i1 %or03321, i32 -1, i32 1
	ret i32 %r			ret i32 %r
	}			}

	; Operand/predicate swapping allows forming a reduction, but the			; Operand/predicate swapping allows forming a reduction, but the
	; ideal reduction groups all of the original 'sgt' ops together.			; ideal reduction groups all of the original 'sgt' ops together.

	define i32 @merge_anyof_v4i32_wrong_middle_better_rdx(<4 x i32> %x, <4 x i32> %y) {			define i32 @merge_anyof_v4i32_wrong_middle_better_rdx(<4 x i32> %x, <4 x i32> %y) {
	; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle_better_rdx(			; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle_better_rdx(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 3			; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3			; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
	; CHECK-NEXT: [[CMP3WRONG:%.*]] = icmp slt i32 [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], [[Y]]			; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP3]])			; CHECK-NEXT: [[Y0:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[CMP3WRONG]]			; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i32> [[Y]], i32 1
	; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP5]], i32 -1, i32 1			; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i32> [[Y]], i32 2
				; CHECK-NEXT: [[Y3:%.*]] = extractelement <4 x i32> [[Y]], i32 3
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X0]], i32 0
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X3]], i32 1
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X1]], i32 3
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[Y3]], i32 4
				; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> poison, i32 [[Y0]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y3]], i32 1
				; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y2]], i32 2
				; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[Y1]], i32 3
				; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[X3]], i32 4
				; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt <8 x i32> [[TMP5]], [[TMP10]]
				; CHECK-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i1> [[TMP11]], <8 x i1> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
				; CHECK-NEXT: [[TMP12:%.*]] = bitcast <8 x i1> [[REDUCTION_NORMALIZATION]] to i8
				; CHECK-NEXT: [[TMP13:%.*]] = icmp ne i8 [[TMP12]], 0
				; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP13]], i32 -1, i32 1
	; CHECK-NEXT: ret i32 [[R]]			; CHECK-NEXT: ret i32 [[R]]
	;			;
	%x0 = extractelement <4 x i32> %x, i32 0			%x0 = extractelement <4 x i32> %x, i32 0
	%x1 = extractelement <4 x i32> %x, i32 1			%x1 = extractelement <4 x i32> %x, i32 1
	%x2 = extractelement <4 x i32> %x, i32 2			%x2 = extractelement <4 x i32> %x, i32 2
	%x3 = extractelement <4 x i32> %x, i32 3			%x3 = extractelement <4 x i32> %x, i32 3
	%y0 = extractelement <4 x i32> %y, i32 0			%y0 = extractelement <4 x i32> %y, i32 0
	%y1 = extractelement <4 x i32> %y, i32 1			%y1 = extractelement <4 x i32> %y, i32 1

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @testfunc(			; AVX-LABEL: @testfunc(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: br label [[FOR_BODY:%.*]]			; AVX-NEXT: br label [[FOR_BODY:%.*]]
	; AVX: for.body:			; AVX: for.body:
	; AVX-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]			; AVX-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]
	; AVX-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; AVX-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; AVX-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; AVX-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]			; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]
	; AVX-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4			; AVX-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4
	; AVX-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 1			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP0]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[TMP2]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP4]], i32 1			; AVX-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[SHUFFLE]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP0]], zeroinitializer
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[TMP1]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP5]], [[TMP4]]
	; AVX-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP5]], [[TMP7]]			; AVX-NEXT: [[TMP7:%.*]] = fcmp olt <2 x float> [[TMP6]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP0]], zeroinitializer			; AVX-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP6]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP10:%.*]] = fadd <2 x float> [[TMP9]], [[TMP8]]			; AVX-NEXT: [[TMP9:%.*]] = fcmp olt <2 x float> [[TMP8]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP11:%.*]] = fcmp olt <2 x float> [[TMP10]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer
	; AVX-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP10]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP9]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP10]]
	; AVX-NEXT: [[TMP13:%.*]] = fcmp olt <2 x float> [[TMP12]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
	; AVX-NEXT: [[TMP14:%.*]] = fmul <2 x float> [[TMP12]], zeroinitializer			; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
	; AVX-NEXT: [[TMP15:%.*]] = select <2 x i1> [[TMP13]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP14]]			; AVX-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]
	; AVX-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0			; AVX-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
	; AVX-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1			; AVX-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1
	; AVX-NEXT: [[ADD13]] = fadd float [[TMP16]], [[TMP17]]			; AVX-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <2 x float> poison, float [[TMP17]], i32 0			; AVX-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[ADD13]], i32 1			; AVX-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP20:%.*]] = fcmp olt <2 x float> [[TMP19]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]
	; AVX-NEXT: [[TMP21:%.*]] = select <2 x i1> [[TMP20]], <2 x float> [[TMP19]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP22:%.*]] = fcmp olt <2 x float> [[TMP21]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP23]] = select <2 x i1> [[TMP22]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP21]]
	; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
	; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; AVX: for.end:			; AVX: for.end:
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body


llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double undef, i32 1			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> poison, double [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = fmul fast <2 x double> [[TMP14]], [[TMP15]]
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x double> [[TMP16]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x double> [[TMP15]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[TMP17]], i32 0			; CHECK-NEXT: [[TMP17:%.*]] = insertelement <2 x double> poison, double [[TMP16]], i32 0
	; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x double> [[TMP16]], i32 1			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x double> [[TMP15]], i32 1
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP18]], double [[TMP19]], i32 1			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP17]], double [[TMP18]], i32 1
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]
	; CHECK: label:			; CHECK: label:
	; CHECK-NEXT: [[TMP21:%.*]] = phi <2 x double> [ [[TMP12]], [[BB1]] ], [ [[TMP20]], [[BB2]] ]			; CHECK-NEXT: [[TMP20:%.*]] = phi <2 x double> [ [[TMP12]], [[BB1]] ], [ [[TMP19]], [[BB2]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%i10 = fdiv fast double %0, %1			%i10 = fdiv fast double %0, %1
	%ix = fmul double %i10, undef			%ix = fmul double %i10, undef
	%ixx0 = fsub double undef, undef			%ixx0 = fsub double undef, undef
	%ixx1 = fsub double undef, undef			%ixx1 = fsub double undef, undef
	%ixx2 = fsub double undef, undef			%ixx2 = fsub double undef, undef

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

	define fastcc void @dct36(double* %inbuf) {			define fastcc void @dct36(double* %inbuf) {
	; CHECK-LABEL: @dct36(			; CHECK-LABEL: @dct36(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double undef, i32 1			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP1]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2			%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2
	%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1			%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1
	%0 = load double, double* %arrayidx44, align 8			%0 = load double, double* %arrayidx44, align 8
	%add46 = fadd double %0, undef			%add46 = fadd double %0, undef
	store double %add46, double* %arrayidx41, align 8			store double %add46, double* %arrayidx41, align 8
	%1 = load double, double* %inbuf, align 8			%1 = load double, double* %inbuf, align 8
	%add49 = fadd double %1, %0			%add49 = fadd double %1, %0
	store double %add49, double* %arrayidx44, align 8			store double %add49, double* %arrayidx44, align 8
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_mandeltext.ll


	define void @zot(%struct.hoge* %arg) {			define void @zot(%struct.hoge* %arg) {
	; CHECK-LABEL: @zot(			; CHECK-LABEL: @zot(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP2:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP2:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], undef			; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], poison
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_HOGE:%.*]], %struct.hoge* [[ARG:%.*]], i64 0, i32 1			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_HOGE:%.*]], %struct.hoge* [[ARG:%.*]], i64 0, i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], poison
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], poison
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TMP7]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TMP7]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: br i1 undef, label [[BB11:%.*]], label [[BB12:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB11:%.*]], label [[BB12:%.*]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: br label [[BB14:%.*]]			; CHECK-NEXT: br label [[BB14:%.*]]
	; CHECK: bb12:			; CHECK: bb12:
	; CHECK-NEXT: br label [[BB14]]			; CHECK-NEXT: br label [[BB14]]
	; CHECK: bb14:			; CHECK: bb14:

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s

	define i32 @crash_reordering_undefs() {			define i32 @crash_reordering_undefs() {
	; CHECK-LABEL: @crash_reordering_undefs(			; CHECK-LABEL: @crash_reordering_undefs(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[OR0:%.*]] = or i64 undef, undef			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> poison)
	; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i64 undef, [[OR0]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP0]], undef
	; CHECK-NEXT: [[ADD0:%.*]] = select i1 [[CMP0]], i32 65536, i32 65537			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[ADD1:%.*]] = add i32 undef, [[ADD0]]			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add i32 [[OP_EXTRA1]], undef
	; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add i32 [[OP_EXTRA2]], undef
	; CHECK-NEXT: [[ADD2:%.*]] = select i1 [[CMP1]], i32 65536, i32 65537			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = add i32 [[OP_EXTRA3]], undef
	; CHECK-NEXT: [[ADD3:%.*]] = add i32 [[ADD1]], [[ADD2]]			; CHECK-NEXT: ret i32 [[OP_EXTRA4]]
	; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i64 undef, undef
	; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[CMP2]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD5:%.*]] = add i32 [[ADD3]], [[ADD4]]
	; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[ADD5]], undef
	; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[ADD6]], undef
	; CHECK-NEXT: [[ADD8:%.*]] = add i32 [[ADD7]], undef
	; CHECK-NEXT: [[OR1:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i64 undef, [[OR1]]
	; CHECK-NEXT: [[ADD9:%.*]] = select i1 [[CMP3]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD10:%.*]] = add i32 [[ADD8]], [[ADD9]]
	; CHECK-NEXT: [[ADD11:%.*]] = add i32 [[ADD10]], undef
	; CHECK-NEXT: ret i32 [[ADD11]]
	;			;
	entry:			entry:
	%or0 = or i64 undef, undef			%or0 = or i64 undef, undef
	%cmp0 = icmp eq i64 undef, %or0			%cmp0 = icmp eq i64 undef, %or0
	%add0 = select i1 %cmp0, i32 65536, i32 65537			%add0 = select i1 %cmp0, i32 65536, i32 65537
	%add1 = add i32 undef, %add0			%add1 = add i32 undef, %add0
	%cmp1 = icmp eq i64 undef, undef			%cmp1 = icmp eq i64 undef, undef
	%add2 = select i1 %cmp1, i32 65536, i32 65537			%add2 = select i1 %cmp1, i32 65536, i32 65537

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	; CHECK-NEXT: br i1 undef, label [[FOR_BODY42_LR_PH_US:%.*]], label [[_Z5CLAMPD_EXIT_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY42_LR_PH_US:%.*]], label [[_Z5CLAMPD_EXIT_1:%.*]]
	; CHECK: cond.false51.us:			; CHECK: cond.false51.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true48.us:			; CHECK: cond.true48.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]
	; CHECK: cond.false66.us:			; CHECK: cond.false66.us:
	; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef			; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[ADD_I276_US]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[ADD_I276_US]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double undef, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[TMP0]], <double 0.000000e+00, double 0xBFA5CC2D1960285F>
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 0.000000e+00, double 0xBFA5CC2D1960285F>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 1.400000e+02, double 1.400000e+02>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], <double 5.000000e+01, double 5.200000e+01>			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> poison, [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> undef, [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP5]], align 8
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true63.us:			; CHECK: cond.true63.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.body42.lr.ph.us:			; CHECK: for.body42.lr.ph.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]
	; CHECK: _Z5clampd.exit.1:			; CHECK: _Z5clampd.exit.1:
	; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]			; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]
	;			;
	%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }			%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }
	%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }			%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }

	define void @_Z8radianceRK3RayiPt() #0 {			define void @_Z8radianceRK3RayiPt() #0 {
	; CHECK-LABEL: @_Z8radianceRK3RayiPt(			; CHECK-LABEL: @_Z8radianceRK3RayiPt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double poison, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> poison, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> poison, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> poison, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> poison, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> poison, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> poison, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> poison, [[TMP6]]
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: br label [[RETURN:%.*]]			; CHECK-NEXT: br label [[RETURN:%.*]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: br label [[TMP7:%.*]]			; CHECK-NEXT: br label [[TMP7:%.*]]
	; CHECK: [[TMP8:%.*]] = phi <2 x double> [ <double 1.800000e+01, double 2.800000e+01>, [[TMP0]] ], [ [[TMP11:%.*]], [[TMP21:%.*]] ], [ [[TMP11]], [[TMP18:%.*]] ], [ [[TMP11]], [[TMP18]] ]			; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ <double 1.800000e+01, double 2.800000e+01>, [[TMP0]] ], [ [[TMP11:%.*]], [[TMP21:%.*]] ], [ [[TMP11]], [[TMP18:%.*]] ], [ [[TMP11]], [[TMP18]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP1]] to <2 x double>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP1]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8			; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP3]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP3]] to <2 x double>*
	; CHECK-NEXT: [[TMP11]] = load <2 x double>, <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: [[TMP11]] = load <2 x double>, <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]
	; CHECK: ret void			; CHECK: 12:
	; CHECK: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*			; CHECK-NEXT: ret void
				; CHECK: 13:
				; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]
	; CHECK: br label [[TMP16]]			; CHECK: 15:
	; CHECK: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]			; CHECK-NEXT: br label [[TMP16]]
	; CHECK: unreachable			; CHECK: 16:
	; CHECK: [[TMP19:%.*]] = extractelement <2 x double> [[TMP11]], i32 0			; CHECK-NEXT: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]
				; CHECK: 17:
				; CHECK-NEXT: unreachable
				; CHECK: 18:
				; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x double> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x double> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x double> [[TMP11]], i32 1
	; CHECK-NEXT: switch i32 undef, label [[TMP21]] [			; CHECK-NEXT: switch i32 undef, label [[TMP21]] [
	; CHECK-NEXT: i32 32, label [[TMP7]]			; CHECK-NEXT: i32 32, label [[TMP7]]
	; CHECK-NEXT: i32 103, label [[TMP7]]			; CHECK-NEXT: i32 103, label [[TMP7]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]			; CHECK: 21:
	; CHECK: unreachable			; CHECK-NEXT: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]
				; CHECK: 22:
				; CHECK-NEXT: unreachable
	;			;
	%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	%3 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%3 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%4 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%4 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	%5 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%5 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%6 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%6 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	br label %7			br label %7

llvm/test/Transforms/SLPVectorizer/X86/cse.ll


	define i32 @test(double* nocapture %G) {			define i32 @test(double* nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[SHUFFLE]], <double 4.000000e+00, double 3.000000e+00, double 4.000000e+00, double poison>
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[SHUFFLE1]], <double 1.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <4 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[TMP4]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TMP18]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TMP18]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP20]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP20]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*			; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP21]], <2 x double>* [[TMP23]], align 8			; CHECK-NEXT: store <2 x double> [[TMP21]], <2 x double>* [[TMP23]], align 8
	; CHECK-NEXT: br label [[TMP24]]			; CHECK-NEXT: br label [[TMP24]]
	; CHECK: 24:			; CHECK: 24:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
				RKSimon: The test2 changes look superfluous (maybe precommit them?).
				ABataev: Yes, will do this later.
	;			;
	%1 = icmp eq i32 %k, 0			%1 = icmp eq i32 %k, 0
	%2 = getelementptr inbounds double, double* %G, i64 5			%2 = getelementptr inbounds double, double* %G, i64 5
	%3 = load double, double* %2, align 8			%3 = load double, double* %2, align 8
	%4 = fmul double %3, 4.000000e+00			%4 = fmul double %3, 4.000000e+00
	br i1 %1, label %12, label %5			br i1 %1, label %12, label %5

	; <label>:5 ; preds = %0			; <label>:5 ; preds = %0

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

define void @fextr2(double* %ptr) {		define void @fextr2(double* %ptr) {
; CHECK-LABEL: @fextr2(		; CHECK-LABEL: @fextr2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32		; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32
; CHECK-NEXT: [[V0:%.*]] = extractelement <4 x double> [[LD]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[LD]], <4 x double> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[V1:%.*]] = extractelement <4 x double> [[LD]], i32 1
; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0		; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 5.500000e+00, double 6.600000e+00>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 5.500000e+00, double 6.600000e+00>		; CHECK-NEXT: store <2 x double> [[TMP0]], <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%LD = load <4 x double>, <4 x double>* undef		%LD = load <4 x double>, <4 x double>* undef
%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.		%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.
%V1 = extractelement <4 x double> %LD, i32 1		%V1 = extractelement <4 x double> %LD, i32 1
%P0 = getelementptr inbounds double, double* %ptr, i64 0		%P0 = getelementptr inbounds double, double* %ptr, i64 0
%P1 = getelementptr inbounds double, double* %ptr, i64 1		%P1 = getelementptr inbounds double, double* %ptr, i64 1
%A0 = fadd double %V0, 5.5		%A0 = fadd double %V0, 5.5
%A1 = fadd double %V1, 6.6		%A1 = fadd double %V1, 6.6
store double %A0, double* %P0, align 4		store double %A0, double* %P0, align 4
store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
	; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X0]]			; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X0]]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: store float [[ADD]], float* @a, align 4			; CHECK-NEXT: store float [[ADD]], float* @a, align 4
	; CHECK-NEXT: ret float [[X0]]			; CHECK-NEXT: ret float [[X0]]
	;			;
	; THRESH1-LABEL: @f_used_out_of_tree(			; THRESH1-LABEL: @f_used_out_of_tree(
	; THRESH1-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0			; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
	; THRESH1-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1			; THRESH1-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X]], [[X]]
	; THRESH1-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X0]]			; THRESH1-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH1-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; THRESH1-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH1-NEXT: store float [[ADD]], float* @a, align 4			; THRESH1-NEXT: store float [[ADD]], float* @a, align 4
	; THRESH1-NEXT: ret float [[X0]]			; THRESH1-NEXT: ret float [[TMP1]]
	;			;
	; THRESH2-LABEL: @f_used_out_of_tree(			; THRESH2-LABEL: @f_used_out_of_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0			; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
	; THRESH2-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X]], [[X]]			; THRESH2-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X]], [[X]]
	; THRESH2-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0			; THRESH2-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1			; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH2-NEXT: store float [[ADD]], float* @a, align 4			; THRESH2-NEXT: store float [[ADD]], float* @a, align 4
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; SSE-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptosi_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f64_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f64_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
[... 191 lines not shown ...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; SSE-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f32_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f32_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
[... 71 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

[... 188 lines not shown ...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; SSE-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptosi_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f64_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f64_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
[... 191 lines not shown ...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; SSE-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f32_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f32_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f64_8i8() #0 {		define void @fptoui_8f64_8i8() #0 {
; CHECK-LABEL: @fptoui_8f64_8i8(		; SSE-LABEL: @fptoui_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptoui_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX-LABEL: @fptoui_8f64_8i8(
		; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f32_8i8() #0 {		define void @fptoui_8f32_8i8() #0 {
; CHECK-LABEL: @fptoui_8f32_8i8(		; SSE-LABEL: @fptoui_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX-LABEL: @fptoui_8f32_8i8(
		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i8>
		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s -slp-min-non-power2-values-size=2 | FileCheck %s
	@e = dso_local local_unnamed_addr global i32 0, align 4			@e = dso_local local_unnamed_addr global i32 0, align 4
	@f = dso_local local_unnamed_addr global i32 0, align 4			@f = dso_local local_unnamed_addr global i32 0, align 4

	; Function Attrs: nofree norecurse nounwind uwtable			; Function Attrs: nofree norecurse nounwind uwtable
	define dso_local i32 @g() local_unnamed_addr {			define dso_local i32 @g() local_unnamed_addr {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4
	; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0			; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0
	; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[C_022:%.*]] = phi i32* [ [[C_022_BE:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <4 x i32*> [ [[TMP16:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32*> [ [[TMP14:%.*]], [[WHILE_BODY_BACKEDGE]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32*> [[TMP1]], i32 0
	; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint i32* [[TMP2]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint i32* [[C_022]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 1, i64 1, i64 1, i64 poison>
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 1, i64 1>			; CHECK-NEXT: switch i32 [[TMP4]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: switch i32 [[TMP3]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]			; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]
	; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]			; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = ptrtoint i32* [[TMP5]] to i64			; CHECK-NEXT: [[TMP7:%.*]] = ptrtoint i32* [[TMP6]] to i64
	; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP7]] to i32
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 1			; CHECK-NEXT: store i32 [[TMP8]], i32* [[TMP9]], align 4
	; CHECK-NEXT: store i32 [[TMP7]], i32* [[TMP9]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 2, i64 2, i64 2, i64 poison>
	; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: sw.bb6:			; CHECK: sw.bb6:
	; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64			; CHECK-NEXT: [[TMP12:%.*]] = ptrtoint i32* [[TMP11]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP10]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 2, i64 2, i64 2, i64 poison>
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 1
	; CHECK-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4			; CHECK-NEXT: store i32 [[TMP13]], i32* [[TMP15]], align 4
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: while.body.backedge:			; CHECK: while.body.backedge:
	; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP16]] = phi <4 x i32*> [ [[TMP5]], [[WHILE_BODY]] ], [ [[TMP14]], [[SW_BB6]] ], [ [[TMP10]], [[SW_BB]] ]
	; CHECK-NEXT: [[TMP14]] = phi <2 x i32*> [ [[TMP4]], [[WHILE_BODY]] ], [ [[TMP12]], [[SW_BB6]] ], [ [[TMP8]], [[SW_BB]] ]
	; CHECK-NEXT: br label [[WHILE_BODY]]			; CHECK-NEXT: br label [[WHILE_BODY]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* @e, align 4			%0 = load i32, i32* @e, align 4
	%tobool.not19 = icmp eq i32 %0, 0			%tobool.not19 = icmp eq i32 %0, 0
	br i1 %tobool.not19, label %while.end, label %while.body			br i1 %tobool.not19, label %while.end, label %while.body

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

	; A[i+2] += n;			; A[i+2] += n;
	; A[i+3] += k;			; A[i+3] += k;
	; }			; }
	;}			;}

	define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {			define i32 @foo(i32* nocapture %A, i32 %n, i32 %k) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[N:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[K:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[K:%.*]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_024:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD10:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_024]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

	; SSE-NEXT: store i32 [[TMP24]], i32* @var, align 8			; SSE-NEXT: store i32 [[TMP24]], i32* @var, align 8
	; SSE-NEXT: ret i32 [[TMP23]]			; SSE-NEXT: ret i32 [[TMP23]]
	;			;
	; AVX-LABEL: @maxi8_mutiple_uses(			; AVX-LABEL: @maxi8_mutiple_uses(
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]			; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
	; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]			; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
	; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]			; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP10]], i32 [[TMP5]]			; AVX-NEXT: store i32 [[TMP8]], i32* @var, align 8
	; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX-NEXT: ret i32 [[OP_EXTRA1]]
	; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[OP_EXTRA1]], [[TMP11]]
	; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[OP_EXTRA1]], i32 [[TMP11]]
	; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; AVX-NEXT: store i32 [[TMP14]], i32* @var, align 8
	; AVX-NEXT: ret i32 [[TMP13]]
	;			;
	; THRESH-LABEL: @maxi8_mutiple_uses(			; THRESH-LABEL: @maxi8_mutiple_uses(
	; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0			; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
	; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
	; THRESH-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])			; THRESH-NEXT: [[TMP7:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; THRESH-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0			; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; THRESH-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1			; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> poison, i32 [[TMP6]], i32 0			; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]
	; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1			; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP8]], i32 [[TMP6]]
	; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]			; THRESH-NEXT: [[TMP9:%.*]] = select i1 [[TMP5]], i32 3, i32 4
	; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]			; THRESH-NEXT: store i32 [[TMP9]], i32* @var, align 8
	; THRESH-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1			; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
	; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP15]], [[TMP14]]
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP15]], i32 [[TMP14]]
	; THRESH-NEXT: [[TMP16:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; THRESH-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[OP_EXTRA1]], [[TMP16]]
	; THRESH-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[OP_EXTRA1]], i32 [[TMP16]]
	; THRESH-NEXT: [[TMP19:%.*]] = extractelement <2 x i1> [[TMP12]], i32 1
	; THRESH-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 3, i32 4
	; THRESH-NEXT: store i32 [[TMP20]], i32* @var, align 8
	; THRESH-NEXT: ret i32 [[TMP18]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	%5 = select i1 %4, i32 %2, i32 %3			%5 = select i1 %4, i32 %2, i32 %3
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%7 = icmp sgt i32 %5, %6			%7 = icmp sgt i32 %5, %6
	%8 = select i1 %7, i32 %5, i32 %6			%8 = select i1 %7, i32 %5, i32 %6
	;			;
	; AVX-LABEL: @maxi8_wrong_parent(			; AVX-LABEL: @maxi8_wrong_parent(
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX-NEXT: br label [[PP:%.*]]			; AVX-NEXT: br label [[PP:%.*]]
	; AVX: pp:			; AVX: pp:
	; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
	; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
	; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
	; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
	; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
	; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
	; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
	; AVX-NEXT: ret i32 [[OP_EXTRA1]]			; AVX-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	; THRESH-LABEL: @maxi8_wrong_parent(			; THRESH-LABEL: @maxi8_wrong_parent(
	; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0			; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]			; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
	; THRESH-NEXT: br label [[PP:%.*]]			; THRESH-NEXT: br label [[PP:%.*]]
	; THRESH: pp:			; THRESH: pp:
	; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
	; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; THRESH-NEXT: [[TMP7:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]
	; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP8]], i32 [[TMP6]]
	; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
	; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> poison, i1 [[TMP12]], i32 0
	; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1
	; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0
	; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1
	; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
	; THRESH-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1
	; THRESH-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]
	; THRESH-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1
	; THRESH-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP21]], [[TMP20]]
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP21]], i32 [[TMP20]]
	; THRESH-NEXT: ret i32 [[OP_EXTRA1]]			; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	br label %pp			br label %pp

	pp:			pp:

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

;
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE5:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <2 x i32> [[SHUFFLE5]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[SHRINK_SHUFFLE]], zeroinitializer
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x float> [[SHUFFLE6]], <2 x float> [[SHUFFLE7]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[SHUFFLE3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> [[SHRINK_SHUFFLE4]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[TMP5]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1		; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[TMP6]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0		; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> poison, float [[TMP7]], i32 2
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0		; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP8]], i32 3
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> poison, float [[TMP17]], i32 0
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[TMP18]], i32 1
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP16]], i32 0
; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> poison, float [[TMP19]], i32 2
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP16]], i32 1
; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP20]], i32 3
; CHECK-NEXT: ret <4 x float> [[RD]]		; CHECK-NEXT: ret <4 x float> [[RD]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

;
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE5:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <2 x i32> [[SHUFFLE5]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[SHRINK_SHUFFLE]], zeroinitializer
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x float> [[SHUFFLE6]], <2 x float> [[SHUFFLE7]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[SHUFFLE3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> [[SHRINK_SHUFFLE4]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[TMP5]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1		; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[TMP6]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0		; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> undef, float [[TMP7]], i32 2
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0		; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP8]], i32 3
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[TMP17]], i32 0
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[TMP18]], i32 1
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP16]], i32 0
; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> undef, float [[TMP19]], i32 2
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP16]], i32 1
; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP20]], i32 3
; CHECK-NEXT: ret <4 x float> [[RD]]		; CHECK-NEXT: ret <4 x float> [[RD]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
;
%ra = insertelement <4 x float> undef, float %s0, i32 0		%ra = insertelement <4 x float> undef, float %s0, i32 0
%rb = insertelement <4 x float> %ra, float %s1, i32 1		%rb = insertelement <4 x float> %ra, float %s1, i32 1
%rc = insertelement <4 x float> undef, float %s2, i32 2		%rc = insertelement <4 x float> undef, float %s2, i32 2
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Make sure infinite loop doesn't happen which I ran into when trying		; Make sure infinite loop doesn't happen which I ran into when trying
; to do this backwards this backwards		; to do this backwards this backwards
		RKSimon: There isn't an ANY check-prefix atm (it was cleaned out in rG119e4550ddedc75e4 as part of the unused prefix cleanup) - please can you review?
		ABataev (author): Yes, need to remove it, I think. Most probably caused by a not quite clean merge.
define <4 x i32> @reconstruct(<4 x i32> %c) #0 {		define <4 x i32> @reconstruct(<4 x i32> %c) #0 {
; CHECK-LABEL: @reconstruct(		; CHECK-LABEL: @reconstruct(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3
; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x i32> undef, i32 [[C0]], i32 0		; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x i32> undef, i32 [[C0]], i32 0
; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x i32> [[RA]], i32 [[C1]], i32 1		; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x i32> [[RA]], i32 [[C1]], i32 1

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> <i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32), i32 8>, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2			; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 8, i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP6]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
	; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4			store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4
	%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4			%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll

	; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP26:%.*]], <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP26:%.*]], <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [ [[TMP26]], [[FOR_INC]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x i32> [ poison, [[ENTRY]] ], [ [[TMP26]], [[FOR_INC]] ]
	; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]			; CHECK-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

entry:
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <2 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
; CHECK-NEXT: [[X2:%.*]] = load float, float* [[GEP2]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0		; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x float> [[I0]], float [[TMP4]], i32 1		; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x float> [[I0]], float [[TMP4]], i32 1
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I1]], float [[X2]], i32 2		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I1]], float [[TMP5]], i32 2
		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[TMP5]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16		; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*		; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8		; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32		; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float		; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0		; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0
; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32		; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T8]], i32 0
; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T4]], i32 1
; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32		; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3		; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[TMP5]], i32 1
		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
		; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[TMP6]], i32 2
		; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[TMP6]], i32 3
; CHECK-NEXT: ret <4 x float> [[T15]]		; CHECK-NEXT: ret <4 x float> [[T15]]
;		;
%t0 = bitcast <4 x float>* %x to i64*		%t0 = bitcast <4 x float>* %x to i64*
%t1 = load i64, i64* %t0, align 16		%t1 = load i64, i64* %t0, align 16
%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%t3 = bitcast float* %t2 to i64*		%t3 = bitcast float* %t2 to i64*
%t4 = load i64, i64* %t3, align 8		%t4 = load i64, i64* %t3, align 8
%t5 = trunc i64 %t1 to i32		%t5 = trunc i64 %t1 to i32

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

entry:
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <2 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
; CHECK-NEXT: [[X2:%.*]] = load float, float* [[GEP2]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0		; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x float> [[I0]], float [[TMP4]], i32 1		; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x float> [[I0]], float [[TMP4]], i32 1
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I1]], float [[X2]], i32 2		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I1]], float [[TMP5]], i32 2
		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[TMP5]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16		; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*		; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8		; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32		; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float		; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0		; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0
; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32		; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T8]], i32 0
; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T4]], i32 1
; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32		; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3		; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[TMP5]], i32 1
		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
		; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[TMP6]], i32 2
		; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[TMP6]], i32 3
; CHECK-NEXT: ret <4 x float> [[T15]]		; CHECK-NEXT: ret <4 x float> [[T15]]
;		;
%t0 = bitcast <4 x float>* %x to i64*		%t0 = bitcast <4 x float>* %x to i64*
%t1 = load i64, i64* %t0, align 16		%t1 = load i64, i64* %t0, align 16
%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%t3 = bitcast float* %t2 to i64*		%t3 = bitcast float* %t2 to i64*
%t4 = load i64, i64* %t3, align 8		%t4 = load i64, i64* %t3, align 8
%t5 = trunc i64 %t1 to i32		%t5 = trunc i64 %t1 to i32

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6			; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6
	; CHECK-NEXT: [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7			; CHECK-NEXT: [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP13]], [[TMP10]]			; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP10]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8			; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	; S[0] S[1]			; S[0] S[1]
	;			;
	; SLP should reorder the operands of the RHS add taking into consideration the cost of external uses.			; SLP should reorder the operands of the RHS add taking into consideration the cost of external uses.
	; It is more profitable to reorder the operands of the RHS add, because A[1] has an external use.			; It is more profitable to reorder the operands of the RHS add, because A[1] has an external use.

	define void @lookahead_external_uses(double* %A, double *%B, double *%C, double *%D, double *%S, double *%Ext1, double *%Ext2) {			define void @lookahead_external_uses(double* %A, double *%B, double *%C, double *%D, double *%S, double *%Ext1, double *%Ext2) {
	; CHECK-LABEL: @lookahead_external_uses(			; CHECK-LABEL: @lookahead_external_uses(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0			; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
	; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0			; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
	; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0			; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
	; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1			; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
	; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2			; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double*> poison, double* [[A]], i32 0			; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double*> [[TMP0]], double* [[A]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, <2 x double*> [[TMP1]], <2 x i64> <i64 0, i64 2>
	; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1			; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
				; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
	; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8			; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
	; CHECK-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8			; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> [[TMP2]], i32 8, <2 x i1> <i1 true, i1 true>, <2 x double> undef)			; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fsub fast <2 x double> [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = fsub fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast <2 x double> [[TMP11]], [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
				; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1			; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP12]], <2 x double>* [[TMP13]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: store double [[A1]], double* [[EXT1:%.*]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
				; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%IdxA0 = getelementptr inbounds double, double* %A, i64 0			%IdxA0 = getelementptr inbounds double, double* %A, i64 0
	%IdxB0 = getelementptr inbounds double, double* %B, i64 0			%IdxB0 = getelementptr inbounds double, double* %B, i64 0
	%IdxC0 = getelementptr inbounds double, double* %C, i64 0			%IdxC0 = getelementptr inbounds double, double* %C, i64 0
	%IdxD0 = getelementptr inbounds double, double* %D, i64 0			%IdxD0 = getelementptr inbounds double, double* %D, i64 0

	; over A[1] (with 3 external users).			; over A[1] (with 3 external users).
	; The result is that the operands of the Add are not reordered and the loads			; The result is that the operands of the Add are not reordered and the loads
	; from A get vectorized instead of the loads from B.			; from A get vectorized instead of the loads from B.
	;			;
	define void @lookahead_limit_users_budget(double* %A, double *%B, double *%C, double *%D, double *%S, double *%Ext1, double *%Ext2, double *%Ext3, double *%Ext4, double *%Ext5) {			define void @lookahead_limit_users_budget(double* %A, double *%B, double *%C, double *%D, double *%S, double *%Ext1, double *%Ext2, double *%Ext3, double *%Ext4, double *%Ext5) {
	; CHECK-LABEL: @lookahead_limit_users_budget(			; CHECK-LABEL: @lookahead_limit_users_budget(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
				; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
	; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0			; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
	; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0			; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
	; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1			; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double*> poison, double* [[B:%.*]], i32 0			; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double*> [[TMP0]], double* [[B]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, <2 x double*> [[TMP1]], <2 x i64> <i64 0, i64 2>
	; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2			; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
	; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1			; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
				; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
	; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8			; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> [[TMP2]], i32 8, <2 x i1> <i1 true, i1 true>, <2 x double> undef)			; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
	; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8			; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
	; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8			; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = fsub fast <2 x double> [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A2]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP9]], double [[B1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = fsub fast <2 x double> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP11]]			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
				; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
				; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1			; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP12]], <2 x double>* [[TMP13]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x double> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: store double [[TMP14]], double* [[EXT1:%.*]], align 8			; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
	; CHECK-NEXT: store double [[TMP14]], double* [[EXT2:%.*]], align 8			; CHECK-NEXT: store double [[TMP12]], double* [[EXT2:%.*]], align 8
	; CHECK-NEXT: store double [[TMP14]], double* [[EXT3:%.*]], align 8			; CHECK-NEXT: store double [[TMP12]], double* [[EXT3:%.*]], align 8
	; CHECK-NEXT: store double [[B1]], double* [[EXT4:%.*]], align 8			; CHECK-NEXT: store double [[B1]], double* [[EXT4:%.*]], align 8
	; CHECK-NEXT: store double [[B1]], double* [[EXT5:%.*]], align 8			; CHECK-NEXT: store double [[B1]], double* [[EXT5:%.*]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%IdxA0 = getelementptr inbounds double, double* %A, i64 0			%IdxA0 = getelementptr inbounds double, double* %A, i64 0
	%IdxB0 = getelementptr inbounds double, double* %B, i64 0			%IdxB0 = getelementptr inbounds double, double* %B, i64 0
	%IdxC0 = getelementptr inbounds double, double* %C, i64 0			%IdxC0 = getelementptr inbounds double, double* %C, i64 0

	; Same as @ChecksExtractScores, but the extractelement vector operands do not match.			; Same as @ChecksExtractScores, but the extractelement vector operands do not match.
	define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {			define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
	; CHECK-LABEL: @ChecksExtractScores_different_vectors(			; CHECK-LABEL: @ChecksExtractScores_different_vectors(
	; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0			; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
	; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1			; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4			; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4			; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
	; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0			; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
	; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1			; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
	; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4			; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4			; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
	; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0			; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
	; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1			; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRA1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRA1]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP6]], double [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP4]], [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> [[TMP10]], double [[EXTRB1]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> [[TMP9]], double [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP11]], [[TMP2]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP7]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP5]], [[TMP12]]
	; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0			; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
	; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1			; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP13]], <2 x double>* [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP13]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s --check-prefixes=CHECK
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s --check-prefixes=CHECK
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s --check-prefixes=CHECK

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; These tests ensure that we do not regress due to PR31243. Note that we set			; These tests ensure that we do not regress due to PR31243. Note that we set
	; the SLP threshold to force vectorization even when not profitable.			; the SLP threshold to force vectorization even when not profitable.

	; When computing minimum sizes, if we can prove the sign bit is zero, we can			; When computing minimum sizes, if we can prove the sign bit is zero, we can
	; zero-extend the roots back to their original sizes.			; zero-extend the roots back to their original sizes.
	;			;
	define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {			define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {
	; CHECK-LABEL: @PR31243_zext(			; CHECK-LABEL: @PR31243_zext(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>			; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i32>
	; CHECK-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i64			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
				; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1			; CHECK-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1
	; CHECK-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1			; CHECK-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1
	; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]			; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]
	; CHECK-NEXT: ret i8 [[TMP_8]]			; CHECK-NEXT: ret i8 [[TMP_8]]
	;			;
	entry:			entry:
	%tmp_0 = zext i8 %v0 to i32			%tmp_0 = zext i8 %v0 to i32
	%tmp_1 = zext i8 %v1 to i32			%tmp_1 = zext i8 %v1 to i32
	; if we can't prove that the upper bit of the original type is equal to			; if we can't prove that the upper bit of the original type is equal to
	; the upper bit of the proposed smaller type. If these two bits are the			; the upper bit of the proposed smaller type. If these two bits are the
	; same (either zero or one) we know that sign-extending from the smaller			; same (either zero or one) we know that sign-extending from the smaller
	; type will result in the same value. Since we don't yet perform this			; type will result in the same value. Since we don't yet perform this
	; optimization, we make the proposed smaller type (i8) larger (i16) to			; optimization, we make the proposed smaller type (i8) larger (i16) to
	; ensure correctness.			; ensure correctness.
	;			;
	define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {			define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {
	; SSE-LABEL: @PR31243_sext(			; CHECK-LABEL: @PR31243_sext(
	; SSE-NEXT: entry:			; CHECK-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0
	; SSE-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = sext i8 [[TMP0]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i32>
	; SSE-NEXT: [[TMP3:%.*]] = sext i8 [[TMP1]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
	; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[TMP4]] to i64
	; SSE-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; SSE-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; SSE-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[TMP6]] to i64
	; SSE-NEXT: ret i8 [[TMP8]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	;			; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1
	; AVX-LABEL: @PR31243_sext(			; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1
	; AVX-NEXT: entry:			; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0			; CHECK-NEXT: ret i8 [[TMP8]]
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i16>
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP3]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = sext i16 [[TMP4]] to i64
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; AVX-NEXT: [[TMP6:%.*]] = extractelement <2 x i16> [[TMP3]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = sext i16 [[TMP6]] to i64
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	; AVX-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1
	; AVX-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1
	; AVX-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: ret i8 [[TMP8]]
	;			;
	entry:			entry:
	%tmp0 = sext i8 %v0 to i32			%tmp0 = sext i8 %v0 to i32
	%tmp1 = sext i8 %v1 to i32			%tmp1 = sext i8 %v1 to i32
	%tmp2 = or i32 %tmp0, 1			%tmp2 = or i32 %tmp0, 1
	%tmp3 = or i32 %tmp1, 1			%tmp3 = or i32 %tmp1, 1
	%tmp4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp2			%tmp4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp2
	%tmp5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp3			%tmp5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp3
	%tmp6 = load i8, i8* %tmp4			%tmp6 = load i8, i8* %tmp4
	%tmp7 = load i8, i8* %tmp5			%tmp7 = load i8, i8* %tmp5
	%tmp8 = add i8 %tmp6, %tmp7			%tmp8 = add i8 %tmp6, %tmp7
	ret i8 %tmp8			ret i8 %tmp8
	}			}
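
The comment above `@PR31243_sext` describes when a value can be narrowed and later sign-extended back without changing it. As a rough illustration only (plain Python integer arithmetic, not the SLP vectorizer's actual minimum-size analysis; the helper names `sign_extend` and `narrowing_is_lossless` are hypothetical), the round-trip property can be sketched as:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    low = value & mask
    # Flipping and then subtracting the sign bit maps [0, 2^bits) onto
    # the signed range [-2^(bits-1), 2^(bits-1)).
    return (low ^ sign_bit) - sign_bit


def narrowing_is_lossless(value: int, wide_bits: int, narrow_bits: int) -> bool:
    """True if truncating to `narrow_bits` and sign-extending back to
    `wide_bits` reproduces the original wide value."""
    wide = sign_extend(value, wide_bits)
    round_trip = sign_extend(wide, narrow_bits)
    return round_trip == wide


if __name__ == "__main__":
    # -1 has every upper bit equal to the i8 sign bit, so i32 -> i8 -> i32 is safe.
    print(narrowing_is_lossless(-1, 32, 8))
    # 200 needs bit 7 as a magnitude bit, so it does not survive an i8 round trip.
    print(narrowing_is_lossless(200, 32, 8))
    # The same value fits once the proposed narrow type is widened to i16.
    print(narrowing_is_lossless(200, 32, 16))
```

When such a property cannot be proven statically, choosing a wider narrow type (as the test comment describes for i8 versus i16) is the conservative option.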

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-threshold=-200 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-threshold=-200 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S -slp-min-non-power2-stores-size=1 -slp-min-non-power2-values-size=1 | FileCheck %s

	define void @test_add_sdiv(i32* %arr1, i32* %arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {			define void @test_add_sdiv(i32* %arr1, i32* %arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {
	; CHECK-LABEL: @test_add_sdiv(			; CHECK-LABEL: @test_add_sdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0			; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0
	; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1			; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1
	; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2			; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2
	; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3			; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0
	; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1			; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1
	; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2			; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2
	; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3			; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3
	; CHECK-NEXT: [[V0:%.*]] = load i32, i32* [[GEP1_0]]			; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]], align 4
	; CHECK-NEXT: [[V1:%.*]] = load i32, i32* [[GEP1_1]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP1_0]] to <4 x i32>*
	; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]]			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 undef>
	; CHECK-NEXT: [[Y0:%.*]] = add nsw i32 [[A0:%.*]], 1146
	; CHECK-NEXT: [[Y1:%.*]] = add nsw i32 [[A1:%.*]], 146
	; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42			; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42
	; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; CHECK-NEXT: [[RES0:%.*]] = add nsw i32 [[V0]], [[Y0]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[RES1:%.*]] = add nsw i32 [[V1]], [[Y1]]			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[TMP4]], <i32 1146, i32 146, i32 0, i32 poison>
	; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]			; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]
	; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[SHUFFLE]], [[TMP5]]
	; CHECK-NEXT: store i32 [[RES0]], i32* [[GEP2_0]]			; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]], align 4
	; CHECK-NEXT: store i32 [[RES1]], i32* [[GEP2_1]]			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 2>
	; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[GEP2_0]] to <4 x i32>*
	; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]]			; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[SHUFFLE1]], <4 x i32>* [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 false, i1 true>)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr i32, i32* %arr1, i32 0			%gep1.0 = getelementptr i32, i32* %arr1, i32 0
	%gep1.1 = getelementptr i32, i32* %arr1, i32 1			%gep1.1 = getelementptr i32, i32* %arr1, i32 1
	%gep1.2 = getelementptr i32, i32* %arr1, i32 2			%gep1.2 = getelementptr i32, i32* %arr1, i32 2
	%gep1.3 = getelementptr i32, i32* %arr1, i32 3			%gep1.3 = getelementptr i32, i32* %arr1, i32 3
	%gep2.0 = getelementptr i32, i32* %arr2, i32 0			%gep2.0 = getelementptr i32, i32* %arr2, i32 0
	; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0			; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0
	; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1			; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1
	; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2			; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2
	; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3			; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0
	; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1			; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1
	; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2			; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2
	; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3			; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3
	; CHECK-NEXT: [[V0:%.*]] = load i32, i32* [[GEP1_0]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP1_0]] to <4 x i32>*
	; CHECK-NEXT: [[V1:%.*]] = load i32, i32* [[GEP1_1]]			; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> undef)
	; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]]			; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = add nsw i32 [[A0:%.*]], 1146			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; CHECK-NEXT: [[Y1:%.*]] = add nsw i32 [[A1:%.*]], 146			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A2:%.*]], i32 2
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[TMP4]], <i32 1146, i32 146, i32 42, i32 poison>
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0			; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0
	; CHECK-NEXT: [[RES0:%.*]] = urem i32 [[V0]], [[Y0]]			; CHECK-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[RES1:%.*]] = urem i32 [[V1]], [[Y1]]
	; CHECK-NEXT: [[RES2:%.*]] = urem i32 [[V2]], [[Y2]]
	; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]			; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]
	; CHECK-NEXT: store i32 [[RES0]], i32* [[GEP2_0]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[GEP2_0]] to <4 x i32>*
	; CHECK-NEXT: store i32 [[RES1]], i32* [[GEP2_1]]			; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>)
	; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]]			; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]], align 4
	; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr i32, i32* %arr1, i32 0			%gep1.0 = getelementptr i32, i32* %arr1, i32 0
	%gep1.1 = getelementptr i32, i32* %arr1, i32 1			%gep1.1 = getelementptr i32, i32* %arr1, i32 1
	%gep1.2 = getelementptr i32, i32* %arr1, i32 2			%gep1.2 = getelementptr i32, i32* %arr1, i32 2
	%gep1.3 = getelementptr i32, i32* %arr1, i32 3			%gep1.3 = getelementptr i32, i32* %arr1, i32 3
	%gep2.0 = getelementptr i32, i32* %arr2, i32 0			%gep2.0 = getelementptr i32, i32* %arr2, i32 0

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -instcombine -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -instcombine -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx -slp-min-non-power2-stores-size=5 | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"

	; Make sure we order the operands of commutative operations so that we get			; Make sure we order the operands of commutative operations so that we get
	; bigger vectorizable trees.			; bigger vectorizable trees.

	define void @shuffle_operands1(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_operands1(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_operands1(			; CHECK-LABEL: @shuffle_operands1(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%from_1 = getelementptr double, double *%from, i64 1			%from_1 = getelementptr double, double *%from, i64 1
	%v0_1 = load double , double * %from			%v0_1 = load double , double * %from
	%v0_2 = load double , double * %from_1			%v0_2 = load double , double * %from_1
	%v1_1 = fadd double %v0_1, %v1			%v1_1 = fadd double %v0_1, %v1
	}			}

	define void @shuffle_preserve_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_preserve_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_preserve_broadcast4(			; CHECK-LABEL: @shuffle_preserve_broadcast4(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> undef, <2 x i32> zeroinitializer			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	define void @vecload_vs_broadcast5(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast5(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast5(			; CHECK-LABEL: @vecload_vs_broadcast5(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:


	define void @shuffle_preserve_broadcast6(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_preserve_broadcast6(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_preserve_broadcast6(			; CHECK-LABEL: @shuffle_preserve_broadcast6(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> undef, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	define void @good_load_order() {			define void @good_load_order() {
	; CHECK-LABEL: @good_load_order(			; CHECK-LABEL: @good_load_order(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* getelementptr inbounds ([32000 x float], [32000 x float]* @a, i32 0, i32 0), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* getelementptr inbounds ([32000 x float], [32000 x float]* @a, i32 0, i32 0), align 16
	; CHECK-NEXT: br label [[FOR_BODY3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3:%.*]]
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP14:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP12:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1			; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX]] to <8 x float>*
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]			; CHECK-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* nonnull [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>, <8 x float> undef)
	; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[MUL45:%.*]] = fmul float [[TMP14]], [[TMP15]]			; CHECK-NEXT: [[TMP9:%.*]] = fmul <8 x float> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: store float [[MUL45]], float* [[ARRAYIDX31]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX5]] to <8 x float>*
	; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP9]], <8 x float>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>)
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP16]], 31995			; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
				; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP11]], 31995
				; CHECK-NEXT: [[TMP12]] = extractelement <8 x float> [[TMP6]], i32 4
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader:			for.cond1.preheader:
	; c[1] = b[1]+a[1]; // swapped b[1] and a[1]			; c[1] = b[1]+a[1]; // swapped b[1] and a[1]

	define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){			define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){
	; CHECK-LABEL: @load_reorder_double(			; CHECK-LABEL: @load_reorder_double(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = load double, double* %a			%1 = load double, double* %a
	%2 = load double, double* %b			%2 = load double, double* %b
	%3 = fadd double %1, %2			%3 = fadd double %1, %2
	store double %3, double* %c			store double %3, double* %c

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @get_block(i32 %y_pos) local_unnamed_addr #0 {			define void @get_block(i32 %y_pos) local_unnamed_addr #0 {
	; CHECK-LABEL: @get_block(			; CHECK-LABEL: @get_block(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 undef, i32 1			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[SHUFFLE1]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 undef, i32 2			; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], poison
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 undef, i32 3			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> poison
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP6]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>
	; CHECK-NEXT: [[TMP8:%.*]] = icmp slt <4 x i32> [[TMP7]], undef			; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = select <4 x i1> [[TMP8]], <4 x i32> [[TMP7]], <4 x i32> undef			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = sext <4 x i32> [[TMP9]] to <4 x i64>			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = trunc <4 x i64> [[TMP10]] to <4 x i32>			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1
				; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64
				; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP11]]
				; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP13:%.*]] = sext i32 [[TMP12]] to i64			; CHECK-NEXT: [[TMP13:%.*]] = sext i32 [[TMP12]] to i64
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP13]]			; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP7]], i32 3
	; CHECK-NEXT: [[TMP15:%.*]] = sext i32 [[TMP14]] to i64			; CHECK-NEXT: [[TMP15:%.*]] = sext i32 [[TMP14]] to i64
	; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP15]]			; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP15]]
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[TMP11]], i32 2
	; CHECK-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64
	; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP17]]
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i32> [[TMP11]], i32 3
	; CHECK-NEXT: [[TMP19:%.*]] = sext i32 [[TMP18]] to i64
	; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP19]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br label %land.lhs.true			br label %land.lhs.true

	land.lhs.true: ; preds = %entry			land.lhs.true: ; preds = %entry
	br i1 undef, label %if.then, label %if.end			br i1 undef, label %if.then, label %if.end


llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	; }			; }
	;			;
	; return R+G+B+Y+P;			; return R+G+B+Y+P;
	; }			; }

	define float @foo3(float* nocapture readonly %A) #0 {			define float @foo3(float* nocapture readonly %A) #0 {
	; CHECK-LABEL: @foo3(			; CHECK-LABEL: @foo3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A:%.*]] to <8 x float>*
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP0]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>, <8 x float> undef)
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 3
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <8 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP18:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi float [ [[TMP3]], [[ENTRY]] ], [ [[TMP11:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ [[SHUFFLE]], [[ENTRY]] ], [ [[SHRINK_SHUFFLE:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[TMP13:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[SHUFFLE]], [[ENTRY]] ], [ [[TMP18:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP4]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP5]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP7:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX14]] to <4 x float>*
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
	; CHECK-NEXT: [[TMP10:%.*]] = load <2 x float>, <2 x float>* [[TMP9]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x float> poison, float [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP11]] = extractelement <2 x float> [[SHUFFLE1]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x float> poison, float [[TMP11]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP8]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP13]] = extractelement <2 x float> [[SHUFFLE1]], i32 1			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP12]], float [[TMP13]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[SHUFFLE2]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP8]], i32 2			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP10]], float [[TMP11]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> [[TMP15]], float [[TMP4]], i32 3			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[SHUFFLE2]], i32 1
	; CHECK-NEXT: [[TMP17:%.*]] = fmul <4 x float> [[TMP16]], <float 1.100000e+01, float 1.000000e+01, float 9.000000e+00, float 8.000000e+00>			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP12]], float [[TMP13]], i32 3
	; CHECK-NEXT: [[TMP18]] = fadd <4 x float> [[TMP6]], [[TMP17]]			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP6]], i32 2
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x float> [[TMP14]], float [[TMP15]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = fmul <8 x float> [[TMP16]], <float 7.000000e+00, float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01, float poison, float poison, float poison>
				; CHECK-NEXT: [[TMP18]] = fadd <8 x float> [[TMP2]], [[TMP17]]
	; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP19]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP19]], 121
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>
				; CHECK-NEXT: [[SHRINK_SHUFFLE]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP18]], i32 3			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <8 x float> [[TMP18]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP20]]			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <8 x float> [[TMP18]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP18]], i32 2			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[TMP20]], [[TMP21]]
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP21]]			; CHECK-NEXT: [[TMP22:%.*]] = extractelement <8 x float> [[TMP18]], i32 2
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP18]], i32 1			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP22]]			; CHECK-NEXT: [[TMP23:%.*]] = extractelement <8 x float> [[TMP18]], i32 3
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP18]], i32 0			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP23]]			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <8 x float> [[TMP18]], i32 4
				; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4

llvm/test/Transforms/SLPVectorizer/X86/phi3.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s		; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"		target triple = "x86_64-apple-macosx10.8.0"

%struct.GPar.0.16.26 = type { [0 x double], double }		%struct.GPar.0.16.26 = type { [0 x double], double }

@d = external global double, align 8		@d = external global double, align 8

declare %struct.GPar.0.16.26* @Rf_gpptr(...)		declare %struct.GPar.0.16.26* @Rf_gpptr(...)

define void @Rf_GReset() {		define void @Rf_GReset() {
; CHECK-LABEL: @Rf_GReset(		; CHECK-LABEL: @Rf_GReset(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load double, double* @d, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load double, double* @d, align 8
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP0]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[TMP0]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP1]]
; CHECK-NEXT: br i1 icmp eq (%struct.GPar.0.16.26* (...)* inttoptr (i64 115 to %struct.GPar.0.16.26* (...)*), %struct.GPar.0.16.26* (...)* @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]		; CHECK-NEXT: br i1 icmp eq (%struct.GPar.0.16.26* (...)* inttoptr (i64 115 to %struct.GPar.0.16.26* (...)*), %struct.GPar.0.16.26* (...)* @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef		; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], poison
; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef		; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], poison
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]
; CHECK: if.then6:		; CHECK: if.then6:
; CHECK-NEXT: br label [[IF_END7]]		; CHECK-NEXT: br label [[IF_END7]]
; CHECK: if.end7:		; CHECK: if.end7:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
if.end7: ; preds = %if.then6, %if.then, %entry
%g.0 = phi double [ 0.000000e+00, %if.then6 ], [ %sub, %if.then ], [ %sub, %entry ]		%g.0 = phi double [ 0.000000e+00, %if.then6 ], [ %sub, %if.then ], [ %sub, %entry ]
ret void		ret void
}		}

define void @Rf_GReset_unary_fneg() {		define void @Rf_GReset_unary_fneg() {
; CHECK-LABEL: @Rf_GReset_unary_fneg(		; CHECK-LABEL: @Rf_GReset_unary_fneg(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load double, double* @d, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load double, double* @d, align 8
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP0]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[TMP0]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = fneg <2 x double> [[TMP1]]
; CHECK-NEXT: br i1 icmp eq (%struct.GPar.0.16.26* (...)* inttoptr (i64 115 to %struct.GPar.0.16.26* (...)*), %struct.GPar.0.16.26* (...)* @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]		; CHECK-NEXT: br i1 icmp eq (%struct.GPar.0.16.26* (...)* inttoptr (i64 115 to %struct.GPar.0.16.26* (...)*), %struct.GPar.0.16.26* (...)* @Rf_gpptr), label [[IF_THEN:%.*]], label [[IF_END7:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef		; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], poison
; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], undef		; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], poison
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[TMP5]], [[TMP6]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN6:%.*]], label [[IF_END7]]
; CHECK: if.then6:		; CHECK: if.then6:
; CHECK-NEXT: br label [[IF_END7]]		; CHECK-NEXT: br label [[IF_END7]]
; CHECK: if.end7:		; CHECK: if.end7:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/phi_landingpad.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-apple-macosx10.9.0 -S -o - | FileCheck %s			; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-apple-macosx10.9.0 -S -o - | FileCheck %s

	target datalayout = "f64:64:64-v64:64:64"			target datalayout = "f64:64:64-v64:64:64"

	define void @test_phi_in_landingpad() personality i8*			define void @test_phi_in_landingpad() personality i8*
	; CHECK-LABEL: @test_phi_in_landingpad(			; CHECK-LABEL: @test_phi_in_landingpad(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: invoke void @foo()			; CHECK-NEXT: invoke void @foo()
	; CHECK-NEXT: to label [[INNER:%.*]] unwind label [[LPAD:%.*]]			; CHECK-NEXT: to label [[INNER:%.*]] unwind label [[LPAD:%.*]]
	; CHECK: inner:			; CHECK: inner:
	; CHECK-NEXT: invoke void @foo()			; CHECK-NEXT: invoke void @foo()
	; CHECK-NEXT: to label [[DONE:%.*]] unwind label [[LPAD]]			; CHECK-NEXT: to label [[DONE:%.*]] unwind label [[LPAD]]
	; CHECK: lpad:			; CHECK: lpad:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x double> [ undef, [[ENTRY:%.*]] ], [ undef, [[INNER]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x double> [ poison, [[ENTRY:%.*]] ], [ poison, [[INNER]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = landingpad { i8*, i32 }			; CHECK-NEXT: [[TMP1:%.*]] = landingpad { i8*, i32 }
	; CHECK-NEXT: catch i8* null			; CHECK-NEXT: catch i8* null
	; CHECK-NEXT: br label [[DONE]]			; CHECK-NEXT: br label [[DONE]]
	; CHECK: done:			; CHECK: done:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ undef, [[INNER]] ], [ [[TMP0]], [[LPAD]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ poison, [[INNER]] ], [ [[TMP0]], [[LPAD]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {			bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
	entry:			entry:
	invoke void @foo()			invoke void @foo()
	to label %inner unwind label %lpad			to label %inner unwind label %lpad

	inner:			inner:

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

for.body.lr.ph.i:
ret void		ret void
}		}

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @pr35497() local_unnamed_addr #0 {		define void @pr35497() local_unnamed_addr #0 {
; SSE-LABEL: @pr35497(		; SSE-LABEL: @pr35497(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
; SSE-NEXT: [[AND:%.*]] = shl i64 [[TMP0]], 2
; SSE-NEXT: [[SHL:%.*]] = and i64 [[AND]], 20
; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef		; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1		; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5		; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
; SSE-NEXT: [[AND_1:%.*]] = shl i64 undef, 2		; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP0]], i32 1
; SSE-NEXT: [[SHL_1:%.*]] = and i64 [[AND_1]], 20		; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
; SSE-NEXT: [[SHR_1:%.*]] = lshr i64 undef, 6		; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
; SSE-NEXT: [[ADD_1:%.*]] = add nuw nsw i64 [[SHL]], [[SHR_1]]
; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4		; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
; SSE-NEXT: [[SHR_2:%.*]] = lshr i64 undef, 6		; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], poison
; SSE-NEXT: [[ADD_2:%.*]] = add nuw nsw i64 [[SHL_1]], [[SHR_2]]
; SSE-NEXT: [[AND_4:%.*]] = shl i64 [[ADD]], 2
; SSE-NEXT: [[SHL_4:%.*]] = and i64 [[AND_4]], 20
; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1		; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
; SSE-NEXT: store i64 [[ADD_1]], i64* [[ARRAYIDX2_5]], align 1		; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
; SSE-NEXT: [[AND_5:%.*]] = shl nuw nsw i64 [[ADD_1]], 2		; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
; SSE-NEXT: [[SHL_5:%.*]] = and i64 [[AND_5]], 20		; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1
; SSE-NEXT: [[SHR_5:%.*]] = lshr i64 [[ADD_1]], 6		; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>
; SSE-NEXT: [[ADD_5:%.*]] = add nuw nsw i64 [[SHL_4]], [[SHR_5]]		; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
; SSE-NEXT: store i64 [[ADD_5]], i64* [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0		; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
; SSE-NEXT: store i64 [[ADD_2]], i64* [[ARRAYIDX2_6]], align 1		; SSE-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
; SSE-NEXT: [[SHR_6:%.*]] = lshr i64 [[ADD_2]], 6		; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
; SSE-NEXT: [[ADD_6:%.*]] = add nuw nsw i64 [[SHL_5]], [[SHR_6]]		; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
; SSE-NEXT: store i64 [[ADD_6]], i64* [[ARRAYIDX2_2]], align 1		; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]
		; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @pr35497(		; AVX-LABEL: @pr35497(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef		; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1		; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5		; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1		; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP0]], i32 1
; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>		; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>		; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4		; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer		; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], poison
; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1		; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0		; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1		; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1
; AVX-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>		; AVX-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>
; AVX-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>		; AVX-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0		; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
; AVX-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*		; AVX-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1		; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
; AVX-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0		; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
; AVX-NEXT: [[TMP12:%.*]] = insertelement <2 x i64> poison, i64 [[TMP11]], i32 0		; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]
; AVX-NEXT: [[TMP13:%.*]] = insertelement <2 x i64> [[TMP12]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
; AVX-NEXT: [[TMP14:%.*]] = lshr <2 x i64> [[TMP13]], <i64 6, i64 6>		; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
; AVX-NEXT: [[TMP15:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP14]]
; AVX-NEXT: [[TMP16:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
; AVX-NEXT: store <2 x i64> [[TMP15]], <2 x i64>* [[TMP16]], align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
entry:		entry:
%0 = load i64, i64* undef, align 1		%0 = load i64, i64* undef, align 1
%and = shl i64 %0, 2		%and = shl i64 %0, 2
%shl = and i64 %and, 20		%shl = and i64 %and, 20
%add = add i64 undef, undef		%add = add i64 undef, undef
store i64 %add, i64* undef, align 1		store i64 %add, i64* undef, align 1
(24 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/pr42022-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context			; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context

	; Checks that vector insertvalues into the struct become SLP seeds.			; Checks that vector insertvalues into the struct become SLP seeds.
	define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {			define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
	; CHECK-LABEL: @StructOfVectors(			; CHECK-LABEL: @StructOfVectors(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i64 0			; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i64 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1			; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i64 0			; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1			; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]
	(252 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context			; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context

	; Checks that vector insertvalues into the struct become SLP seeds.			; Checks that vector insertvalues into the struct become SLP seeds.
	define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {			define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
	; CHECK-LABEL: @StructOfVectors(			; CHECK-LABEL: @StructOfVectors(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i64 0			; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i64 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1			; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> undef, float [[TMP6]], i64 0			; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> undef, float [[TMP6]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1			; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]
	(32 lines not shown)
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]			; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	(23 lines not shown)
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	(26 lines not shown)
	; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4			; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4			; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4
	; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01			; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01
	; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01			; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01
	; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01			; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01
	; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01			; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[FADD1]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[FADD1]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1
	; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2			; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2
	; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]			; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	(30 lines not shown)
	; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6			; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6
	; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7			; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] %StructIn0, i16 [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN0]], i16 [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] %StructIn2, i16 [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN2]], i16 [[TMP7]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
	; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0			; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
	; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] %StructIn4, i16 [[TMP9]], 1			; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN4]], i16 [[TMP9]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
	; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0			; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
	; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] %StructIn6, i16 [[TMP11]], 1			; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN6]], i16 [[TMP11]], 1
	; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] %StructIn1, 0			; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In0, [[STRUCT1TY]] %StructIn3, 1			; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN0]], [[STRUCT1TY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] %StructIn5, 0			; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] [[STRUCTIN5]], 0
	; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In2, [[STRUCT1TY]] %StructIn7, 1			; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN2]], [[STRUCT1TY]] [[STRUCTIN7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] %Struct2In1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] [[STRUCT2IN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] %Struct2In3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] [[STRUCT2IN3]], 1
	; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0			%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0
	%L0 = load i16, i16 * %GEP0			%L0 = load i16, i16 * %GEP0
	%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1			%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1
	%L1 = load i16, i16 * %GEP1			%L1 = load i16, i16 * %GEP1
	%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2			%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2
	%L2 = load i16, i16 * %GEP2			%L2 = load i16, i16 * %GEP2
	(42 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX


	@b = global [8 x i32] zeroinitializer, align 16			@b = global [8 x i32] zeroinitializer, align 16
	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @foo() {			define void @foo() {
	; SSE-LABEL: @foo(			; SSE-LABEL: @foo(
	; SSE-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @b to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 2, i32 0, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; SSE-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([8 x i32]* @a to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2>
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4) to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(<2 x i32*> <i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2)>, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i32> undef)			; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @b to <4 x i32>*), align 16
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 2, i32 0, i32 2, i32 0, i32 2, i32 0, i32 2>
				xbolva00: Regression on avx?
				ABataev (author): Yes, looks like the issue with the cost of `@llvm.masked.gather` for masked gather with some undefs in the mask
				craig.topper: Gather is slow on CPUs prior to AVX512. And its cost is proportional to the number of elements. I don't think the value of the mask should be a factor.
				ABataev (author): True, but in some cases it can be optimized into `gather + shuffle` instead of wide `gather`, if there are undefs in mask.
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16
	%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}
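
The two AVX check sequences for `@foo` above (a 2-element masked gather of `b[0]` and `b[2]` followed by a repeating shuffle, versus a contiguous 4-element load of `b[0..3]` followed by a shuffle that picks lanes 0 and 2) should both produce the same eight stored values as the scalar body. The following is a minimal Python sketch of that equivalence, not part of the patch: the `shufflevector` helper and the contents of `b` are assumptions made up for illustration, and the model ignores undef/poison semantics.

```python
def shufflevector(v1, v2, mask):
    """Model LLVM's shufflevector: each mask index selects a lane
    from the concatenation of v1 and v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


# Hypothetical contents of the global @b (any 8 distinct values work).
b = [10, 11, 12, 13, 14, 15, 16, 17]

# Scalar body of @foo: a[0..7] = b[0], b[2], b[0], b[2], ...
expected = [b[0], b[2]] * 4

# Variant 1: gather only b[0] and b[2], then repeat the pair 4 times.
gathered = [b[0], b[2]]
narrow = shufflevector(gathered, [None] * 2, [0, 1, 0, 1, 0, 1, 0, 1])

# Variant 2: load b[0..3] contiguously, then pick lanes 0 and 2.
loaded = b[0:4]
wide = shufflevector(loaded, [None] * 4, [0, 2, 0, 2, 0, 2, 0, 2])

print(narrow == expected, wide == expected)
```

Both variants compute the same result, so the choice between them is purely a cost-model question, which is what the inline comments are debating.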

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 \| FileCheck %s --check-prefixes=CHECK,SSE		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 \| FileCheck %s --check-prefixes=SSE
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx \| FileCheck %s --check-prefixes=CHECK,AVX		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx \| FileCheck %s --check-prefixes=AVX
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 \| FileCheck %s --check-prefixes=CHECK,AVX2		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 \| FileCheck %s --check-prefixes=AVX2
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f \| FileCheck %s --check-prefixes=CHECK,AVX512		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f \| FileCheck %s --check-prefixes=AVX512
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl \| FileCheck %s --check-prefixes=CHECK,AVX512		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl \| FileCheck %s --check-prefixes=AVX512

define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {		define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
; CHECK-LABEL: @gather_load(		; SSE-LABEL: @gather_load(
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]		; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; SSE-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0		; SSE-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1		; SSE-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 7, i32 4>
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2		; SSE-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[TMP11]], <i32 1, i32 2, i32 3, i32 4>
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3		; SSE-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; CHECK-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; SSE-NEXT: ret void
; CHECK-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]		;
; CHECK-NEXT: ret void		; AVX-LABEL: @gather_load(
		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX-NEXT: ret void
		;
		; AVX2-LABEL: @gather_load(
		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX2-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX2-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX2-NEXT: ret void
		;
		; AVX512-LABEL: @gather_load(
		; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX512-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX512-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX512-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
		; AVX512-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX512-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX512-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX512-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX512-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX512-NEXT: ret void
;		;
%3 = getelementptr inbounds i32, i32* %1, i64 1		%3 = getelementptr inbounds i32, i32* %1, i64 1
%4 = load i32, i32* %1, align 4, !tbaa !2		%4 = load i32, i32* %1, align 4, !tbaa !2
%5 = getelementptr inbounds i32, i32* %0, i64 1		%5 = getelementptr inbounds i32, i32* %0, i64 1
%6 = getelementptr inbounds i32, i32* %1, i64 11		%6 = getelementptr inbounds i32, i32* %1, i64 11
%7 = load i32, i32* %6, align 4, !tbaa !2		%7 = load i32, i32* %6, align 4, !tbaa !2
%8 = getelementptr inbounds i32, i32* %0, i64 2		%8 = getelementptr inbounds i32, i32* %0, i64 2
%9 = getelementptr inbounds i32, i32* %1, i64 4		%9 = getelementptr inbounds i32, i32* %1, i64 4
%10 = load i32, i32* %9, align 4, !tbaa !2		%10 = load i32, i32* %9, align 4, !tbaa !2
%11 = getelementptr inbounds i32, i32* %0, i64 3		%11 = getelementptr inbounds i32, i32* %0, i64 3
%12 = load i32, i32* %3, align 4, !tbaa !2		%12 = load i32, i32* %3, align 4, !tbaa !2
%13 = insertelement <4 x i32> poison, i32 %4, i32 0		%13 = insertelement <4 x i32> poison, i32 %4, i32 0
%14 = insertelement <4 x i32> %13, i32 %7, i32 1		%14 = insertelement <4 x i32> %13, i32 %7, i32 1
%15 = insertelement <4 x i32> %14, i32 %10, i32 2		%15 = insertelement <4 x i32> %14, i32 %10, i32 2
%16 = insertelement <4 x i32> %15, i32 %12, i32 3		%16 = insertelement <4 x i32> %15, i32 %12, i32 3
%17 = add nsw <4 x i32> %16, <i32 1, i32 2, i32 3, i32 4>		%17 = add nsw <4 x i32> %16, <i32 1, i32 2, i32 3, i32 4>
%18 = bitcast i32* %0 to <4 x i32>*		%18 = bitcast i32* %0 to <4 x i32>*
store <4 x i32> %17, <4 x i32>* %18, align 4, !tbaa !2		store <4 x i32> %17, <4 x i32>* %18, align 4, !tbaa !2
ret void		ret void
}		}

define void @gather_load_2(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {		define void @gather_load_2(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_2(		; SSE-LABEL: @gather_load_2(
; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0:!tbaa !.*]]		; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1		; SSE-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1
; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; SSE-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; SSE-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2		; SSE-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2
; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2
; SSE-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; SSE-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3		; SSE-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3
; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3
; SSE-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4		; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_2(		; AVX-LABEL: @gather_load_2(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0:!tbaa !.*]]		; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i32 1
; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i32 2
; AVX-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3		; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i32 3
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; AVX-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_2(		; AVX2-LABEL: @gather_load_2(
; AVX2-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX2-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0:!tbaa !.*]]		; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; AVX2-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i32 1
		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i32 2
		; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i32 3
		; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_2(		; AVX512-LABEL: @gather_load_2(
; AVX512-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0		; AVX512-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0
; AVX512-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>
; AVX512-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0:!tbaa !.*]]		; AVX512-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>		; AVX512-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>
; AVX512-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX512-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX512-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]		; AVX512-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%3 = getelementptr inbounds i32, i32* %1, i64 1		%3 = getelementptr inbounds i32, i32* %1, i64 1
%4 = load i32, i32* %3, align 4, !tbaa !2		%4 = load i32, i32* %3, align 4, !tbaa !2
%5 = add nsw i32 %4, 1		%5 = add nsw i32 %4, 1
; (56 lines collapsed in the archived diff view)
; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4		; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_3(		; AVX-LABEL: @gather_load_3(
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], 2		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; AVX-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
; AVX-NEXT: store i32 [[TMP8]], i32* [[TMP5]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
; AVX-NEXT: [[TMP12:%.*]] = add i32 [[TMP11]], 3		; AVX-NEXT: [[TMP13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: store i32 [[TMP12]], i32* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = add i32 [[TMP15]], 4		; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4		; AVX-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
; AVX-NEXT: store i32 [[TMP16]], i32* [[TMP13]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0
; AVX-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP18]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP19]], i32 3
; AVX-NEXT: [[TMP20:%.*]] = add i32 [[TMP19]], 1		; AVX-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
; AVX-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP21]], i32 4
; AVX-NEXT: store i32 [[TMP20]], i32* [[TMP17]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3
; AVX-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP23]], i32 5
; AVX-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP22]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0
; AVX-NEXT: [[TMP24:%.*]] = add i32 [[TMP23]], 2		; AVX-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP25]], i32 6
; AVX-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP15]], i32 7
; AVX-NEXT: store i32 [[TMP24]], i32* [[TMP21]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP28:%.*]] = add <8 x i32> [[TMP27]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP26]], align 4, [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP28]], <8 x i32>* [[TMP29]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP28:%.*]] = add i32 [[TMP27]], 3
; AVX-NEXT: [[TMP29:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; AVX-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_3(		; AVX2-LABEL: @gather_load_3(
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, [[TBAA0]]
; AVX2-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX2-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
; AVX2-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>		; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX2-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
; AVX2-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX2-NEXT: [[TMP13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX2-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; AVX2-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2		; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
; AVX2-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0
; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP19]], i32 3
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
; AVX2-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3		; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP21]], i32 4
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7		; AVX2-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3
; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP23]], i32 5
; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX2-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0
; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP25]], i32 6
; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4		; AVX2-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP15]], i32 7
; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP28:%.*]] = add <8 x i32> [[TMP27]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
		; AVX2-NEXT: store <8 x i32> [[TMP28]], <8 x i32>* [[TMP29]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_3(		; AVX512-LABEL: @gather_load_3(
; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0		; AVX512-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
; AVX512-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX512-NEXT: [[TMP8:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>		; AVX512-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX512-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>*
; AVX512-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX512-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), [[TBAA0]]
; AVX512-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX512-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2
; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
; AVX512-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX512-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX512-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX512-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
; AVX512-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%3 = load i32, i32* %1, align 4, !tbaa !2		%3 = load i32, i32* %1, align 4, !tbaa !2
%4 = add i32 %3, 1		%4 = add i32 %3, 1
%5 = getelementptr inbounds i32, i32* %0, i64 1		%5 = getelementptr inbounds i32, i32* %0, i64 1
store i32 %4, i32* %0, align 4, !tbaa !2		store i32 %4, i32* %0, align 4, !tbaa !2
%6 = getelementptr inbounds i32, i32* %1, i64 11		%6 = getelementptr inbounds i32, i32* %1, i64 11
%7 = load i32, i32* %6, align 4, !tbaa !2		%7 = load i32, i32* %6, align 4, !tbaa !2
; SSE-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_4(		; AVX-LABEL: @gather_load_4(
; AVX-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11		; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
; AVX-NEXT: [[T9:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 2
; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4		; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
; AVX-NEXT: [[T13:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 3
; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15		; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
; AVX-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 4
; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
; AVX-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]		; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]
; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]		; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]
; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4, [[TBAA0]]
; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = bitcast i32* [[T26]] to <4 x i32>*
; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4, [[TBAA0]]
; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]		; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
; AVX-NEXT: [[T8:%.*]] = add i32 [[T7]], 2		; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
; AVX-NEXT: [[T12:%.*]] = add i32 [[T11]], 3		; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
; AVX-NEXT: [[T16:%.*]] = add i32 [[T15]], 4		; AVX-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
; AVX-NEXT: [[T20:%.*]] = add i32 [[T19]], 1		; AVX-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP8]], i32 3
; AVX-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; AVX-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP10]], i32 4
; AVX-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; AVX-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 5
; AVX-NEXT: store i32 [[T8]], i32* [[T5]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
; AVX-NEXT: store i32 [[T12]], i32* [[T9]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[TMP14]], i32 6
; AVX-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[T31]], i32 7
; AVX-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
; AVX-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_4(		; AVX2-LABEL: @gather_load_4(
; AVX2-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
; AVX2-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0		; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX2-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX2-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX2-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX2-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
		; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP3:%.*]] = bitcast i32* [[T26]] to <4 x i32>*
		; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
; AVX2-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>		; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
; AVX2-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
; AVX2-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX2-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
; AVX2-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX2-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP8]], i32 3
; AVX2-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP10]], i32 4
; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 5
; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[TMP14]], i32 6
		; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[T31]], i32 7
		; AVX2-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP18:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
		; AVX2-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_4(		; AVX512-LABEL: @gather_load_4(
; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX512-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0		; AVX512-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
; AVX512-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX512-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX512-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX512-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX512-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX512-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
; AVX512-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>		; AVX512-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
; AVX512-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
; AVX512-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
; AVX512-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX512-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
; AVX512-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, [[TBAA0]]		; AVX512-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), [[TBAA0]]
; AVX512-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]
; AVX512-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]
; AVX512-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%t5 = getelementptr inbounds i32, i32* %t0, i64 1		%t5 = getelementptr inbounds i32, i32* %t0, i64 1
%t6 = getelementptr inbounds i32, i32* %t1, i64 11		%t6 = getelementptr inbounds i32, i32* %t1, i64 11
%t9 = getelementptr inbounds i32, i32* %t0, i64 2		%t9 = getelementptr inbounds i32, i32* %t0, i64 2
%t10 = getelementptr inbounds i32, i32* %t1, i64 4		%t10 = getelementptr inbounds i32, i32* %t1, i64 4
%t13 = getelementptr inbounds i32, i32* %t0, i64 3		%t13 = getelementptr inbounds i32, i32* %t0, i64 3
%t14 = getelementptr inbounds i32, i32* %t1, i64 15		%t14 = getelementptr inbounds i32, i32* %t1, i64 15
store i32 %t32, i32* %t29, align 4, !tbaa !2		store i32 %t32, i32* %t29, align 4, !tbaa !2

ret void		ret void
}		}


define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {		define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_div(		; SSE-LABEL: @gather_load_div(
; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; SSE-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; SSE-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i32 0		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x float*> [[TMP6]], float* [[TMP3]], i32 1		; SSE-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP8:%.*]] = insertelement <4 x float*> [[TMP7]], float* [[TMP4]], i32 2		; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x float*> [[TMP8]], float* [[TMP5]], i32 3		; SSE-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP10:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP9]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> undef, <4 x i32> zeroinitializer		; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP12:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 4, i64 13, i64 11, i64 44>		; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; SSE-NEXT: [[TMP13:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP12]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP14:%.*]] = fdiv <4 x float> [[TMP10]], [[TMP13]]		; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4		; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP16:%.*]] = bitcast float* [[TMP0]] to <4 x float>*		; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; SSE-NEXT: store <4 x float> [[TMP14]], <4 x float>* [[TMP16]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 17, i64 8, i64 5, i64 20>		; SSE-NEXT: [[TMP18:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0
; SSE-NEXT: [[TMP18:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP17]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> [[TMP18]], float [[TMP7]], i32 1
; SSE-NEXT: [[TMP19:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 33, i64 30, i64 27, i64 23>		; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP11]], i32 2
; SSE-NEXT: [[TMP20:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP19]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i32 3
; SSE-NEXT: [[TMP21:%.*]] = fdiv <4 x float> [[TMP18]], [[TMP20]]		; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP5]], i32 0
; SSE-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP15]] to <4 x float>*		; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP9]], i32 1
; SSE-NEXT: store <4 x float> [[TMP21]], <4 x float>* [[TMP22]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP13]], i32 2
		; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP17]], i32 3
		; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]
		; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
		; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
		; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
		; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
		; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
		; SSE-NEXT: [[TMP34:%.*]] = bitcast float* [[TMP33]] to <4 x float>*
		; SSE-NEXT: [[TMP35:%.*]] = load <4 x float>, <4 x float>* [[TMP34]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; SSE-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <4 x float>*
		; SSE-NEXT: [[TMP38:%.*]] = load <4 x float>, <4 x float>* [[TMP37]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
		; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP43:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0
		; SSE-NEXT: [[TMP44:%.*]] = extractelement <4 x float> [[TMP35]], i32 3
		; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> [[TMP43]], float [[TMP44]], i32 1
		; SSE-NEXT: [[TMP46:%.*]] = extractelement <4 x float> [[TMP35]], i32 0
		; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP46]], i32 2
		; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP40]], i32 3
		; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0
		; SSE-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP38]], i32 3
		; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP50]], i32 1
		; SSE-NEXT: [[TMP52:%.*]] = extractelement <4 x float> [[TMP38]], i32 0
		; SSE-NEXT: [[TMP53:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP52]], i32 2
		; SSE-NEXT: [[TMP54:%.*]] = insertelement <4 x float> [[TMP53]], float [[TMP42]], i32 3
		; SSE-NEXT: [[TMP55:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP54]]
		; SSE-NEXT: [[TMP56:%.*]] = bitcast float* [[TMP27]] to <4 x float>*
		; SSE-NEXT: store <4 x float> [[TMP55]], <4 x float>* [[TMP56]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_div(		; AVX-LABEL: @gather_load_div(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0		; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x float*> [[TMP10]], float* [[TMP3]], i32 1		; AVX-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP12:%.*]] = insertelement <8 x float*> [[TMP11]], float* [[TMP4]], i32 2		; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x float*> [[TMP12]], float* [[TMP5]], i32 3		; AVX-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP6]], i32 4		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP7]], i32 5		; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x float*> [[TMP15]], float* [[TMP8]], i32 6		; AVX-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = insertelement <8 x float*> [[TMP16]], float* [[TMP9]], i32 7		; AVX-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
; AVX-NEXT: [[TMP18:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP17]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP19:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> undef, <8 x i32> zeroinitializer		; AVX-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX-NEXT: [[TMP20:%.*]] = getelementptr float, <8 x float*> [[TMP19]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>		; AVX-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP21:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP20]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX-NEXT: [[TMP22:%.*]] = fdiv <8 x float> [[TMP18]], [[TMP21]]		; AVX-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
; AVX-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, [[TBAA0]]
; AVX-NEXT: store <8 x float> [[TMP22]], <8 x float>* [[TMP23]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; AVX-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
		; AVX-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; AVX-NEXT: [[TMP28:%.*]] = load float, float* [[TMP27]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
		; AVX-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
		; AVX-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
		; AVX-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
		; AVX-NEXT: [[TMP34:%.*]] = extractelement <4 x float> [[TMP18]], i32 0
		; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP33]], float [[TMP34]], i32 3
		; AVX-NEXT: [[TMP36:%.*]] = extractelement <4 x float> [[TMP18]], i32 3
		; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP36]], i32 4
		; AVX-NEXT: [[TMP38:%.*]] = extractelement <4 x float> [[TMP23]], i32 3
		; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP38]], i32 5
		; AVX-NEXT: [[TMP40:%.*]] = extractelement <4 x float> [[TMP23]], i32 0
		; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP40]], i32 6
		; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP28]], i32 7
		; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
		; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP9]], i32 1
		; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP13]], i32 2
		; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP16]], i32 3
		; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP20]], i32 4
		; AVX-NEXT: [[TMP48:%.*]] = extractelement <4 x float> [[TMP26]], i32 3
		; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP48]], i32 5
		; AVX-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP26]], i32 0
		; AVX-NEXT: [[TMP51:%.*]] = insertelement <8 x float> [[TMP49]], float [[TMP50]], i32 6
		; AVX-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP51]], float [[TMP30]], i32 7
		; AVX-NEXT: [[TMP53:%.*]] = fdiv <8 x float> [[TMP42]], [[TMP52]]
		; AVX-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
		; AVX-NEXT: store <8 x float> [[TMP53]], <8 x float>* [[TMP54]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_div(		; AVX2-LABEL: @gather_load_div(
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX2-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX2-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0		; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP11:%.*]] = insertelement <8 x float*> [[TMP10]], float* [[TMP3]], i32 1		; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP12:%.*]] = insertelement <8 x float*> [[TMP11]], float* [[TMP4]], i32 2		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP13:%.*]] = insertelement <8 x float*> [[TMP12]], float* [[TMP5]], i32 3		; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP6]], i32 4		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX2-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP7]], i32 5		; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x float*> [[TMP15]], float* [[TMP8]], i32 6		; AVX2-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x float*> [[TMP16]], float* [[TMP9]], i32 7		; AVX2-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
; AVX2-NEXT: [[TMP18:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP17]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP19:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> undef, <8 x i32> zeroinitializer		; AVX2-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr float, <8 x float*> [[TMP19]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>		; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP21:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP20]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP22:%.*]] = fdiv <8 x float> [[TMP18]], [[TMP21]]		; AVX2-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
; AVX2-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX2-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, [[TBAA0]]
; AVX2-NEXT: store <8 x float> [[TMP22]], <8 x float>* [[TMP23]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
		; AVX2-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; AVX2-NEXT: [[TMP28:%.]] = load float, float [[TMP27]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP29:%.]] = getelementptr inbounds float, float [[TMP1]], i64 23
		; AVX2-NEXT: [[TMP30:%.]] = load float, float [[TMP29]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
		; AVX2-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
		; AVX2-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
		; AVX2-NEXT: [[TMP34:%.*]] = extractelement <4 x float> [[TMP18]], i32 0
		; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP33]], float [[TMP34]], i32 3
		; AVX2-NEXT: [[TMP36:%.*]] = extractelement <4 x float> [[TMP18]], i32 3
		; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP36]], i32 4
		; AVX2-NEXT: [[TMP38:%.*]] = extractelement <4 x float> [[TMP23]], i32 3
		; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP38]], i32 5
		; AVX2-NEXT: [[TMP40:%.*]] = extractelement <4 x float> [[TMP23]], i32 0
		; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP40]], i32 6
		; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP28]], i32 7
		; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
		; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP9]], i32 1
		; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP13]], i32 2
		; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP16]], i32 3
		; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP20]], i32 4
		; AVX2-NEXT: [[TMP48:%.*]] = extractelement <4 x float> [[TMP26]], i32 3
		; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP48]], i32 5
		; AVX2-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP26]], i32 0
		; AVX2-NEXT: [[TMP51:%.*]] = insertelement <8 x float> [[TMP49]], float [[TMP50]], i32 6
		; AVX2-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP51]], float [[TMP30]], i32 7
		; AVX2-NEXT: [[TMP53:%.*]] = fdiv <8 x float> [[TMP42]], [[TMP52]]
		; AVX2-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
		; AVX2-NEXT: store <8 x float> [[TMP53]], <8 x float>* [[TMP54]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_div(		; AVX512-LABEL: @gather_load_div(
; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
[80 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX2		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX2
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512
; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=CHECK,AVX512		; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512

define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {		define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
; CHECK-LABEL: @gather_load(		; SSE-LABEL: @gather_load(
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]		; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; SSE-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0		; SSE-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1		; SSE-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 7, i32 4>
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2		; SSE-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[TMP11]], <i32 1, i32 2, i32 3, i32 4>
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3		; SSE-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; CHECK-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4, [[TBAA0]]
; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; SSE-NEXT: ret void
; CHECK-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]		;
; CHECK-NEXT: ret void		; AVX-LABEL: @gather_load(
		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX-NEXT: ret void
		;
		; AVX2-LABEL: @gather_load(
		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX2-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX2-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX2-NEXT: ret void
		;
		; AVX512-LABEL: @gather_load(
		; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
		; AVX512-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, [[TBAA0:!tbaa !.*]]
		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
		; AVX512-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
		; AVX512-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
		; AVX512-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX512-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
		; AVX512-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
		; AVX512-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
		; AVX512-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
		; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX512-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, [[TBAA0]]
		; AVX512-NEXT: ret void
;		;
%3 = getelementptr inbounds i32, i32* %1, i64 1		%3 = getelementptr inbounds i32, i32* %1, i64 1
%4 = load i32, i32* %1, align 4, !tbaa !2		%4 = load i32, i32* %1, align 4, !tbaa !2
%5 = getelementptr inbounds i32, i32* %0, i64 1		%5 = getelementptr inbounds i32, i32* %0, i64 1
%6 = getelementptr inbounds i32, i32* %1, i64 11		%6 = getelementptr inbounds i32, i32* %1, i64 11
%7 = load i32, i32* %6, align 4, !tbaa !2		%7 = load i32, i32* %6, align 4, !tbaa !2
%8 = getelementptr inbounds i32, i32* %0, i64 2		%8 = getelementptr inbounds i32, i32* %0, i64 2
%9 = getelementptr inbounds i32, i32* %1, i64 4		%9 = getelementptr inbounds i32, i32* %1, i64 4
%10 = load i32, i32* %9, align 4, !tbaa !2		%10 = load i32, i32* %9, align 4, !tbaa !2
%11 = getelementptr inbounds i32, i32* %0, i64 3		%11 = getelementptr inbounds i32, i32* %0, i64 3
%12 = load i32, i32* %3, align 4, !tbaa !2		%12 = load i32, i32* %3, align 4, !tbaa !2
%13 = insertelement <4 x i32> undef, i32 %4, i32 0		%13 = insertelement <4 x i32> undef, i32 %4, i32 0
%14 = insertelement <4 x i32> %13, i32 %7, i32 1		%14 = insertelement <4 x i32> %13, i32 %7, i32 1
%15 = insertelement <4 x i32> %14, i32 %10, i32 2		%15 = insertelement <4 x i32> %14, i32 %10, i32 2
%16 = insertelement <4 x i32> %15, i32 %12, i32 3		%16 = insertelement <4 x i32> %15, i32 %12, i32 3
%17 = add nsw <4 x i32> %16, <i32 1, i32 2, i32 3, i32 4>		%17 = add nsw <4 x i32> %16, <i32 1, i32 2, i32 3, i32 4>
%18 = bitcast i32* %0 to <4 x i32>*		%18 = bitcast i32* %0 to <4 x i32>*
store <4 x i32> %17, <4 x i32>* %18, align 4, !tbaa !2		store <4 x i32> %17, <4 x i32>* %18, align 4, !tbaa !2
ret void		ret void
}		}

define void @gather_load_2(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {		define void @gather_load_2(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_2(		; SSE-LABEL: @gather_load_2(
; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0:!tbaa !.*]]		; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1		; SSE-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1
; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; SSE-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; SSE-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2		; SSE-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2
; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2
; SSE-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; SSE-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3		; SSE-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3
; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3
; SSE-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4		; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_2(		; AVX-LABEL: @gather_load_2(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0:!tbaa !.*]]		; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP5:%.*]] = add nsw i32 [[TMP4]], 1		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP8]], 2		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP9]], i32* [[TMP6]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i32 1
; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP11]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i32 2
; AVX-NEXT: [[TMP13:%.*]] = add nsw i32 [[TMP12]], 3		; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i32 3
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: store i32 [[TMP13]], i32* [[TMP10]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; AVX-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_2(		; AVX2-LABEL: @gather_load_2(
; AVX2-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX2-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0:!tbaa !.*]]		; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, [[TBAA0]]
; AVX2-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i32 1
		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i32 2
		; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i32 3
		; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
		; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_2(		; AVX512-LABEL: @gather_load_2(
; AVX512-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0		; AVX512-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0
; AVX512-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> undef, <4 x i32> zeroinitializer
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP4]], <4 x i64> <i64 1, i64 10, i64 3, i64 5>
; AVX512-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0:!tbaa !.*]]		; AVX512-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>		; AVX512-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP6]], <i32 1, i32 2, i32 3, i32 4>
; AVX512-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX512-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX512-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]		; AVX512-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%3 = getelementptr inbounds i32, i32* %1, i64 1		%3 = getelementptr inbounds i32, i32* %1, i64 1
%4 = load i32, i32* %3, align 4, !tbaa !2		%4 = load i32, i32* %3, align 4, !tbaa !2
%5 = add nsw i32 %4, 1		%5 = add nsw i32 %4, 1
[56 lines not shown]
; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4		; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_3(		; AVX-LABEL: @gather_load_3(
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], 2		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2		; AVX-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
; AVX-NEXT: store i32 [[TMP8]], i32* [[TMP5]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
; AVX-NEXT: [[TMP12:%.*]] = add i32 [[TMP11]], 3		; AVX-NEXT: [[TMP13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 3		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: store i32 [[TMP12]], i32* [[TMP9]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = add i32 [[TMP15]], 4		; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4		; AVX-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
; AVX-NEXT: store i32 [[TMP16]], i32* [[TMP13]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0
; AVX-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP18]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP19]], i32 3
; AVX-NEXT: [[TMP20:%.*]] = add i32 [[TMP19]], 1		; AVX-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
; AVX-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP21]], i32 4
; AVX-NEXT: store i32 [[TMP20]], i32* [[TMP17]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3
; AVX-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP23]], i32 5
; AVX-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP22]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0
; AVX-NEXT: [[TMP24:%.*]] = add i32 [[TMP23]], 2		; AVX-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP25]], i32 6
; AVX-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP15]], i32 7
; AVX-NEXT: store i32 [[TMP24]], i32* [[TMP21]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP28:%.*]] = add <8 x i32> [[TMP27]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP26]], align 4, [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP28]], <8 x i32>* [[TMP29]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP28:%.*]] = add i32 [[TMP27]], 3
; AVX-NEXT: [[TMP29:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; AVX-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_3(		; AVX2-LABEL: @gather_load_3(
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, [[TBAA0]]
; AVX2-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX2-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
; AVX2-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>		; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX2-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
; AVX2-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX2-NEXT: [[TMP13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX2-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; AVX2-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2		; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
; AVX2-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0
; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP19]], i32 3
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
; AVX2-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3		; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP21]], i32 4
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7		; AVX2-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3
; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP23]], i32 5
; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX2-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0
; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP25]], i32 6
; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4		; AVX2-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP15]], i32 7
; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP28:%.*]] = add <8 x i32> [[TMP27]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
		; AVX2-NEXT: store <8 x i32> [[TMP28]], <8 x i32>* [[TMP29]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_3(		; AVX512-LABEL: @gather_load_3(
; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]		; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]		; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0		; AVX512-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
; AVX512-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX512-NEXT: [[TMP8:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>		; AVX512-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX512-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>*
; AVX512-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX512-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), [[TBAA0]]
; AVX512-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX512-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2
; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
; AVX512-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX512-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX512-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX512-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
; AVX512-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%3 = load i32, i32* %1, align 4, !tbaa !2		%3 = load i32, i32* %1, align 4, !tbaa !2
%4 = add i32 %3, 1		%4 = add i32 %3, 1
%5 = getelementptr inbounds i32, i32* %0, i64 1		%5 = getelementptr inbounds i32, i32* %0, i64 1
store i32 %4, i32* %0, align 4, !tbaa !2		store i32 %4, i32* %0, align 4, !tbaa !2
%6 = getelementptr inbounds i32, i32* %1, i64 11		%6 = getelementptr inbounds i32, i32* %1, i64 11
%7 = load i32, i32* %6, align 4, !tbaa !2		%7 = load i32, i32* %6, align 4, !tbaa !2
; SSE-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]
; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]		; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_4(		; AVX-LABEL: @gather_load_4(
; AVX-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11		; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
; AVX-NEXT: [[T9:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 2
; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4		; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
; AVX-NEXT: [[T13:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 3
; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15		; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
; AVX-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 4
; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
; AVX-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]		; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]
; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]		; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]
; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4, [[TBAA0]]
; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = bitcast i32* [[T26]] to <4 x i32>*
; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4, [[TBAA0]]
; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]		; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
; AVX-NEXT: [[T8:%.*]] = add i32 [[T7]], 2		; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
; AVX-NEXT: [[T12:%.*]] = add i32 [[T11]], 3		; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
; AVX-NEXT: [[T16:%.*]] = add i32 [[T15]], 4		; AVX-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
; AVX-NEXT: [[T20:%.*]] = add i32 [[T19]], 1		; AVX-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP8]], i32 3
; AVX-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; AVX-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP10]], i32 4
; AVX-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; AVX-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 5
; AVX-NEXT: store i32 [[T8]], i32* [[T5]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
; AVX-NEXT: store i32 [[T12]], i32* [[T9]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[TMP14]], i32 6
; AVX-NEXT: store i32 [[T16]], i32* [[T13]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[T31]], i32 7
; AVX-NEXT: store i32 [[T20]], i32* [[T17]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
; AVX-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, [[TBAA0]]
; AVX-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_4(		; AVX2-LABEL: @gather_load_4(
; AVX2-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
; AVX2-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0		; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX2-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX2-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX2-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX2-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
		; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP3:%.*]] = bitcast i32* [[T26]] to <4 x i32>*
		; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]		; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX2-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
; AVX2-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>		; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
; AVX2-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
; AVX2-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX2-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
; AVX2-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX2-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP8]], i32 3
; AVX2-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP10]], i32 4
; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 5
; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = insertelement <8 x i32> [[TMP13]], i32 [[TMP14]], i32 6
		; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> [[TMP15]], i32 [[T31]], i32 7
		; AVX2-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
		; AVX2-NEXT: [[TMP18:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
		; AVX2-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_4(		; AVX512-LABEL: @gather_load_4(
; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX512-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0		; AVX512-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> undef, <4 x i32> zeroinitializer		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
; AVX512-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX512-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
; AVX512-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX512-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX512-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX512-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]		; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), [[TBAA0]]		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), [[TBAA0]]
; AVX512-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, [[TBAA0]]
; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
; AVX512-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>		; AVX512-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
; AVX512-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
; AVX512-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
; AVX512-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]		; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, [[TBAA0]]
; AVX512-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX512-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
; AVX512-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, [[TBAA0]]		; AVX512-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), [[TBAA0]]
; AVX512-NEXT: store i32 [[T24]], i32* [[T21]], align 4, [[TBAA0]]
; AVX512-NEXT: store i32 [[T28]], i32* [[T25]], align 4, [[TBAA0]]
; AVX512-NEXT: store i32 [[T32]], i32* [[T29]], align 4, [[TBAA0]]
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
%t5 = getelementptr inbounds i32, i32* %t0, i64 1		%t5 = getelementptr inbounds i32, i32* %t0, i64 1
%t6 = getelementptr inbounds i32, i32* %t1, i64 11		%t6 = getelementptr inbounds i32, i32* %t1, i64 11
%t9 = getelementptr inbounds i32, i32* %t0, i64 2		%t9 = getelementptr inbounds i32, i32* %t0, i64 2
%t10 = getelementptr inbounds i32, i32* %t1, i64 4		%t10 = getelementptr inbounds i32, i32* %t1, i64 4
%t13 = getelementptr inbounds i32, i32* %t0, i64 3		%t13 = getelementptr inbounds i32, i32* %t0, i64 3
%t14 = getelementptr inbounds i32, i32* %t1, i64 15		%t14 = getelementptr inbounds i32, i32* %t1, i64 15
store i32 %t32, i32* %t29, align 4, !tbaa !2		store i32 %t32, i32* %t29, align 4, !tbaa !2

ret void		ret void
}		}


define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {		define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_div(		; SSE-LABEL: @gather_load_div(
; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; SSE-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; SSE-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i32 0		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x float*> [[TMP6]], float* [[TMP3]], i32 1		; SSE-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP8:%.*]] = insertelement <4 x float*> [[TMP7]], float* [[TMP4]], i32 2		; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x float*> [[TMP8]], float* [[TMP5]], i32 3		; SSE-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP10:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP9]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> undef, <4 x i32> zeroinitializer		; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP12:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 4, i64 13, i64 11, i64 44>		; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; SSE-NEXT: [[TMP13:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP12]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP14:%.*]] = fdiv <4 x float> [[TMP10]], [[TMP13]]		; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4		; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP16:%.*]] = bitcast float* [[TMP0]] to <4 x float>*		; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; SSE-NEXT: store <4 x float> [[TMP14]], <4 x float>* [[TMP16]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 17, i64 8, i64 5, i64 20>		; SSE-NEXT: [[TMP18:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0
; SSE-NEXT: [[TMP18:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP17]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> [[TMP18]], float [[TMP7]], i32 1
; SSE-NEXT: [[TMP19:%.*]] = getelementptr float, <4 x float*> [[TMP11]], <4 x i64> <i64 33, i64 30, i64 27, i64 23>		; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP11]], i32 2
; SSE-NEXT: [[TMP20:%.*]] = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> [[TMP19]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef), [[TBAA0]]		; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i32 3
; SSE-NEXT: [[TMP21:%.*]] = fdiv <4 x float> [[TMP18]], [[TMP20]]		; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP5]], i32 0
; SSE-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP15]] to <4 x float>*		; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP9]], i32 1
; SSE-NEXT: store <4 x float> [[TMP21]], <4 x float>* [[TMP22]], align 4, [[TBAA0]]		; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP13]], i32 2
		; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP17]], i32 3
		; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]
		; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
		; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
		; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
		; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
		; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
		; SSE-NEXT: [[TMP34:%.*]] = bitcast float* [[TMP33]] to <4 x float>*
		; SSE-NEXT: [[TMP35:%.*]] = load <4 x float>, <4 x float>* [[TMP34]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; SSE-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <4 x float>*
		; SSE-NEXT: [[TMP38:%.*]] = load <4 x float>, <4 x float>* [[TMP37]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
		; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, [[TBAA0]]
		; SSE-NEXT: [[TMP43:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0
		; SSE-NEXT: [[TMP44:%.*]] = extractelement <4 x float> [[TMP35]], i32 3
		; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> [[TMP43]], float [[TMP44]], i32 1
		; SSE-NEXT: [[TMP46:%.*]] = extractelement <4 x float> [[TMP35]], i32 0
		; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP46]], i32 2
		; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP40]], i32 3
		; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0
		; SSE-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP38]], i32 3
		; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP50]], i32 1
		; SSE-NEXT: [[TMP52:%.*]] = extractelement <4 x float> [[TMP38]], i32 0
		; SSE-NEXT: [[TMP53:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP52]], i32 2
		; SSE-NEXT: [[TMP54:%.*]] = insertelement <4 x float> [[TMP53]], float [[TMP42]], i32 3
		; SSE-NEXT: [[TMP55:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP54]]
		; SSE-NEXT: [[TMP56:%.*]] = bitcast float* [[TMP27]] to <4 x float>*
		; SSE-NEXT: store <4 x float> [[TMP55]], <4 x float>* [[TMP56]], align 4, [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_div(		; AVX-LABEL: @gather_load_div(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0		; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX-NEXT: [[TMP11:%.*]] = insertelement <8 x float*> [[TMP10]], float* [[TMP3]], i32 1		; AVX-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP12:%.*]] = insertelement <8 x float*> [[TMP11]], float* [[TMP4]], i32 2		; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX-NEXT: [[TMP13:%.*]] = insertelement <8 x float*> [[TMP12]], float* [[TMP5]], i32 3		; AVX-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP6]], i32 4		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP7]], i32 5		; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX-NEXT: [[TMP16:%.*]] = insertelement <8 x float*> [[TMP15]], float* [[TMP8]], i32 6		; AVX-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = insertelement <8 x float*> [[TMP16]], float* [[TMP9]], i32 7		; AVX-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
; AVX-NEXT: [[TMP18:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP17]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP19:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> undef, <8 x i32> zeroinitializer		; AVX-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX-NEXT: [[TMP20:%.*]] = getelementptr float, <8 x float*> [[TMP19]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>		; AVX-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, [[TBAA0]]
; AVX-NEXT: [[TMP21:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP20]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX-NEXT: [[TMP22:%.*]] = fdiv <8 x float> [[TMP18]], [[TMP21]]		; AVX-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
; AVX-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, [[TBAA0]]
; AVX-NEXT: store <8 x float> [[TMP22]], <8 x float>* [[TMP23]], align 4, [[TBAA0]]		; AVX-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; AVX-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
		; AVX-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; AVX-NEXT: [[TMP28:%.*]] = load float, float* [[TMP27]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
		; AVX-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, [[TBAA0]]
		; AVX-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
		; AVX-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
		; AVX-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
		; AVX-NEXT: [[TMP34:%.*]] = extractelement <4 x float> [[TMP18]], i32 0
		; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP33]], float [[TMP34]], i32 3
		; AVX-NEXT: [[TMP36:%.*]] = extractelement <4 x float> [[TMP18]], i32 3
		; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP36]], i32 4
		; AVX-NEXT: [[TMP38:%.*]] = extractelement <4 x float> [[TMP23]], i32 3
		; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP38]], i32 5
		; AVX-NEXT: [[TMP40:%.*]] = extractelement <4 x float> [[TMP23]], i32 0
		; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP40]], i32 6
		; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP28]], i32 7
		; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
		; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP9]], i32 1
		; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP13]], i32 2
		; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP16]], i32 3
		; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP20]], i32 4
		; AVX-NEXT: [[TMP48:%.*]] = extractelement <4 x float> [[TMP26]], i32 3
		; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP48]], i32 5
		; AVX-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP26]], i32 0
		; AVX-NEXT: [[TMP51:%.*]] = insertelement <8 x float> [[TMP49]], float [[TMP50]], i32 6
		; AVX-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP51]], float [[TMP30]], i32 7
		; AVX-NEXT: [[TMP53:%.*]] = fdiv <8 x float> [[TMP42]], [[TMP52]]
		; AVX-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
		; AVX-NEXT: store <8 x float> [[TMP53]], <8 x float>* [[TMP54]], align 4, [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_div(		; AVX2-LABEL: @gather_load_div(
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX2-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX2-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0		; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP11:%.*]] = insertelement <8 x float*> [[TMP10]], float* [[TMP3]], i32 1		; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP12:%.*]] = insertelement <8 x float*> [[TMP11]], float* [[TMP4]], i32 2		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP13:%.*]] = insertelement <8 x float*> [[TMP12]], float* [[TMP5]], i32 3		; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP6]], i32 4		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX2-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP7]], i32 5		; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x float*> [[TMP15]], float* [[TMP8]], i32 6		; AVX2-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x float*> [[TMP16]], float* [[TMP9]], i32 7		; AVX2-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
; AVX2-NEXT: [[TMP18:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP17]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP19:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> undef, <8 x i32> zeroinitializer		; AVX2-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr float, <8 x float*> [[TMP19]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>		; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, [[TBAA0]]
; AVX2-NEXT: [[TMP21:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP20]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP22:%.*]] = fdiv <8 x float> [[TMP18]], [[TMP21]]		; AVX2-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
; AVX2-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX2-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, [[TBAA0]]
; AVX2-NEXT: store <8 x float> [[TMP22]], <8 x float>* [[TMP23]], align 4, [[TBAA0]]		; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
		; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
		; AVX2-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
		; AVX2-NEXT: [[TMP28:%.*]] = load float, float* [[TMP27]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
		; AVX2-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, [[TBAA0]]
		; AVX2-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
		; AVX2-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
		; AVX2-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
		; AVX2-NEXT: [[TMP34:%.*]] = extractelement <4 x float> [[TMP18]], i32 0
		; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP33]], float [[TMP34]], i32 3
		; AVX2-NEXT: [[TMP36:%.*]] = extractelement <4 x float> [[TMP18]], i32 3
		; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP36]], i32 4
		; AVX2-NEXT: [[TMP38:%.*]] = extractelement <4 x float> [[TMP23]], i32 3
		; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP38]], i32 5
		; AVX2-NEXT: [[TMP40:%.*]] = extractelement <4 x float> [[TMP23]], i32 0
		; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP40]], i32 6
		; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP28]], i32 7
		; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
		; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP9]], i32 1
		; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP13]], i32 2
		; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP16]], i32 3
		; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP20]], i32 4
		; AVX2-NEXT: [[TMP48:%.*]] = extractelement <4 x float> [[TMP26]], i32 3
		; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP48]], i32 5
		; AVX2-NEXT: [[TMP50:%.*]] = extractelement <4 x float> [[TMP26]], i32 0
		; AVX2-NEXT: [[TMP51:%.*]] = insertelement <8 x float> [[TMP49]], float [[TMP50]], i32 6
		; AVX2-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP51]], float [[TMP30]], i32 7
		; AVX2-NEXT: [[TMP53:%.*]] = fdiv <8 x float> [[TMP42]], [[TMP52]]
		; AVX2-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
		; AVX2-NEXT: store <8 x float> [[TMP53]], <8 x float>* [[TMP54]], align 4, [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @gather_load_div(		; AVX512-LABEL: @gather_load_div(
; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10		; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s			; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s
	; These conversions should be vectorized by reviews.llvm.org/D57059			; These conversions should be vectorized by reviews.llvm.org/D57059

	define dso_local <4 x float> @foo(<4 x i32> %0) {			define dso_local <4 x float> @foo(<4 x i32> %0) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0:%.*]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float			; CHECK-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[SHUFFLE]] to <4 x float>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: ret <4 x float> [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp i32 [[TMP6]] to float
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP5]], float [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = sitofp i32 [[TMP9]] to float
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP10]], i32 3
	; CHECK-NEXT: ret <4 x float> [[TMP11]]
	;			;
	%2 = extractelement <4 x i32> %0, i32 1			%2 = extractelement <4 x i32> %0, i32 1
	%3 = sitofp i32 %2 to float			%3 = sitofp i32 %2 to float
	%4 = insertelement <4 x float> undef, float %3, i32 0			%4 = insertelement <4 x float> undef, float %3, i32 0
	%5 = insertelement <4 x float> %4, float %3, i32 1			%5 = insertelement <4 x float> %4, float %3, i32 1
	%6 = extractelement <4 x i32> %0, i32 2			%6 = extractelement <4 x i32> %0, i32 2
	%7 = sitofp i32 %6 to float			%7 = sitofp i32 %6 to float
	%8 = insertelement <4 x float> %5, float %7, i32 2			%8 = insertelement <4 x float> %5, float %7, i32 2
	%9 = extractelement <4 x i32> %0, i32 3			%9 = extractelement <4 x i32> %0, i32 3
	%10 = sitofp i32 %9 to float			%10 = sitofp i32 %9 to float
	%11 = insertelement <4 x float> %8, float %10, i32 3			%11 = insertelement <4 x float> %8, float %10, i32 3
	ret <4 x float> %11			ret <4 x float> %11
	}			}

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll


	; PR43745 https://bugs.llvm.org/show_bug.cgi?id=43745			; PR43745 https://bugs.llvm.org/show_bug.cgi?id=43745

	define i1 @fcmp_lt_gt(double %a, double %b, double %c) {			define i1 @fcmp_lt_gt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt_gt(			; CHECK-LABEL: @fcmp_lt_gt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
	cleanup:			cleanup:
	ret i1 false			ret i1 false
	}			}

	define i1 @fcmp_lt(double %a, double %b, double %c) {			define i1 @fcmp_lt(double %a, double %b, double %c) {
	; CHECK-LABEL: @fcmp_lt(			; CHECK-LABEL: @fcmp_lt(
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>			; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @hoge() {			define void @hoge() {
	; CHECK-LABEL: @hoge(			; CHECK-LABEL: @hoge(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15			; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> poison, i16 [[T]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> poison, i16 [[T]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 undef, i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>			; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i16> [[SHUFFLE]] to <2 x i32>
	; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> <i32 undef, i32 63>, [[TMP2]]			; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <2 x i32> <i32 poison, i32 63>, [[TMP1]]
	; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP2]], poison
	; CHECK-NEXT: [[SHUFFLE10:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE11:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE10]], <i32 15, i32 31, i32 47, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[SHUFFLE11]], <i32 15, i32 31, i32 47, i32 poison>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP6]], i32 undef			; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP5]], i32 undef
	; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63			; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63
	; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <2 x i32> undef, [[TMP2]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i32> poison, [[TMP1]]
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP7]], undef			; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP6]], poison
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHUFFLE1]], <i32 -49, i32 -33, i32 -33, i32 -17>
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP9]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP10]], undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP9]], undef
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP10]], i32 undef			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP9]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = icmp slt i32 [[OP_EXTRA1]], undef			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = icmp slt i32 [[OP_EXTRA2]], undef
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = select i1 [[OP_EXTRA2]], i32 [[OP_EXTRA1]], i32 undef			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = select i1 [[OP_EXTRA3]], i32 [[OP_EXTRA2]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = icmp slt i32 [[OP_EXTRA3]], undef			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = icmp slt i32 [[OP_EXTRA4]], undef
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = select i1 [[OP_EXTRA4]], i32 [[OP_EXTRA3]], i32 undef			; CHECK-NEXT: [[OP_EXTRA6:%.*]] = select i1 [[OP_EXTRA5]], i32 [[OP_EXTRA4]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = icmp slt i32 [[OP_EXTRA5]], undef			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = icmp slt i32 [[OP_EXTRA6]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[OP_EXTRA6]], i32 [[OP_EXTRA5]], i32 undef			; CHECK-NEXT: [[OP_EXTRA8:%.*]] = select i1 [[OP_EXTRA7]], i32 [[OP_EXTRA6]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = icmp slt i32 [[OP_EXTRA7]], undef			; CHECK-NEXT: [[OP_EXTRA9:%.*]] = icmp slt i32 [[OP_EXTRA8]], undef
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = select i1 [[OP_EXTRA8]], i32 [[OP_EXTRA7]], i32 undef			; CHECK-NEXT: [[OP_EXTRA10:%.*]] = select i1 [[OP_EXTRA9]], i32 [[OP_EXTRA8]], i32 undef
	; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA9]]			; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA10]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	bb:			bb:
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2

	bb1: ; preds = %bb			bb1: ; preds = %bb
	ret void			ret void


llvm/test/Transforms/SLPVectorizer/X86/resched.ll

	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2
	; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[CONV31_I]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[CONV31_I]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[CONV31_I]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[CONV31_I]], i32 4
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[CONV31_I]], i32 5
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[CONV31_I]], i32 6
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[CONV31_I]], i32 7
	; CHECK-NEXT: [[TMP9:%.*]] = lshr <8 x i32> [[TMP8]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8			; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8
	; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9			; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9
	; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10			; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10
	; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11			; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[CONV31_I]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[CONV31_I]], i32 2
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[CONV31_I]], i32 3
	; CHECK-NEXT: [[TMP14:%.*]] = lshr <4 x i32> [[TMP13]], <i32 9, i32 10, i32 11, i32 12>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12			; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12
	; CHECK-NEXT: [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13
	; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13			; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13
	; CHECK-NEXT: [[SHR_13_I_I:%.*]] = lshr i32 [[CONV31_I]], 14
	; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14			; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14
	; CHECK-NEXT: [[SHR_14_I_I:%.*]] = lshr i32 [[CONV31_I]], 15			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP1]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = lshr <16 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 poison>
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP16]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[TMP2]], i32 14
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP18]], i32 2			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP9]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP20]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <16 x i32> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP9]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP22]], i32 4			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <16 x i32> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <8 x i32> [[TMP9]], i32 4			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[TMP9]], i32 3
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP24]], i32 5			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <16 x i32> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <8 x i32> [[TMP9]], i32 5			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP11]], i32 4
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <16 x i32> [[TMP25]], i32 [[TMP26]], i32 6			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <16 x i32> [[TMP2]], i32 4
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP9]], i32 6			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP13]], i32 5
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <16 x i32> [[TMP27]], i32 [[TMP28]], i32 7			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i32> [[TMP2]], i32 5
	; CHECK-NEXT: [[TMP30:%.*]] = extractelement <8 x i32> [[TMP9]], i32 7			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP15]], i32 6
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <16 x i32> [[TMP29]], i32 [[TMP30]], i32 8			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <16 x i32> [[TMP2]], i32 6
	; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i32> [[TMP14]], i32 0			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP17]], i32 7
	; CHECK-NEXT: [[TMP33:%.*]] = insertelement <16 x i32> [[TMP31]], i32 [[TMP32]], i32 9			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <16 x i32> [[TMP2]], i32 7
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i32> [[TMP14]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP19]], i32 8
	; CHECK-NEXT: [[TMP35:%.*]] = insertelement <16 x i32> [[TMP33]], i32 [[TMP34]], i32 10			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <16 x i32> [[TMP2]], i32 8
	; CHECK-NEXT: [[TMP36:%.*]] = extractelement <4 x i32> [[TMP14]], i32 2			; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP21]], i32 9
	; CHECK-NEXT: [[TMP37:%.*]] = insertelement <16 x i32> [[TMP35]], i32 [[TMP36]], i32 11			; CHECK-NEXT: [[TMP23:%.*]] = extractelement <16 x i32> [[TMP2]], i32 9
	; CHECK-NEXT: [[TMP38:%.*]] = extractelement <4 x i32> [[TMP14]], i32 3			; CHECK-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP23]], i32 10
	; CHECK-NEXT: [[TMP39:%.*]] = insertelement <16 x i32> [[TMP37]], i32 [[TMP38]], i32 12			; CHECK-NEXT: [[TMP25:%.*]] = extractelement <16 x i32> [[TMP2]], i32 10
	; CHECK-NEXT: [[TMP40:%.*]] = insertelement <16 x i32> [[TMP39]], i32 [[SHR_12_I_I]], i32 13			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <16 x i32> [[TMP24]], i32 [[TMP25]], i32 11
	; CHECK-NEXT: [[TMP41:%.*]] = insertelement <16 x i32> [[TMP40]], i32 [[SHR_13_I_I]], i32 14			; CHECK-NEXT: [[TMP27:%.*]] = extractelement <16 x i32> [[TMP2]], i32 11
	; CHECK-NEXT: [[TMP42:%.*]] = insertelement <16 x i32> [[TMP41]], i32 [[SHR_14_I_I]], i32 15			; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x i32> [[TMP26]], i32 [[TMP27]], i32 12
	; CHECK-NEXT: [[TMP43:%.*]] = trunc <16 x i32> [[TMP42]] to <16 x i8>			; CHECK-NEXT: [[TMP29:%.*]] = extractelement <16 x i32> [[TMP2]], i32 12
	; CHECK-NEXT: [[TMP44:%.*]] = and <16 x i8> [[TMP43]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; CHECK-NEXT: [[TMP30:%.*]] = insertelement <16 x i32> [[TMP28]], i32 [[TMP29]], i32 13
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <16 x i32> [[TMP2]], i32 13
				; CHECK-NEXT: [[TMP32:%.*]] = insertelement <16 x i32> [[TMP30]], i32 [[TMP31]], i32 14
				; CHECK-NEXT: [[TMP33:%.*]] = insertelement <16 x i32> [[TMP32]], i32 [[TMP3]], i32 15
				; CHECK-NEXT: [[TMP34:%.*]] = trunc <16 x i32> [[TMP33]] to <16 x i8>
				; CHECK-NEXT: [[TMP35:%.*]] = and <16 x i8> [[TMP34]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15			; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15
	; CHECK-NEXT: [[TMP45:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*			; CHECK-NEXT: [[TMP36:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*
	; CHECK-NEXT: store <16 x i8> [[TMP44]], <16 x i8>* [[TMP45]], align 1			; CHECK-NEXT: store <16 x i8> [[TMP35]], <16 x i8>* [[TMP36]], align 1
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end50.i:			; CHECK: if.end50.i:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.end50.i, label %if.then22.i			br i1 undef, label %if.end50.i, label %if.then22.i

	if.then22.i: ; preds = %entry			if.then22.i: ; preds = %entry

llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx -slp-min-non-power2-values-size=2 | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
	target triple = "i386-apple-macosx10.9.0"			target triple = "i386-apple-macosx10.9.0"

	; We disable the vectorization of <3 x float> for now			; We disable the vectorization of <3 x float> for now

	; float foo(float *A) {			; float foo(float *A) {
	;			;
	; float R = A[0];			; float R = A[0];
	; float G = A[1];			; float G = A[1];
	; float B = A[2];			; float B = A[2];
	; for (int i=0; i < 121; i+=3) {			; for (int i=0; i < 121; i+=3) {
	; R+=A[i+0]*7;			; R+=A[i+0]*7;
	; G+=A[i+1]*8;			; G+=A[i+1]*8;
	; B+=A[i+2]*9;			; B+=A[i+2]*9;
	; }			; }
	;			;
	; return R+G+B;			; return R+G+B;
	; }			; }

	define float @foo(float* nocapture readonly %A) {			define float @foo(float* nocapture readonly %A) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A:%.*]] to <4 x float>*
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP0]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[TMP1]], i32 0
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[TMP3:%.*]] = phi float [ [[TMP0]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi float [ [[TMP2]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[B_032:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[G_031:%.*]] = phi float [ [[TMP1]], [[ENTRY]] ], [ [[ADD9:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[TMP5:%.*]] = add nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[R_030:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD4:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP5]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP3]], 7.000000e+00			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX7]] to <2 x float>*
	; CHECK-NEXT: [[ADD4]] = fadd float [[R_030]], [[MUL]]			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX7]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[MUL8:%.*]] = fmul float [[TMP5]], 8.000000e+00			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
	; CHECK-NEXT: [[ADD9]] = fadd float [[G_031]], [[MUL8]]			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x float> [[TMP10]], float [[TMP11]], i32 2
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP13:%.*]] = fmul <4 x float> [[TMP12]], <float 7.000000e+00, float 8.000000e+00, float 9.000000e+00, float poison>
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP14]] = fadd <4 x float> [[TMP4]], [[TMP13]]
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX12]], align 4
	; CHECK-NEXT: [[MUL13:%.*]] = fmul float [[TMP7]], 9.000000e+00
	; CHECK-NEXT: [[ADD14]] = fadd float [[B_032]], [[MUL13]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP15:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP8]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP15]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]
	; CHECK: for.body.for.body_crit_edge:			; CHECK: for.body.for.body_crit_edge:
	; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4			; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4
	; CHECK-NEXT: br label [[FOR_BODY]]			; CHECK-NEXT: br label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[ADD4]], [[ADD9]]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP14]], i32 0
	; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[ADD14]]			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP14]], i32 1
				; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[TMP16]], [[TMP17]]
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP14]], i32 2
				; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[TMP18]]
	; CHECK-NEXT: ret float [[ADD17]]			; CHECK-NEXT: ret float [[ADD17]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4

llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4

	define i32 @slp_schedule_bundle() local_unnamed_addr #0 {			define i32 @slp_schedule_bundle() local_unnamed_addr #0 {
	; CHECK-LABEL: @slp_schedule_bundle(			; CHECK-LABEL: @slp_schedule_bundle(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([1 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast ([1 x i32]* @b to <8 x i32>*), i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31, i32 31, i32 31, i32 poison, i32 poison>
	; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = xor <8 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 poison, i32 poison>
	; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([1 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP2]], <8 x i32>* bitcast ([1 x i32]* @a to <8 x i32>*), i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0), align 4
	; CHECK-NEXT: [[DOTLOBIT_4:%.*]] = lshr i32 [[TMP3]], 31
	; CHECK-NEXT: [[DOTLOBIT_NOT_4:%.*]] = xor i32 [[DOTLOBIT_4]], 1
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_4]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 5, i64 0), align 4
	; CHECK-NEXT: [[DOTLOBIT_5:%.*]] = lshr i32 [[TMP4]], 31
	; CHECK-NEXT: [[DOTLOBIT_NOT_5:%.*]] = xor i32 [[DOTLOBIT_5]], 1
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_5]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 5, i64 0), align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4
	%.lobit = lshr i32 %0, 31			%.lobit = lshr i32 %0, 31
	%.lobit.not = xor i32 %.lobit, 1			%.lobit.not = xor i32 %.lobit, 1
	store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4			store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4
	%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4			%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3		; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3
; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>
; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4		; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4
; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5		; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], undef		; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], poison
; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> undef, <4 x i32> [[SHUFFLE1]]		; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> poison, <4 x i32> [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> poison, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]
; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6		; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*
; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8		; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
bb:		bb:
%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1		%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1
%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0		%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0
ret void		ret void
}		}

define internal i32 @ipvideo_decode_block_opcode_0xD_16() {		define internal i32 @ipvideo_decode_block_opcode_0xD_16() {
; CHECK-LABEL: @ipvideo_decode_block_opcode_0xD_16(		; CHECK-LABEL: @ipvideo_decode_block_opcode_0xD_16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ undef, [[ENTRY:%.*]] ], [ [[SHRINK_SHUFFLE:%.*]], [[IF_END:%.*]] ]		; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ poison, [[ENTRY:%.*]] ], [ [[TMP4:%.*]], [[IF_END:%.*]] ]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i16, i16* undef, i32 1		; CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i16, i16* undef, i32 1
; CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i16, i16* undef, i32 2		; CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i16, i16* undef, i32 2
; CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i16, i16* undef, i32 3		; CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i16, i16* undef, i32 3
; CHECK-NEXT: [[ARRAYIDX11_4:%.*]] = getelementptr inbounds i16, i16* undef, i32 4		; CHECK-NEXT: [[ARRAYIDX11_4:%.*]] = getelementptr inbounds i16, i16* undef, i32 4
; CHECK-NEXT: [[ARRAYIDX11_5:%.*]] = getelementptr inbounds i16, i16* undef, i32 5		; CHECK-NEXT: [[ARRAYIDX11_5:%.*]] = getelementptr inbounds i16, i16* undef, i32 5
; CHECK-NEXT: [[ARRAYIDX11_6:%.*]] = getelementptr inbounds i16, i16* undef, i32 6		; CHECK-NEXT: [[ARRAYIDX11_6:%.*]] = getelementptr inbounds i16, i16* undef, i32 6
; CHECK-NEXT: [[ARRAYIDX11_7:%.*]] = getelementptr inbounds i16, i16* undef, i32 7		; CHECK-NEXT: [[ARRAYIDX11_7:%.*]] = getelementptr inbounds i16, i16* undef, i32 7
; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* undef, align 2		; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* undef, align 2
; CHECK-NEXT: [[SHRINK_SHUFFLE]] = shufflevector <8 x i16> [[SHUFFLE]], <8 x i16> poison, <2 x i32> <i32 0, i32 4>		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i16> [[SHUFFLE]], i32 0
		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i16> poison, i16 [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i16> [[SHUFFLE]], i32 4
		; CHECK-NEXT: [[TMP4]] = insertelement <2 x i16> [[TMP2]], i16 [[TMP3]], i32 1
; CHECK-NEXT: br label [[FOR_BODY]]		; CHECK-NEXT: br label [[FOR_BODY]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%P.sroa.7.0 = phi i16 [ undef, %entry ], [ %P.sroa.7.0, %if.end ]		%P.sroa.7.0 = phi i16 [ undef, %entry ], [ %P.sroa.7.0, %if.end ]
%P.sroa.0.0 = phi i16 [ undef, %entry ], [ %P.sroa.0.0, %if.end ]		%P.sroa.0.0 = phi i16 [ undef, %entry ], [ %P.sroa.0.0, %if.end ]

llvm/test/Transforms/SLPVectorizer/X86/supernode.ll

	; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8			; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8
	; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8
	; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*
	; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8			; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8
	; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0
	; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1			; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1
	; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP1]]			; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1			; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1
	; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]			; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]
	; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; ENABLED-NEXT: ret void			; ENABLED-NEXT: ret void
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s

	@k = external dso_local constant [8 x [4 x i32]], align 16			@k = external dso_local constant [8 x [4 x i32]], align 16
	@l = external dso_local global [366 x i32], align 16			@l = external dso_local global [366 x i32], align 16

	; Function Attrs: nofree norecurse noreturn nounwind writeonly			; Function Attrs: nofree norecurse noreturn nounwind writeonly
	define void @n() local_unnamed_addr #0 {			define void @n() local_unnamed_addr #0 {
	; CHECK-LABEL: @n(			; CHECK-LABEL: @n(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 0), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x [4 x i32]]* @k to <8 x i32>*), align 1
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 1) to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 1), align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 1), align 4			; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 2), align 8
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 2), align 8			; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 3), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 1, i64 3), align 4			; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 0), align 16
	; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 0), align 16			; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 1), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 1), align 4			; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 2), align 8
	; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 2), align 8			; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 3), align 4
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 2, i64 3), align 4			; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 0), align 16
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 0), align 16			; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 1), align 4
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 1), align 4			; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 2), align 8
	; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 2), align 8			; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 3), align 4
	; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 3, i64 3), align 4			; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 0), align 16
	; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 0), align 16			; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 1), align 4
	; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 1), align 4			; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 2), align 8
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 2), align 8			; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 3), align 4
	; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 4, i64 3), align 4			; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 0), align 16
	; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 0), align 16			; CHECK-NEXT: [[TMP17:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 1), align 4
	; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 1), align 4			; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 2), align 8
	; CHECK-NEXT: [[TMP19:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 2), align 8			; CHECK-NEXT: [[TMP19:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 3), align 4
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 5, i64 3), align 4			; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 0), align 16
	; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 0), align 16			; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 1), align 4
	; CHECK-NEXT: [[TMP22:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 1), align 4			; CHECK-NEXT: [[TMP22:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 2), align 8
	; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 2), align 8			; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 3), align 4
	; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 6, i64 3), align 4			; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 0), align 16
	; CHECK-NEXT: [[TMP25:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 0), align 16			; CHECK-NEXT: [[TMP25:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 1), align 4
	; CHECK-NEXT: [[TMP26:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 1), align 4			; CHECK-NEXT: [[TMP26:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 2), align 8
	; CHECK-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 2), align 8			; CHECK-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 3), align 4
	; CHECK-NEXT: [[TMP28:%.*]] = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 7, i64 3), align 4
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[B_0:%.*]] = phi i32 [ [[SPEC_SELECT8_3_7:%.*]], [[FOR_COND]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[B_0:%.*]] = phi i32 [ [[SPEC_SELECT8_3_7:%.*]], [[FOR_COND]] ], [ undef, [[ENTRY]] ]
	; CHECK-NEXT: [[TMP29:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP28:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP29]], -183			; CHECK-NEXT: [[TMP29:%.*]] = add i32 [[TMP28]], -183
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 [[TMP30]], [[TMP0]]			; CHECK-NEXT: [[TMP30:%.*]] = insertelement <8 x i32> poison, i32 [[TMP29]], i32 0
	; CHECK-NEXT: [[TMP31:%.*]] = icmp slt i32 [[SUB]], 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP30]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[NEG:%.*]] = sub nsw i32 0, [[SUB]]			; CHECK-NEXT: [[TMP31:%.*]] = sub <8 x i32> [[SHUFFLE]], [[TMP0]]
	; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[NEG]], i32 [[SUB]]			; CHECK-NEXT: [[TMP32:%.*]] = icmp slt <8 x i32> [[TMP31]], <i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[TMP33:%.*]] = insertelement <4 x i32> poison, i32 [[TMP30]], i32 0			; CHECK-NEXT: [[TMP33:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 poison, i32 poison>, [[TMP31]]
	; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x i32> [[TMP33]], i32 [[TMP30]], i32 1			; CHECK-NEXT: [[TMP34:%.*]] = select <8 x i1> [[TMP32]], <8 x i32> [[TMP33]], <8 x i32> [[TMP31]]
	; CHECK-NEXT: [[TMP35:%.*]] = insertelement <4 x i32> [[TMP34]], i32 [[TMP30]], i32 2			; CHECK-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP34]], <8 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
	; CHECK-NEXT: [[TMP36:%.*]] = insertelement <4 x i32> [[TMP35]], i32 [[TMP30]], i32 3			; CHECK-NEXT: [[TMP35:%.*]] = call i32 @llvm.vector.reduce.smin.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; CHECK-NEXT: [[TMP37:%.*]] = sub <4 x i32> [[TMP36]], [[TMP1]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP35]], [[B_0]]
	; CHECK-NEXT: [[TMP38:%.*]] = icmp slt <4 x i32> [[TMP37]], zeroinitializer			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP35]], i32 [[B_0]]
	; CHECK-NEXT: [[TMP39:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP37]]			; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP29]], [[TMP1]]
	; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP38]], <4 x i32> [[TMP39]], <4 x i32> [[TMP37]]			; CHECK-NEXT: [[TMP36:%.*]] = icmp slt i32 [[SUB_1_1]], 0
	; CHECK-NEXT: [[TMP41:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP40]])
	; CHECK-NEXT: [[TMP42:%.*]] = icmp slt i32 [[TMP41]], [[TMP32]]
	; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP42]], i32 [[TMP41]], i32 [[TMP32]]
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP43]], [[B_0]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP43]], i32 [[B_0]]
	; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP30]], [[TMP2]]
	; CHECK-NEXT: [[TMP44:%.*]] = icmp slt i32 [[SUB_1_1]], 0
	; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]			; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]
	; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP44]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]			; CHECK-NEXT: [[TMP37:%.*]] = select i1 [[TMP36]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]
	; CHECK-NEXT: [[CMP12_1_1:%.*]] = icmp slt i32 [[TMP45]], [[OP_EXTRA1]]			; CHECK-NEXT: [[CMP12_1_1:%.*]] = icmp slt i32 [[TMP37]], [[OP_EXTRA1]]
	; CHECK-NEXT: [[NARROW:%.*]] = or i1 [[CMP12_1_1]], [[OP_EXTRA]]			; CHECK-NEXT: [[NARROW:%.*]] = or i1 [[CMP12_1_1]], [[OP_EXTRA]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_1:%.*]] = select i1 [[CMP12_1_1]], i32 [[TMP45]], i32 [[OP_EXTRA1]]			; CHECK-NEXT: [[SPEC_SELECT8_1_1:%.*]] = select i1 [[CMP12_1_1]], i32 [[TMP37]], i32 [[OP_EXTRA1]]
	; CHECK-NEXT: [[SUB_2_1:%.*]] = sub i32 [[TMP30]], [[TMP3]]			; CHECK-NEXT: [[SUB_2_1:%.*]] = sub i32 [[TMP29]], [[TMP2]]
	; CHECK-NEXT: [[TMP46:%.*]] = icmp slt i32 [[SUB_2_1]], 0			; CHECK-NEXT: [[TMP38:%.*]] = icmp slt i32 [[SUB_2_1]], 0
	; CHECK-NEXT: [[NEG_2_1:%.*]] = sub nsw i32 0, [[SUB_2_1]]			; CHECK-NEXT: [[NEG_2_1:%.*]] = sub nsw i32 0, [[SUB_2_1]]
	; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[NEG_2_1]], i32 [[SUB_2_1]]			; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP38]], i32 [[NEG_2_1]], i32 [[SUB_2_1]]
	; CHECK-NEXT: [[CMP12_2_1:%.*]] = icmp slt i32 [[TMP47]], [[SPEC_SELECT8_1_1]]			; CHECK-NEXT: [[CMP12_2_1:%.*]] = icmp slt i32 [[TMP39]], [[SPEC_SELECT8_1_1]]
	; CHECK-NEXT: [[NARROW34:%.*]] = or i1 [[CMP12_2_1]], [[NARROW]]			; CHECK-NEXT: [[NARROW34:%.*]] = or i1 [[CMP12_2_1]], [[NARROW]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_1:%.*]] = select i1 [[CMP12_2_1]], i32 [[TMP47]], i32 [[SPEC_SELECT8_1_1]]			; CHECK-NEXT: [[SPEC_SELECT8_2_1:%.*]] = select i1 [[CMP12_2_1]], i32 [[TMP39]], i32 [[SPEC_SELECT8_1_1]]
	; CHECK-NEXT: [[SUB_3_1:%.*]] = sub i32 [[TMP30]], [[TMP4]]			; CHECK-NEXT: [[SUB_3_1:%.*]] = sub i32 [[TMP29]], [[TMP3]]
	; CHECK-NEXT: [[TMP48:%.*]] = icmp slt i32 [[SUB_3_1]], 0			; CHECK-NEXT: [[TMP40:%.*]] = icmp slt i32 [[SUB_3_1]], 0
	; CHECK-NEXT: [[NEG_3_1:%.*]] = sub nsw i32 0, [[SUB_3_1]]			; CHECK-NEXT: [[NEG_3_1:%.*]] = sub nsw i32 0, [[SUB_3_1]]
	; CHECK-NEXT: [[TMP49:%.*]] = select i1 [[TMP48]], i32 [[NEG_3_1]], i32 [[SUB_3_1]]			; CHECK-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[NEG_3_1]], i32 [[SUB_3_1]]
	; CHECK-NEXT: [[CMP12_3_1:%.*]] = icmp slt i32 [[TMP49]], [[SPEC_SELECT8_2_1]]			; CHECK-NEXT: [[CMP12_3_1:%.*]] = icmp slt i32 [[TMP41]], [[SPEC_SELECT8_2_1]]
	; CHECK-NEXT: [[NARROW35:%.*]] = or i1 [[CMP12_3_1]], [[NARROW34]]			; CHECK-NEXT: [[NARROW35:%.*]] = or i1 [[CMP12_3_1]], [[NARROW34]]
	; CHECK-NEXT: [[SPEC_SELECT_3_1:%.*]] = zext i1 [[NARROW35]] to i32			; CHECK-NEXT: [[SPEC_SELECT_3_1:%.*]] = zext i1 [[NARROW35]] to i32
	; CHECK-NEXT: [[SPEC_SELECT8_3_1:%.*]] = select i1 [[CMP12_3_1]], i32 [[TMP49]], i32 [[SPEC_SELECT8_2_1]]			; CHECK-NEXT: [[SPEC_SELECT8_3_1:%.*]] = select i1 [[CMP12_3_1]], i32 [[TMP41]], i32 [[SPEC_SELECT8_2_1]]
	; CHECK-NEXT: [[SUB_222:%.*]] = sub i32 [[TMP30]], [[TMP5]]			; CHECK-NEXT: [[SUB_222:%.*]] = sub i32 [[TMP29]], [[TMP4]]
	; CHECK-NEXT: [[TMP50:%.*]] = icmp slt i32 [[SUB_222]], 0			; CHECK-NEXT: [[TMP42:%.*]] = icmp slt i32 [[SUB_222]], 0
	; CHECK-NEXT: [[NEG_223:%.*]] = sub nsw i32 0, [[SUB_222]]			; CHECK-NEXT: [[NEG_223:%.*]] = sub nsw i32 0, [[SUB_222]]
	; CHECK-NEXT: [[TMP51:%.*]] = select i1 [[TMP50]], i32 [[NEG_223]], i32 [[SUB_222]]			; CHECK-NEXT: [[TMP43:%.*]] = select i1 [[TMP42]], i32 [[NEG_223]], i32 [[SUB_222]]
	; CHECK-NEXT: [[CMP12_224:%.*]] = icmp slt i32 [[TMP51]], [[SPEC_SELECT8_3_1]]			; CHECK-NEXT: [[CMP12_224:%.*]] = icmp slt i32 [[TMP43]], [[SPEC_SELECT8_3_1]]
	; CHECK-NEXT: [[SPEC_SELECT8_226:%.*]] = select i1 [[CMP12_224]], i32 [[TMP51]], i32 [[SPEC_SELECT8_3_1]]			; CHECK-NEXT: [[SPEC_SELECT8_226:%.*]] = select i1 [[CMP12_224]], i32 [[TMP43]], i32 [[SPEC_SELECT8_3_1]]
	; CHECK-NEXT: [[SUB_1_2:%.*]] = sub i32 [[TMP30]], [[TMP6]]			; CHECK-NEXT: [[SUB_1_2:%.*]] = sub i32 [[TMP29]], [[TMP5]]
	; CHECK-NEXT: [[TMP52:%.*]] = icmp slt i32 [[SUB_1_2]], 0			; CHECK-NEXT: [[TMP44:%.*]] = icmp slt i32 [[SUB_1_2]], 0
	; CHECK-NEXT: [[NEG_1_2:%.*]] = sub nsw i32 0, [[SUB_1_2]]			; CHECK-NEXT: [[NEG_1_2:%.*]] = sub nsw i32 0, [[SUB_1_2]]
	; CHECK-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[NEG_1_2]], i32 [[SUB_1_2]]			; CHECK-NEXT: [[TMP45:%.*]] = select i1 [[TMP44]], i32 [[NEG_1_2]], i32 [[SUB_1_2]]
	; CHECK-NEXT: [[CMP12_1_2:%.*]] = icmp slt i32 [[TMP53]], [[SPEC_SELECT8_226]]			; CHECK-NEXT: [[CMP12_1_2:%.*]] = icmp slt i32 [[TMP45]], [[SPEC_SELECT8_226]]
	; CHECK-NEXT: [[TMP54:%.*]] = or i1 [[CMP12_1_2]], [[CMP12_224]]			; CHECK-NEXT: [[TMP46:%.*]] = or i1 [[CMP12_1_2]], [[CMP12_224]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_2:%.*]] = select i1 [[CMP12_1_2]], i32 [[TMP53]], i32 [[SPEC_SELECT8_226]]			; CHECK-NEXT: [[SPEC_SELECT8_1_2:%.*]] = select i1 [[CMP12_1_2]], i32 [[TMP45]], i32 [[SPEC_SELECT8_226]]
	; CHECK-NEXT: [[SUB_2_2:%.*]] = sub i32 [[TMP30]], [[TMP7]]			; CHECK-NEXT: [[SUB_2_2:%.*]] = sub i32 [[TMP29]], [[TMP6]]
	; CHECK-NEXT: [[TMP55:%.*]] = icmp slt i32 [[SUB_2_2]], 0			; CHECK-NEXT: [[TMP47:%.*]] = icmp slt i32 [[SUB_2_2]], 0
	; CHECK-NEXT: [[NEG_2_2:%.*]] = sub nsw i32 0, [[SUB_2_2]]			; CHECK-NEXT: [[NEG_2_2:%.*]] = sub nsw i32 0, [[SUB_2_2]]
	; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[NEG_2_2]], i32 [[SUB_2_2]]			; CHECK-NEXT: [[TMP48:%.*]] = select i1 [[TMP47]], i32 [[NEG_2_2]], i32 [[SUB_2_2]]
	; CHECK-NEXT: [[CMP12_2_2:%.*]] = icmp slt i32 [[TMP56]], [[SPEC_SELECT8_1_2]]			; CHECK-NEXT: [[CMP12_2_2:%.*]] = icmp slt i32 [[TMP48]], [[SPEC_SELECT8_1_2]]
	; CHECK-NEXT: [[TMP57:%.*]] = or i1 [[CMP12_2_2]], [[TMP54]]			; CHECK-NEXT: [[TMP49:%.*]] = or i1 [[CMP12_2_2]], [[TMP46]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_2:%.*]] = select i1 [[CMP12_2_2]], i32 [[TMP56]], i32 [[SPEC_SELECT8_1_2]]			; CHECK-NEXT: [[SPEC_SELECT8_2_2:%.*]] = select i1 [[CMP12_2_2]], i32 [[TMP48]], i32 [[SPEC_SELECT8_1_2]]
	; CHECK-NEXT: [[SUB_3_2:%.*]] = sub i32 [[TMP30]], [[TMP8]]			; CHECK-NEXT: [[SUB_3_2:%.*]] = sub i32 [[TMP29]], [[TMP7]]
	; CHECK-NEXT: [[TMP58:%.*]] = icmp slt i32 [[SUB_3_2]], 0			; CHECK-NEXT: [[TMP50:%.*]] = icmp slt i32 [[SUB_3_2]], 0
	; CHECK-NEXT: [[NEG_3_2:%.*]] = sub nsw i32 0, [[SUB_3_2]]			; CHECK-NEXT: [[NEG_3_2:%.*]] = sub nsw i32 0, [[SUB_3_2]]
	; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[NEG_3_2]], i32 [[SUB_3_2]]			; CHECK-NEXT: [[TMP51:%.*]] = select i1 [[TMP50]], i32 [[NEG_3_2]], i32 [[SUB_3_2]]
	; CHECK-NEXT: [[CMP12_3_2:%.*]] = icmp slt i32 [[TMP59]], [[SPEC_SELECT8_2_2]]			; CHECK-NEXT: [[CMP12_3_2:%.*]] = icmp slt i32 [[TMP51]], [[SPEC_SELECT8_2_2]]
	; CHECK-NEXT: [[TMP60:%.*]] = or i1 [[CMP12_3_2]], [[TMP57]]			; CHECK-NEXT: [[TMP52:%.*]] = or i1 [[CMP12_3_2]], [[TMP49]]
	; CHECK-NEXT: [[SPEC_SELECT_3_2:%.*]] = select i1 [[TMP60]], i32 2, i32 [[SPEC_SELECT_3_1]]			; CHECK-NEXT: [[SPEC_SELECT_3_2:%.*]] = select i1 [[TMP52]], i32 2, i32 [[SPEC_SELECT_3_1]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_2:%.*]] = select i1 [[CMP12_3_2]], i32 [[TMP59]], i32 [[SPEC_SELECT8_2_2]]			; CHECK-NEXT: [[SPEC_SELECT8_3_2:%.*]] = select i1 [[CMP12_3_2]], i32 [[TMP51]], i32 [[SPEC_SELECT8_2_2]]
	; CHECK-NEXT: [[SUB_328:%.*]] = sub i32 [[TMP30]], [[TMP9]]			; CHECK-NEXT: [[SUB_328:%.*]] = sub i32 [[TMP29]], [[TMP8]]
	; CHECK-NEXT: [[TMP61:%.*]] = icmp slt i32 [[SUB_328]], 0			; CHECK-NEXT: [[TMP53:%.*]] = icmp slt i32 [[SUB_328]], 0
	; CHECK-NEXT: [[NEG_329:%.*]] = sub nsw i32 0, [[SUB_328]]			; CHECK-NEXT: [[NEG_329:%.*]] = sub nsw i32 0, [[SUB_328]]
	; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[NEG_329]], i32 [[SUB_328]]			; CHECK-NEXT: [[TMP54:%.*]] = select i1 [[TMP53]], i32 [[NEG_329]], i32 [[SUB_328]]
	; CHECK-NEXT: [[CMP12_330:%.*]] = icmp slt i32 [[TMP62]], [[SPEC_SELECT8_3_2]]			; CHECK-NEXT: [[CMP12_330:%.*]] = icmp slt i32 [[TMP54]], [[SPEC_SELECT8_3_2]]
	; CHECK-NEXT: [[SPEC_SELECT8_332:%.*]] = select i1 [[CMP12_330]], i32 [[TMP62]], i32 [[SPEC_SELECT8_3_2]]			; CHECK-NEXT: [[SPEC_SELECT8_332:%.*]] = select i1 [[CMP12_330]], i32 [[TMP54]], i32 [[SPEC_SELECT8_3_2]]
	; CHECK-NEXT: [[SUB_1_3:%.*]] = sub i32 [[TMP30]], [[TMP10]]			; CHECK-NEXT: [[SUB_1_3:%.*]] = sub i32 [[TMP29]], [[TMP9]]
	; CHECK-NEXT: [[TMP63:%.*]] = icmp slt i32 [[SUB_1_3]], 0			; CHECK-NEXT: [[TMP55:%.*]] = icmp slt i32 [[SUB_1_3]], 0
	; CHECK-NEXT: [[NEG_1_3:%.*]] = sub nsw i32 0, [[SUB_1_3]]			; CHECK-NEXT: [[NEG_1_3:%.*]] = sub nsw i32 0, [[SUB_1_3]]
	; CHECK-NEXT: [[TMP64:%.*]] = select i1 [[TMP63]], i32 [[NEG_1_3]], i32 [[SUB_1_3]]			; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[NEG_1_3]], i32 [[SUB_1_3]]
	; CHECK-NEXT: [[CMP12_1_3:%.*]] = icmp slt i32 [[TMP64]], [[SPEC_SELECT8_332]]			; CHECK-NEXT: [[CMP12_1_3:%.*]] = icmp slt i32 [[TMP56]], [[SPEC_SELECT8_332]]
	; CHECK-NEXT: [[TMP65:%.*]] = or i1 [[CMP12_1_3]], [[CMP12_330]]			; CHECK-NEXT: [[TMP57:%.*]] = or i1 [[CMP12_1_3]], [[CMP12_330]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_3:%.*]] = select i1 [[CMP12_1_3]], i32 [[TMP64]], i32 [[SPEC_SELECT8_332]]			; CHECK-NEXT: [[SPEC_SELECT8_1_3:%.*]] = select i1 [[CMP12_1_3]], i32 [[TMP56]], i32 [[SPEC_SELECT8_332]]
	; CHECK-NEXT: [[SUB_2_3:%.*]] = sub i32 [[TMP30]], [[TMP11]]			; CHECK-NEXT: [[SUB_2_3:%.*]] = sub i32 [[TMP29]], [[TMP10]]
	; CHECK-NEXT: [[TMP66:%.*]] = icmp slt i32 [[SUB_2_3]], 0			; CHECK-NEXT: [[TMP58:%.*]] = icmp slt i32 [[SUB_2_3]], 0
	; CHECK-NEXT: [[NEG_2_3:%.*]] = sub nsw i32 0, [[SUB_2_3]]			; CHECK-NEXT: [[NEG_2_3:%.*]] = sub nsw i32 0, [[SUB_2_3]]
	; CHECK-NEXT: [[TMP67:%.*]] = select i1 [[TMP66]], i32 [[NEG_2_3]], i32 [[SUB_2_3]]			; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[NEG_2_3]], i32 [[SUB_2_3]]
	; CHECK-NEXT: [[CMP12_2_3:%.*]] = icmp slt i32 [[TMP67]], [[SPEC_SELECT8_1_3]]			; CHECK-NEXT: [[CMP12_2_3:%.*]] = icmp slt i32 [[TMP59]], [[SPEC_SELECT8_1_3]]
	; CHECK-NEXT: [[TMP68:%.*]] = or i1 [[CMP12_2_3]], [[TMP65]]			; CHECK-NEXT: [[TMP60:%.*]] = or i1 [[CMP12_2_3]], [[TMP57]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_3:%.*]] = select i1 [[CMP12_2_3]], i32 [[TMP67]], i32 [[SPEC_SELECT8_1_3]]			; CHECK-NEXT: [[SPEC_SELECT8_2_3:%.*]] = select i1 [[CMP12_2_3]], i32 [[TMP59]], i32 [[SPEC_SELECT8_1_3]]
	; CHECK-NEXT: [[SUB_3_3:%.*]] = sub i32 [[TMP30]], [[TMP12]]			; CHECK-NEXT: [[SUB_3_3:%.*]] = sub i32 [[TMP29]], [[TMP11]]
	; CHECK-NEXT: [[TMP69:%.*]] = icmp slt i32 [[SUB_3_3]], 0			; CHECK-NEXT: [[TMP61:%.*]] = icmp slt i32 [[SUB_3_3]], 0
	; CHECK-NEXT: [[NEG_3_3:%.*]] = sub nsw i32 0, [[SUB_3_3]]			; CHECK-NEXT: [[NEG_3_3:%.*]] = sub nsw i32 0, [[SUB_3_3]]
	; CHECK-NEXT: [[TMP70:%.*]] = select i1 [[TMP69]], i32 [[NEG_3_3]], i32 [[SUB_3_3]]			; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[NEG_3_3]], i32 [[SUB_3_3]]
	; CHECK-NEXT: [[CMP12_3_3:%.*]] = icmp slt i32 [[TMP70]], [[SPEC_SELECT8_2_3]]			; CHECK-NEXT: [[CMP12_3_3:%.*]] = icmp slt i32 [[TMP62]], [[SPEC_SELECT8_2_3]]
	; CHECK-NEXT: [[TMP71:%.*]] = or i1 [[CMP12_3_3]], [[TMP68]]			; CHECK-NEXT: [[TMP63:%.*]] = or i1 [[CMP12_3_3]], [[TMP60]]
	; CHECK-NEXT: [[SPEC_SELECT_3_3:%.*]] = select i1 [[TMP71]], i32 3, i32 [[SPEC_SELECT_3_2]]			; CHECK-NEXT: [[SPEC_SELECT_3_3:%.*]] = select i1 [[TMP63]], i32 3, i32 [[SPEC_SELECT_3_2]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_3:%.*]] = select i1 [[CMP12_3_3]], i32 [[TMP70]], i32 [[SPEC_SELECT8_2_3]]			; CHECK-NEXT: [[SPEC_SELECT8_3_3:%.*]] = select i1 [[CMP12_3_3]], i32 [[TMP62]], i32 [[SPEC_SELECT8_2_3]]
	; CHECK-NEXT: [[SUB_4:%.*]] = sub i32 [[TMP30]], [[TMP13]]			; CHECK-NEXT: [[SUB_4:%.*]] = sub i32 [[TMP29]], [[TMP12]]
	; CHECK-NEXT: [[TMP72:%.*]] = icmp slt i32 [[SUB_4]], 0			; CHECK-NEXT: [[TMP64:%.*]] = icmp slt i32 [[SUB_4]], 0
	; CHECK-NEXT: [[NEG_4:%.*]] = sub nsw i32 0, [[SUB_4]]			; CHECK-NEXT: [[NEG_4:%.*]] = sub nsw i32 0, [[SUB_4]]
	; CHECK-NEXT: [[TMP73:%.*]] = select i1 [[TMP72]], i32 [[NEG_4]], i32 [[SUB_4]]			; CHECK-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[NEG_4]], i32 [[SUB_4]]
	; CHECK-NEXT: [[CMP12_4:%.*]] = icmp slt i32 [[TMP73]], [[SPEC_SELECT8_3_3]]			; CHECK-NEXT: [[CMP12_4:%.*]] = icmp slt i32 [[TMP65]], [[SPEC_SELECT8_3_3]]
	; CHECK-NEXT: [[SPEC_SELECT8_4:%.*]] = select i1 [[CMP12_4]], i32 [[TMP73]], i32 [[SPEC_SELECT8_3_3]]			; CHECK-NEXT: [[SPEC_SELECT8_4:%.*]] = select i1 [[CMP12_4]], i32 [[TMP65]], i32 [[SPEC_SELECT8_3_3]]
	; CHECK-NEXT: [[SUB_1_4:%.*]] = sub i32 [[TMP30]], [[TMP14]]			; CHECK-NEXT: [[SUB_1_4:%.*]] = sub i32 [[TMP29]], [[TMP13]]
	; CHECK-NEXT: [[TMP74:%.*]] = icmp slt i32 [[SUB_1_4]], 0			; CHECK-NEXT: [[TMP66:%.*]] = icmp slt i32 [[SUB_1_4]], 0
	; CHECK-NEXT: [[NEG_1_4:%.*]] = sub nsw i32 0, [[SUB_1_4]]			; CHECK-NEXT: [[NEG_1_4:%.*]] = sub nsw i32 0, [[SUB_1_4]]
	; CHECK-NEXT: [[TMP75:%.*]] = select i1 [[TMP74]], i32 [[NEG_1_4]], i32 [[SUB_1_4]]			; CHECK-NEXT: [[TMP67:%.*]] = select i1 [[TMP66]], i32 [[NEG_1_4]], i32 [[SUB_1_4]]
	; CHECK-NEXT: [[CMP12_1_4:%.*]] = icmp slt i32 [[TMP75]], [[SPEC_SELECT8_4]]			; CHECK-NEXT: [[CMP12_1_4:%.*]] = icmp slt i32 [[TMP67]], [[SPEC_SELECT8_4]]
	; CHECK-NEXT: [[TMP76:%.*]] = or i1 [[CMP12_1_4]], [[CMP12_4]]			; CHECK-NEXT: [[TMP68:%.*]] = or i1 [[CMP12_1_4]], [[CMP12_4]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_4:%.*]] = select i1 [[CMP12_1_4]], i32 [[TMP75]], i32 [[SPEC_SELECT8_4]]			; CHECK-NEXT: [[SPEC_SELECT8_1_4:%.*]] = select i1 [[CMP12_1_4]], i32 [[TMP67]], i32 [[SPEC_SELECT8_4]]
	; CHECK-NEXT: [[SUB_2_4:%.*]] = sub i32 [[TMP30]], [[TMP15]]			; CHECK-NEXT: [[SUB_2_4:%.*]] = sub i32 [[TMP29]], [[TMP14]]
	; CHECK-NEXT: [[TMP77:%.*]] = icmp slt i32 [[SUB_2_4]], 0			; CHECK-NEXT: [[TMP69:%.*]] = icmp slt i32 [[SUB_2_4]], 0
	; CHECK-NEXT: [[NEG_2_4:%.*]] = sub nsw i32 0, [[SUB_2_4]]			; CHECK-NEXT: [[NEG_2_4:%.*]] = sub nsw i32 0, [[SUB_2_4]]
	; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP77]], i32 [[NEG_2_4]], i32 [[SUB_2_4]]			; CHECK-NEXT: [[TMP70:%.*]] = select i1 [[TMP69]], i32 [[NEG_2_4]], i32 [[SUB_2_4]]
	; CHECK-NEXT: [[CMP12_2_4:%.*]] = icmp slt i32 [[TMP78]], [[SPEC_SELECT8_1_4]]			; CHECK-NEXT: [[CMP12_2_4:%.*]] = icmp slt i32 [[TMP70]], [[SPEC_SELECT8_1_4]]
	; CHECK-NEXT: [[TMP79:%.*]] = or i1 [[CMP12_2_4]], [[TMP76]]			; CHECK-NEXT: [[TMP71:%.*]] = or i1 [[CMP12_2_4]], [[TMP68]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_4:%.*]] = select i1 [[CMP12_2_4]], i32 [[TMP78]], i32 [[SPEC_SELECT8_1_4]]			; CHECK-NEXT: [[SPEC_SELECT8_2_4:%.*]] = select i1 [[CMP12_2_4]], i32 [[TMP70]], i32 [[SPEC_SELECT8_1_4]]
	; CHECK-NEXT: [[SUB_3_4:%.*]] = sub i32 [[TMP30]], [[TMP16]]			; CHECK-NEXT: [[SUB_3_4:%.*]] = sub i32 [[TMP29]], [[TMP15]]
	; CHECK-NEXT: [[TMP80:%.*]] = icmp slt i32 [[SUB_3_4]], 0			; CHECK-NEXT: [[TMP72:%.*]] = icmp slt i32 [[SUB_3_4]], 0
	; CHECK-NEXT: [[NEG_3_4:%.*]] = sub nsw i32 0, [[SUB_3_4]]			; CHECK-NEXT: [[NEG_3_4:%.*]] = sub nsw i32 0, [[SUB_3_4]]
	; CHECK-NEXT: [[TMP81:%.*]] = select i1 [[TMP80]], i32 [[NEG_3_4]], i32 [[SUB_3_4]]			; CHECK-NEXT: [[TMP73:%.*]] = select i1 [[TMP72]], i32 [[NEG_3_4]], i32 [[SUB_3_4]]
	; CHECK-NEXT: [[CMP12_3_4:%.*]] = icmp slt i32 [[TMP81]], [[SPEC_SELECT8_2_4]]			; CHECK-NEXT: [[CMP12_3_4:%.*]] = icmp slt i32 [[TMP73]], [[SPEC_SELECT8_2_4]]
	; CHECK-NEXT: [[TMP82:%.*]] = or i1 [[CMP12_3_4]], [[TMP79]]			; CHECK-NEXT: [[TMP74:%.*]] = or i1 [[CMP12_3_4]], [[TMP71]]
	; CHECK-NEXT: [[SPEC_SELECT_3_4:%.*]] = select i1 [[TMP82]], i32 4, i32 [[SPEC_SELECT_3_3]]			; CHECK-NEXT: [[SPEC_SELECT_3_4:%.*]] = select i1 [[TMP74]], i32 4, i32 [[SPEC_SELECT_3_3]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_4:%.*]] = select i1 [[CMP12_3_4]], i32 [[TMP81]], i32 [[SPEC_SELECT8_2_4]]			; CHECK-NEXT: [[SPEC_SELECT8_3_4:%.*]] = select i1 [[CMP12_3_4]], i32 [[TMP73]], i32 [[SPEC_SELECT8_2_4]]
	; CHECK-NEXT: [[SUB_5:%.*]] = sub i32 [[TMP30]], [[TMP17]]			; CHECK-NEXT: [[SUB_5:%.*]] = sub i32 [[TMP29]], [[TMP16]]
	; CHECK-NEXT: [[TMP83:%.*]] = icmp slt i32 [[SUB_5]], 0			; CHECK-NEXT: [[TMP75:%.*]] = icmp slt i32 [[SUB_5]], 0
	; CHECK-NEXT: [[NEG_5:%.*]] = sub nsw i32 0, [[SUB_5]]			; CHECK-NEXT: [[NEG_5:%.*]] = sub nsw i32 0, [[SUB_5]]
	; CHECK-NEXT: [[TMP84:%.*]] = select i1 [[TMP83]], i32 [[NEG_5]], i32 [[SUB_5]]			; CHECK-NEXT: [[TMP76:%.*]] = select i1 [[TMP75]], i32 [[NEG_5]], i32 [[SUB_5]]
	; CHECK-NEXT: [[CMP12_5:%.*]] = icmp slt i32 [[TMP84]], [[SPEC_SELECT8_3_4]]			; CHECK-NEXT: [[CMP12_5:%.*]] = icmp slt i32 [[TMP76]], [[SPEC_SELECT8_3_4]]
	; CHECK-NEXT: [[SPEC_SELECT8_5:%.*]] = select i1 [[CMP12_5]], i32 [[TMP84]], i32 [[SPEC_SELECT8_3_4]]			; CHECK-NEXT: [[SPEC_SELECT8_5:%.*]] = select i1 [[CMP12_5]], i32 [[TMP76]], i32 [[SPEC_SELECT8_3_4]]
	; CHECK-NEXT: [[SUB_1_5:%.*]] = sub i32 [[TMP30]], [[TMP18]]			; CHECK-NEXT: [[SUB_1_5:%.*]] = sub i32 [[TMP29]], [[TMP17]]
	; CHECK-NEXT: [[TMP85:%.*]] = icmp slt i32 [[SUB_1_5]], 0			; CHECK-NEXT: [[TMP77:%.*]] = icmp slt i32 [[SUB_1_5]], 0
	; CHECK-NEXT: [[NEG_1_5:%.*]] = sub nsw i32 0, [[SUB_1_5]]			; CHECK-NEXT: [[NEG_1_5:%.*]] = sub nsw i32 0, [[SUB_1_5]]
	; CHECK-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[NEG_1_5]], i32 [[SUB_1_5]]			; CHECK-NEXT: [[TMP78:%.*]] = select i1 [[TMP77]], i32 [[NEG_1_5]], i32 [[SUB_1_5]]
	; CHECK-NEXT: [[CMP12_1_5:%.*]] = icmp slt i32 [[TMP86]], [[SPEC_SELECT8_5]]			; CHECK-NEXT: [[CMP12_1_5:%.*]] = icmp slt i32 [[TMP78]], [[SPEC_SELECT8_5]]
	; CHECK-NEXT: [[TMP87:%.*]] = or i1 [[CMP12_1_5]], [[CMP12_5]]			; CHECK-NEXT: [[TMP79:%.*]] = or i1 [[CMP12_1_5]], [[CMP12_5]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_5:%.*]] = select i1 [[CMP12_1_5]], i32 [[TMP86]], i32 [[SPEC_SELECT8_5]]			; CHECK-NEXT: [[SPEC_SELECT8_1_5:%.*]] = select i1 [[CMP12_1_5]], i32 [[TMP78]], i32 [[SPEC_SELECT8_5]]
	; CHECK-NEXT: [[SUB_2_5:%.*]] = sub i32 [[TMP30]], [[TMP19]]			; CHECK-NEXT: [[SUB_2_5:%.*]] = sub i32 [[TMP29]], [[TMP18]]
	; CHECK-NEXT: [[TMP88:%.*]] = icmp slt i32 [[SUB_2_5]], 0			; CHECK-NEXT: [[TMP80:%.*]] = icmp slt i32 [[SUB_2_5]], 0
	; CHECK-NEXT: [[NEG_2_5:%.*]] = sub nsw i32 0, [[SUB_2_5]]			; CHECK-NEXT: [[NEG_2_5:%.*]] = sub nsw i32 0, [[SUB_2_5]]
	; CHECK-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[NEG_2_5]], i32 [[SUB_2_5]]			; CHECK-NEXT: [[TMP81:%.*]] = select i1 [[TMP80]], i32 [[NEG_2_5]], i32 [[SUB_2_5]]
	; CHECK-NEXT: [[CMP12_2_5:%.*]] = icmp slt i32 [[TMP89]], [[SPEC_SELECT8_1_5]]			; CHECK-NEXT: [[CMP12_2_5:%.*]] = icmp slt i32 [[TMP81]], [[SPEC_SELECT8_1_5]]
	; CHECK-NEXT: [[TMP90:%.*]] = or i1 [[CMP12_2_5]], [[TMP87]]			; CHECK-NEXT: [[TMP82:%.*]] = or i1 [[CMP12_2_5]], [[TMP79]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_5:%.*]] = select i1 [[CMP12_2_5]], i32 [[TMP89]], i32 [[SPEC_SELECT8_1_5]]			; CHECK-NEXT: [[SPEC_SELECT8_2_5:%.*]] = select i1 [[CMP12_2_5]], i32 [[TMP81]], i32 [[SPEC_SELECT8_1_5]]
	; CHECK-NEXT: [[SUB_3_5:%.*]] = sub i32 [[TMP30]], [[TMP20]]			; CHECK-NEXT: [[SUB_3_5:%.*]] = sub i32 [[TMP29]], [[TMP19]]
	; CHECK-NEXT: [[TMP91:%.*]] = icmp slt i32 [[SUB_3_5]], 0			; CHECK-NEXT: [[TMP83:%.*]] = icmp slt i32 [[SUB_3_5]], 0
	; CHECK-NEXT: [[NEG_3_5:%.*]] = sub nsw i32 0, [[SUB_3_5]]			; CHECK-NEXT: [[NEG_3_5:%.*]] = sub nsw i32 0, [[SUB_3_5]]
	; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[NEG_3_5]], i32 [[SUB_3_5]]			; CHECK-NEXT: [[TMP84:%.*]] = select i1 [[TMP83]], i32 [[NEG_3_5]], i32 [[SUB_3_5]]
	; CHECK-NEXT: [[CMP12_3_5:%.*]] = icmp slt i32 [[TMP92]], [[SPEC_SELECT8_2_5]]			; CHECK-NEXT: [[CMP12_3_5:%.*]] = icmp slt i32 [[TMP84]], [[SPEC_SELECT8_2_5]]
	; CHECK-NEXT: [[TMP93:%.*]] = or i1 [[CMP12_3_5]], [[TMP90]]			; CHECK-NEXT: [[TMP85:%.*]] = or i1 [[CMP12_3_5]], [[TMP82]]
	; CHECK-NEXT: [[SPEC_SELECT_3_5:%.*]] = select i1 [[TMP93]], i32 5, i32 [[SPEC_SELECT_3_4]]			; CHECK-NEXT: [[SPEC_SELECT_3_5:%.*]] = select i1 [[TMP85]], i32 5, i32 [[SPEC_SELECT_3_4]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_5:%.*]] = select i1 [[CMP12_3_5]], i32 [[TMP92]], i32 [[SPEC_SELECT8_2_5]]			; CHECK-NEXT: [[SPEC_SELECT8_3_5:%.*]] = select i1 [[CMP12_3_5]], i32 [[TMP84]], i32 [[SPEC_SELECT8_2_5]]
	; CHECK-NEXT: [[SUB_6:%.*]] = sub i32 [[TMP30]], [[TMP21]]			; CHECK-NEXT: [[SUB_6:%.*]] = sub i32 [[TMP29]], [[TMP20]]
	; CHECK-NEXT: [[TMP94:%.*]] = icmp slt i32 [[SUB_6]], 0			; CHECK-NEXT: [[TMP86:%.*]] = icmp slt i32 [[SUB_6]], 0
	; CHECK-NEXT: [[NEG_6:%.*]] = sub nsw i32 0, [[SUB_6]]			; CHECK-NEXT: [[NEG_6:%.*]] = sub nsw i32 0, [[SUB_6]]
	; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[NEG_6]], i32 [[SUB_6]]			; CHECK-NEXT: [[TMP87:%.*]] = select i1 [[TMP86]], i32 [[NEG_6]], i32 [[SUB_6]]
	; CHECK-NEXT: [[CMP12_6:%.*]] = icmp slt i32 [[TMP95]], [[SPEC_SELECT8_3_5]]			; CHECK-NEXT: [[CMP12_6:%.*]] = icmp slt i32 [[TMP87]], [[SPEC_SELECT8_3_5]]
	; CHECK-NEXT: [[SPEC_SELECT8_6:%.*]] = select i1 [[CMP12_6]], i32 [[TMP95]], i32 [[SPEC_SELECT8_3_5]]			; CHECK-NEXT: [[SPEC_SELECT8_6:%.*]] = select i1 [[CMP12_6]], i32 [[TMP87]], i32 [[SPEC_SELECT8_3_5]]
	; CHECK-NEXT: [[SUB_1_6:%.*]] = sub i32 [[TMP30]], [[TMP22]]			; CHECK-NEXT: [[SUB_1_6:%.*]] = sub i32 [[TMP29]], [[TMP21]]
	; CHECK-NEXT: [[TMP96:%.*]] = icmp slt i32 [[SUB_1_6]], 0			; CHECK-NEXT: [[TMP88:%.*]] = icmp slt i32 [[SUB_1_6]], 0
	; CHECK-NEXT: [[NEG_1_6:%.*]] = sub nsw i32 0, [[SUB_1_6]]			; CHECK-NEXT: [[NEG_1_6:%.*]] = sub nsw i32 0, [[SUB_1_6]]
	; CHECK-NEXT: [[TMP97:%.*]] = select i1 [[TMP96]], i32 [[NEG_1_6]], i32 [[SUB_1_6]]			; CHECK-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[NEG_1_6]], i32 [[SUB_1_6]]
	; CHECK-NEXT: [[CMP12_1_6:%.*]] = icmp slt i32 [[TMP97]], [[SPEC_SELECT8_6]]			; CHECK-NEXT: [[CMP12_1_6:%.*]] = icmp slt i32 [[TMP89]], [[SPEC_SELECT8_6]]
	; CHECK-NEXT: [[TMP98:%.*]] = or i1 [[CMP12_1_6]], [[CMP12_6]]			; CHECK-NEXT: [[TMP90:%.*]] = or i1 [[CMP12_1_6]], [[CMP12_6]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_6:%.*]] = select i1 [[CMP12_1_6]], i32 [[TMP97]], i32 [[SPEC_SELECT8_6]]			; CHECK-NEXT: [[SPEC_SELECT8_1_6:%.*]] = select i1 [[CMP12_1_6]], i32 [[TMP89]], i32 [[SPEC_SELECT8_6]]
	; CHECK-NEXT: [[SUB_2_6:%.*]] = sub i32 [[TMP30]], [[TMP23]]			; CHECK-NEXT: [[SUB_2_6:%.*]] = sub i32 [[TMP29]], [[TMP22]]
	; CHECK-NEXT: [[TMP99:%.*]] = icmp slt i32 [[SUB_2_6]], 0			; CHECK-NEXT: [[TMP91:%.*]] = icmp slt i32 [[SUB_2_6]], 0
	; CHECK-NEXT: [[NEG_2_6:%.*]] = sub nsw i32 0, [[SUB_2_6]]			; CHECK-NEXT: [[NEG_2_6:%.*]] = sub nsw i32 0, [[SUB_2_6]]
	; CHECK-NEXT: [[TMP100:%.*]] = select i1 [[TMP99]], i32 [[NEG_2_6]], i32 [[SUB_2_6]]			; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[NEG_2_6]], i32 [[SUB_2_6]]
	; CHECK-NEXT: [[CMP12_2_6:%.*]] = icmp slt i32 [[TMP100]], [[SPEC_SELECT8_1_6]]			; CHECK-NEXT: [[CMP12_2_6:%.*]] = icmp slt i32 [[TMP92]], [[SPEC_SELECT8_1_6]]
	; CHECK-NEXT: [[TMP101:%.*]] = or i1 [[CMP12_2_6]], [[TMP98]]			; CHECK-NEXT: [[TMP93:%.*]] = or i1 [[CMP12_2_6]], [[TMP90]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_6:%.*]] = select i1 [[CMP12_2_6]], i32 [[TMP100]], i32 [[SPEC_SELECT8_1_6]]			; CHECK-NEXT: [[SPEC_SELECT8_2_6:%.*]] = select i1 [[CMP12_2_6]], i32 [[TMP92]], i32 [[SPEC_SELECT8_1_6]]
	; CHECK-NEXT: [[SUB_3_6:%.*]] = sub i32 [[TMP30]], [[TMP24]]			; CHECK-NEXT: [[SUB_3_6:%.*]] = sub i32 [[TMP29]], [[TMP23]]
	; CHECK-NEXT: [[TMP102:%.*]] = icmp slt i32 [[SUB_3_6]], 0			; CHECK-NEXT: [[TMP94:%.*]] = icmp slt i32 [[SUB_3_6]], 0
	; CHECK-NEXT: [[NEG_3_6:%.*]] = sub nsw i32 0, [[SUB_3_6]]			; CHECK-NEXT: [[NEG_3_6:%.*]] = sub nsw i32 0, [[SUB_3_6]]
	; CHECK-NEXT: [[TMP103:%.*]] = select i1 [[TMP102]], i32 [[NEG_3_6]], i32 [[SUB_3_6]]			; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[NEG_3_6]], i32 [[SUB_3_6]]
	; CHECK-NEXT: [[CMP12_3_6:%.*]] = icmp slt i32 [[TMP103]], [[SPEC_SELECT8_2_6]]			; CHECK-NEXT: [[CMP12_3_6:%.*]] = icmp slt i32 [[TMP95]], [[SPEC_SELECT8_2_6]]
	; CHECK-NEXT: [[TMP104:%.*]] = or i1 [[CMP12_3_6]], [[TMP101]]			; CHECK-NEXT: [[TMP96:%.*]] = or i1 [[CMP12_3_6]], [[TMP93]]
	; CHECK-NEXT: [[SPEC_SELECT_3_6:%.*]] = select i1 [[TMP104]], i32 6, i32 [[SPEC_SELECT_3_5]]			; CHECK-NEXT: [[SPEC_SELECT_3_6:%.*]] = select i1 [[TMP96]], i32 6, i32 [[SPEC_SELECT_3_5]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_6:%.*]] = select i1 [[CMP12_3_6]], i32 [[TMP103]], i32 [[SPEC_SELECT8_2_6]]			; CHECK-NEXT: [[SPEC_SELECT8_3_6:%.*]] = select i1 [[CMP12_3_6]], i32 [[TMP95]], i32 [[SPEC_SELECT8_2_6]]
	; CHECK-NEXT: [[SUB_7:%.*]] = sub i32 [[TMP30]], [[TMP25]]			; CHECK-NEXT: [[SUB_7:%.*]] = sub i32 [[TMP29]], [[TMP24]]
	; CHECK-NEXT: [[TMP105:%.*]] = icmp slt i32 [[SUB_7]], 0			; CHECK-NEXT: [[TMP97:%.*]] = icmp slt i32 [[SUB_7]], 0
	; CHECK-NEXT: [[NEG_7:%.*]] = sub nsw i32 0, [[SUB_7]]			; CHECK-NEXT: [[NEG_7:%.*]] = sub nsw i32 0, [[SUB_7]]
	; CHECK-NEXT: [[TMP106:%.*]] = select i1 [[TMP105]], i32 [[NEG_7]], i32 [[SUB_7]]			; CHECK-NEXT: [[TMP98:%.*]] = select i1 [[TMP97]], i32 [[NEG_7]], i32 [[SUB_7]]
	; CHECK-NEXT: [[CMP12_7:%.*]] = icmp slt i32 [[TMP106]], [[SPEC_SELECT8_3_6]]			; CHECK-NEXT: [[CMP12_7:%.*]] = icmp slt i32 [[TMP98]], [[SPEC_SELECT8_3_6]]
	; CHECK-NEXT: [[SPEC_SELECT8_7:%.*]] = select i1 [[CMP12_7]], i32 [[TMP106]], i32 [[SPEC_SELECT8_3_6]]			; CHECK-NEXT: [[SPEC_SELECT8_7:%.*]] = select i1 [[CMP12_7]], i32 [[TMP98]], i32 [[SPEC_SELECT8_3_6]]
	; CHECK-NEXT: [[SUB_1_7:%.*]] = sub i32 [[TMP30]], [[TMP26]]			; CHECK-NEXT: [[SUB_1_7:%.*]] = sub i32 [[TMP29]], [[TMP25]]
	; CHECK-NEXT: [[TMP107:%.*]] = icmp slt i32 [[SUB_1_7]], 0			; CHECK-NEXT: [[TMP99:%.*]] = icmp slt i32 [[SUB_1_7]], 0
	; CHECK-NEXT: [[NEG_1_7:%.*]] = sub nsw i32 0, [[SUB_1_7]]			; CHECK-NEXT: [[NEG_1_7:%.*]] = sub nsw i32 0, [[SUB_1_7]]
	; CHECK-NEXT: [[TMP108:%.*]] = select i1 [[TMP107]], i32 [[NEG_1_7]], i32 [[SUB_1_7]]			; CHECK-NEXT: [[TMP100:%.*]] = select i1 [[TMP99]], i32 [[NEG_1_7]], i32 [[SUB_1_7]]
	; CHECK-NEXT: [[CMP12_1_7:%.*]] = icmp slt i32 [[TMP108]], [[SPEC_SELECT8_7]]			; CHECK-NEXT: [[CMP12_1_7:%.*]] = icmp slt i32 [[TMP100]], [[SPEC_SELECT8_7]]
	; CHECK-NEXT: [[TMP109:%.*]] = or i1 [[CMP12_1_7]], [[CMP12_7]]			; CHECK-NEXT: [[TMP101:%.*]] = or i1 [[CMP12_1_7]], [[CMP12_7]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_7:%.*]] = select i1 [[CMP12_1_7]], i32 [[TMP108]], i32 [[SPEC_SELECT8_7]]			; CHECK-NEXT: [[SPEC_SELECT8_1_7:%.*]] = select i1 [[CMP12_1_7]], i32 [[TMP100]], i32 [[SPEC_SELECT8_7]]
	; CHECK-NEXT: [[SUB_2_7:%.*]] = sub i32 [[TMP30]], [[TMP27]]			; CHECK-NEXT: [[SUB_2_7:%.*]] = sub i32 [[TMP29]], [[TMP26]]
	; CHECK-NEXT: [[TMP110:%.*]] = icmp slt i32 [[SUB_2_7]], 0			; CHECK-NEXT: [[TMP102:%.*]] = icmp slt i32 [[SUB_2_7]], 0
	; CHECK-NEXT: [[NEG_2_7:%.*]] = sub nsw i32 0, [[SUB_2_7]]			; CHECK-NEXT: [[NEG_2_7:%.*]] = sub nsw i32 0, [[SUB_2_7]]
	; CHECK-NEXT: [[TMP111:%.*]] = select i1 [[TMP110]], i32 [[NEG_2_7]], i32 [[SUB_2_7]]			; CHECK-NEXT: [[TMP103:%.*]] = select i1 [[TMP102]], i32 [[NEG_2_7]], i32 [[SUB_2_7]]
	; CHECK-NEXT: [[CMP12_2_7:%.*]] = icmp slt i32 [[TMP111]], [[SPEC_SELECT8_1_7]]			; CHECK-NEXT: [[CMP12_2_7:%.*]] = icmp slt i32 [[TMP103]], [[SPEC_SELECT8_1_7]]
	; CHECK-NEXT: [[TMP112:%.*]] = or i1 [[CMP12_2_7]], [[TMP109]]			; CHECK-NEXT: [[TMP104:%.*]] = or i1 [[CMP12_2_7]], [[TMP101]]
	; CHECK-NEXT: [[SPEC_SELECT8_2_7:%.*]] = select i1 [[CMP12_2_7]], i32 [[TMP111]], i32 [[SPEC_SELECT8_1_7]]			; CHECK-NEXT: [[SPEC_SELECT8_2_7:%.*]] = select i1 [[CMP12_2_7]], i32 [[TMP103]], i32 [[SPEC_SELECT8_1_7]]
	; CHECK-NEXT: [[SUB_3_7:%.*]] = sub i32 [[TMP30]], [[TMP28]]			; CHECK-NEXT: [[SUB_3_7:%.*]] = sub i32 [[TMP29]], [[TMP27]]
	; CHECK-NEXT: [[TMP113:%.*]] = icmp slt i32 [[SUB_3_7]], 0			; CHECK-NEXT: [[TMP105:%.*]] = icmp slt i32 [[SUB_3_7]], 0
	; CHECK-NEXT: [[NEG_3_7:%.*]] = sub nsw i32 0, [[SUB_3_7]]			; CHECK-NEXT: [[NEG_3_7:%.*]] = sub nsw i32 0, [[SUB_3_7]]
	; CHECK-NEXT: [[TMP114:%.*]] = select i1 [[TMP113]], i32 [[NEG_3_7]], i32 [[SUB_3_7]]			; CHECK-NEXT: [[TMP106:%.*]] = select i1 [[TMP105]], i32 [[NEG_3_7]], i32 [[SUB_3_7]]
	; CHECK-NEXT: [[CMP12_3_7:%.*]] = icmp slt i32 [[TMP114]], [[SPEC_SELECT8_2_7]]			; CHECK-NEXT: [[CMP12_3_7:%.*]] = icmp slt i32 [[TMP106]], [[SPEC_SELECT8_2_7]]
	; CHECK-NEXT: [[TMP115:%.*]] = or i1 [[CMP12_3_7]], [[TMP112]]			; CHECK-NEXT: [[TMP107:%.*]] = or i1 [[CMP12_3_7]], [[TMP104]]
	; CHECK-NEXT: [[SPEC_SELECT_3_7:%.*]] = select i1 [[TMP115]], i32 7, i32 [[SPEC_SELECT_3_6]]			; CHECK-NEXT: [[SPEC_SELECT_3_7:%.*]] = select i1 [[TMP107]], i32 7, i32 [[SPEC_SELECT_3_6]]
	; CHECK-NEXT: [[SPEC_SELECT8_3_7]] = select i1 [[CMP12_3_7]], i32 [[TMP114]], i32 [[SPEC_SELECT8_2_7]]			; CHECK-NEXT: [[SPEC_SELECT8_3_7]] = select i1 [[CMP12_3_7]], i32 [[TMP106]], i32 [[SPEC_SELECT8_2_7]]
	; CHECK-NEXT: [[K:%.*]] = getelementptr inbounds [366 x i32], [366 x i32]* @l, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[K:%.*]] = getelementptr inbounds [366 x i32], [366 x i32]* @l, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[SPEC_SELECT_3_7]], i32* [[K]], align 4			; CHECK-NEXT: store i32 [[SPEC_SELECT_3_7]], i32* [[K]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: br label [[FOR_COND]]			; CHECK-NEXT: br label [[FOR_COND]]
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 0), align 16			%0 = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 0), align 16
	%1 = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 1), align 4			%1 = load i32, i32* getelementptr inbounds ([8 x [4 x i32]], [8 x [4 x i32]]* @k, i64 0, i64 0, i64 1), align 4

llvm/test/Transforms/SLPVectorizer/X86/value-bug-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; We used to crash on this example because we were building a constant			; We used to crash on this example because we were building a constant
	; expression during vectorization and the vectorizer expects instructions			; expression during vectorization and the vectorizer expects instructions
	; as elements of the vectorized tree.			; as elements of the vectorized tree.
	; PR19621			; PR19621

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb279:			; CHECK-NEXT: bb279:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float undef, i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float poison, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> [[TMP0]], float undef, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> [[TMP0]], float poison, i32 1
	; CHECK-NEXT: br label [[BB283:%.*]]			; CHECK-NEXT: br label [[BB283:%.*]]
	; CHECK: bb283:			; CHECK: bb283:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP13:%.*]], [[EXIT:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ poison, [[BB279:%.*]] ], [ [[TMP13:%.*]], [[EXIT:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ [[TMP1]], [[EXIT]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ poison, [[BB279]] ], [ [[TMP1]], [[EXIT]] ]
	; CHECK-NEXT: br label [[BB284:%.*]]			; CHECK-NEXT: br label [[BB284:%.*]]
	; CHECK: bb284:			; CHECK: bb284:
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP4]], undef			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP4]], poison
	; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP5]], undef			; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP5]], poison
	; CHECK-NEXT: br label [[BB21_I:%.*]]			; CHECK-NEXT: br label [[BB21_I:%.*]]
	; CHECK: bb21.i:			; CHECK: bb21.i:
	; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]			; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]
	; CHECK: bb22.i:			; CHECK: bb22.i:
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> undef, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> poison, [[TMP6]]
	; CHECK-NEXT: br label [[BB32_I:%.*]]			; CHECK-NEXT: br label [[BB32_I:%.*]]
	; CHECK: bb32.i:			; CHECK: bb32.i:
	; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ [[TMP7]], [[BB22_I]] ], [ zeroinitializer, [[BB32_I]] ]			; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ [[TMP7]], [[BB22_I]] ], [ zeroinitializer, [[BB32_I]] ]
	; CHECK-NEXT: br i1 undef, label [[BB32_I]], label [[BB21_I]]			; CHECK-NEXT: br i1 undef, label [[BB32_I]], label [[BB21_I]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[TMP9:%.*]] = fpext <2 x float> [[TMP3]] to <2 x double>			; CHECK-NEXT: [[TMP9:%.*]] = fpext <2 x float> [[TMP3]] to <2 x double>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], <double undef, double 0.000000e+00>			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], <double poison, double 0.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> undef, [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> poison, [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP11]], undef			; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP11]], poison
	; CHECK-NEXT: [[TMP13]] = fptrunc <2 x double> [[TMP12]] to <2 x float>			; CHECK-NEXT: [[TMP13]] = fptrunc <2 x double> [[TMP12]] to <2 x float>
	; CHECK-NEXT: br label [[BB283]]			; CHECK-NEXT: br label [[BB283]]
	;			;
	bb279:			bb279:
	br label %bb283			br label %bb283

	bb283:			bb283:
	%Av.sroa.8.0 = phi float [ undef, %bb279 ], [ %tmp315, %exit ]			%Av.sroa.8.0 = phi float [ undef, %bb279 ], [ %tmp315, %exit ]

llvm/test/Transforms/SLPVectorizer/X86/value-bug.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; We used to crash on this example because we were building a constant			; We used to crash on this example because we were building a constant
	; expression during vectorization and the vectorizer expects instructions			; expression during vectorization and the vectorizer expects instructions
	; as elements of the vectorized tree.			; as elements of the vectorized tree.
	; PR19621			; PR19621

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb279:			; CHECK-NEXT: bb279:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float undef, i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float poison, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> [[TMP0]], float undef, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> [[TMP0]], float poison, i32 1
	; CHECK-NEXT: br label [[BB283:%.*]]			; CHECK-NEXT: br label [[BB283:%.*]]
	; CHECK: bb283:			; CHECK: bb283:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP13:%.*]], [[EXIT:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ poison, [[BB279:%.*]] ], [ [[TMP13:%.*]], [[EXIT:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ [[TMP1]], [[EXIT]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ poison, [[BB279]] ], [ [[TMP1]], [[EXIT]] ]
	; CHECK-NEXT: br label [[BB284:%.*]]			; CHECK-NEXT: br label [[BB284:%.*]]
	; CHECK: bb284:			; CHECK: bb284:
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP4]], undef			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP4]], poison
	; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP5]], undef			; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP5]], poison
	; CHECK-NEXT: br label [[BB21_I:%.*]]			; CHECK-NEXT: br label [[BB21_I:%.*]]
	; CHECK: bb21.i:			; CHECK: bb21.i:
	; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]			; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]
	; CHECK: bb22.i:			; CHECK: bb22.i:
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> undef, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> poison, [[TMP6]]
	; CHECK-NEXT: br label [[BB32_I:%.*]]			; CHECK-NEXT: br label [[BB32_I:%.*]]
	; CHECK: bb32.i:			; CHECK: bb32.i:
	; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ [[TMP7]], [[BB22_I]] ], [ zeroinitializer, [[BB32_I]] ]			; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ [[TMP7]], [[BB22_I]] ], [ zeroinitializer, [[BB32_I]] ]
	; CHECK-NEXT: br i1 undef, label [[BB32_I]], label [[BB21_I]]			; CHECK-NEXT: br i1 undef, label [[BB32_I]], label [[BB21_I]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[TMP9:%.*]] = fpext <2 x float> [[TMP3]] to <2 x double>			; CHECK-NEXT: [[TMP9:%.*]] = fpext <2 x float> [[TMP3]] to <2 x double>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], <double undef, double 0.000000e+00>			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], <double poison, double 0.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> undef, [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> poison, [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP11]], undef			; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP11]], poison
	; CHECK-NEXT: [[TMP13]] = fptrunc <2 x double> [[TMP12]] to <2 x float>			; CHECK-NEXT: [[TMP13]] = fptrunc <2 x double> [[TMP12]] to <2 x float>
	; CHECK-NEXT: br label [[BB283]]			; CHECK-NEXT: br label [[BB283]]
	;			;
	bb279:			bb279:
	br label %bb283			br label %bb283

	bb283:			bb283:
	%Av.sroa.8.0 = phi float [ undef, %bb279 ], [ %tmp315, %exit ]			%Av.sroa.8.0 = phi float [ undef, %bb279 ], [ %tmp315, %exit ]

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

	; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4			; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4
	; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3			; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3
	; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4			; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T28:%.*]] = add nsw i32 [[T15]], [[T9]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> poison, i32 [[T28]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T29]], i32 3
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 3, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 poison, i32 poison, i32 6270, i32 poison, i32 -15137, i32 poison, i32 poison, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[T28]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[T50]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[SHUFFLE]], [[TMP7]]
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <8 x i32> [[SHUFFLE]], [[TMP7]]
				; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 12, i32 5, i32 6, i32 7>
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 4, i32 3>
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 0
				; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> poison, i32 [[TMP11]], i32 0
				; CHECK-NEXT: [[TMP12:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 1
				; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[TMP12]], i32 1
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 2
				; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[TMP13]], i32 2
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 3
				; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[TMP14]], i32 3
				; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[TMP11]], i32 4
				; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[TMP12]], i32 5
				; CHECK-NEXT: [[TMP15:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 6
				; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[TMP15]], i32 6
				; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP14]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

	; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4			; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4
	; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3			; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3
	; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4			; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T28:%.*]] = add nsw i32 [[T15]], [[T9]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> undef, i32 [[T28]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T29]], i32 3
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 3, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> <i32 poison, i32 poison, i32 6270, i32 poison, i32 -15137, i32 poison, i32 poison, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[T28]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[T50]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[SHUFFLE]], [[TMP7]]
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <8 x i32> [[SHUFFLE]], [[TMP7]]
				; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 12, i32 5, i32 6, i32 7>
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 4, i32 3>
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 0
				; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> undef, i32 [[TMP11]], i32 0
				; CHECK-NEXT: [[TMP12:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 1
				; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[TMP12]], i32 1
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 2
				; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[TMP13]], i32 2
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 3
				; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[TMP14]], i32 3
				; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[TMP11]], i32 4
				; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[TMP12]], i32 5
				; CHECK-NEXT: [[TMP15:%.*]] = extractelement <8 x i32> [[SHUFFLE1]], i32 6
				; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[TMP15]], i32 6
				; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP14]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A7:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A8:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A1:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A2:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A3:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A4:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A5:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A6:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	...
	define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo1(			; CHECK-LABEL: @foo1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 2, i32 3, i32 1, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A6:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A4:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A5:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A8:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A2:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A3:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	...
	define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo2(			; CHECK-LABEL: @foo2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 3, i32 2, i32 3, i32 0, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A4:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A6:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A5:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A8:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A2:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A7:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A1:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A3:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1

llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll

	; MAX32-NEXT: [[PHI32:%.*]] = phi float [ [[I67]], [[BB3]] ], [ [[I67]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I67]], [[BB1]] ]			; MAX32-NEXT: [[PHI32:%.*]] = phi float [ [[I67]], [[BB3]] ], [ [[I67]], [[BB4]] ], [ [[FVAL]], [[BB5]] ], [ [[I67]], [[BB1]] ]
	; MAX32-NEXT: store float [[PHI31]], float* undef, align 4			; MAX32-NEXT: store float [[PHI31]], float* undef, align 4
	; MAX32-NEXT: ret void			; MAX32-NEXT: ret void
	;			;
	; MAX256-LABEL: @phi_float32(			; MAX256-LABEL: @phi_float32(
	; MAX256-NEXT: bb:			; MAX256-NEXT: bb:
	; MAX256-NEXT: br label [[BB1:%.*]]			; MAX256-NEXT: br label [[BB1:%.*]]
	; MAX256: bb1:			; MAX256: bb1:
	; MAX256-NEXT: [[TMP0:%.*]] = insertelement <4 x half> poison, half [[HVAL:%.*]], i32 0			; MAX256-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float
	; MAX256-NEXT: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half [[HVAL]], i32 1			; MAX256-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half [[HVAL]], i32 2			; MAX256-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half [[HVAL]], i32 3			; MAX256-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP4:%.*]] = fpext <4 x half> [[TMP3]] to <4 x float>			; MAX256-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0
	; MAX256-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>			; MAX256-NEXT: [[TMP1:%.*]] = insertelement <8 x float> [[TMP0]], float [[I]], i32 1
	; MAX256-NEXT: [[TMP5:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0			; MAX256-NEXT: [[TMP2:%.*]] = insertelement <8 x float> [[TMP1]], float [[I]], i32 2
	; MAX256-NEXT: [[TMP6:%.*]] = insertelement <8 x float> [[TMP5]], float [[FVAL]], i32 1			; MAX256-NEXT: [[TMP3:%.*]] = insertelement <8 x float> [[TMP2]], float [[I]], i32 3
	; MAX256-NEXT: [[TMP7:%.*]] = insertelement <8 x float> [[TMP6]], float [[FVAL]], i32 2			; MAX256-NEXT: [[TMP4:%.*]] = insertelement <8 x float> [[TMP3]], float [[I]], i32 4
	; MAX256-NEXT: [[TMP8:%.*]] = insertelement <8 x float> [[TMP7]], float [[FVAL]], i32 3			; MAX256-NEXT: [[TMP5:%.*]] = insertelement <8 x float> [[TMP4]], float [[I]], i32 5
	; MAX256-NEXT: [[TMP9:%.*]] = insertelement <8 x float> [[TMP8]], float [[FVAL]], i32 4			; MAX256-NEXT: [[TMP6:%.*]] = insertelement <8 x float> [[TMP5]], float [[I]], i32 6
	; MAX256-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP9]], float [[FVAL]], i32 5			; MAX256-NEXT: [[TMP7:%.*]] = insertelement <8 x float> [[TMP6]], float [[I]], i32 7
	; MAX256-NEXT: [[TMP11:%.*]] = insertelement <8 x float> [[TMP10]], float [[FVAL]], i32 6			; MAX256-NEXT: [[TMP8:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0
	; MAX256-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[FVAL]], i32 7			; MAX256-NEXT: [[TMP9:%.*]] = insertelement <8 x float> [[TMP8]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP13:%.*]] = fmul <8 x float> [[SHUFFLE]], [[TMP12]]			; MAX256-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP9]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP14:%.*]] = fadd <8 x float> zeroinitializer, [[TMP13]]			; MAX256-NEXT: [[TMP11:%.*]] = insertelement <8 x float> [[TMP10]], float [[FVAL]], i32 3
	; MAX256-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 3			; MAX256-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[FVAL]], i32 4
	; MAX256-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 2			; MAX256-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 1			; MAX256-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP18:%.*]] = extractelement <8 x float> [[SHUFFLE]], i32 0			; MAX256-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[FVAL]], i32 7
	; MAX256-NEXT: [[TMP19:%.*]] = insertelement <8 x float> poison, float [[TMP15]], i32 0			; MAX256-NEXT: [[TMP16:%.*]] = fmul <8 x float> [[TMP7]], [[TMP15]]
	; MAX256-NEXT: [[TMP20:%.*]] = insertelement <8 x float> [[TMP19]], float [[TMP16]], i32 1			; MAX256-NEXT: [[TMP17:%.*]] = fadd <8 x float> zeroinitializer, [[TMP16]]
	; MAX256-NEXT: [[TMP21:%.*]] = insertelement <8 x float> [[TMP20]], float [[TMP17]], i32 2			; MAX256-NEXT: [[TMP18:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0
	; MAX256-NEXT: [[TMP22:%.*]] = insertelement <8 x float> [[TMP21]], float [[TMP18]], i32 3			; MAX256-NEXT: [[TMP19:%.*]] = insertelement <8 x float> [[TMP18]], float [[I3]], i32 1
	; MAX256-NEXT: [[TMP23:%.*]] = insertelement <8 x float> [[TMP22]], float [[TMP15]], i32 4			; MAX256-NEXT: [[TMP20:%.*]] = insertelement <8 x float> [[TMP19]], float [[I3]], i32 2
	; MAX256-NEXT: [[TMP24:%.*]] = insertelement <8 x float> [[TMP23]], float [[TMP16]], i32 5			; MAX256-NEXT: [[TMP21:%.*]] = insertelement <8 x float> [[TMP20]], float [[I3]], i32 3
	; MAX256-NEXT: [[TMP25:%.*]] = insertelement <8 x float> [[TMP24]], float [[TMP17]], i32 6			; MAX256-NEXT: [[TMP22:%.*]] = insertelement <8 x float> [[TMP21]], float [[I3]], i32 4
	; MAX256-NEXT: [[TMP26:%.*]] = insertelement <8 x float> [[TMP25]], float [[TMP18]], i32 7			; MAX256-NEXT: [[TMP23:%.*]] = insertelement <8 x float> [[TMP22]], float [[I3]], i32 5
	; MAX256-NEXT: [[TMP27:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]]			; MAX256-NEXT: [[TMP24:%.*]] = insertelement <8 x float> [[TMP23]], float [[I3]], i32 6
	; MAX256-NEXT: [[TMP28:%.*]] = fadd <8 x float> zeroinitializer, [[TMP27]]			; MAX256-NEXT: [[TMP25:%.*]] = insertelement <8 x float> [[TMP24]], float [[I3]], i32 7
	; MAX256-NEXT: [[TMP29:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]]			; MAX256-NEXT: [[TMP26:%.*]] = fmul <8 x float> [[TMP25]], [[TMP15]]
	; MAX256-NEXT: [[TMP30:%.*]] = fadd <8 x float> zeroinitializer, [[TMP29]]			; MAX256-NEXT: [[TMP27:%.*]] = fadd <8 x float> zeroinitializer, [[TMP26]]
	; MAX256-NEXT: [[TMP31:%.*]] = fmul <8 x float> [[TMP26]], [[TMP12]]			; MAX256-NEXT: [[TMP28:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0
	; MAX256-NEXT: [[TMP32:%.*]] = fadd <8 x float> zeroinitializer, [[TMP31]]			; MAX256-NEXT: [[TMP29:%.*]] = insertelement <8 x float> [[TMP28]], float [[I6]], i32 1
	; MAX256-NEXT: [[TMP33:%.*]] = extractelement <8 x float> [[TMP14]], i32 0			; MAX256-NEXT: [[TMP30:%.*]] = insertelement <8 x float> [[TMP29]], float [[I6]], i32 2
	; MAX256-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP33]], i32 0			; MAX256-NEXT: [[TMP31:%.*]] = insertelement <8 x float> [[TMP30]], float [[I6]], i32 3
	; MAX256-NEXT: [[TMP35:%.*]] = extractelement <8 x float> [[TMP14]], i32 1			; MAX256-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[I6]], i32 4
	; MAX256-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP35]], i32 1			; MAX256-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[I6]], i32 5
	; MAX256-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[FVAL]], i32 2			; MAX256-NEXT: [[TMP34:%.*]] = insertelement <8 x float> [[TMP33]], float [[I6]], i32 6
	; MAX256-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[FVAL]], i32 3			; MAX256-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[I6]], i32 7
	; MAX256-NEXT: [[TMP39:%.*]] = extractelement <8 x float> [[TMP14]], i32 4			; MAX256-NEXT: [[TMP36:%.*]] = fmul <8 x float> [[TMP35]], [[TMP15]]
	; MAX256-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP39]], i32 4			; MAX256-NEXT: [[TMP37:%.*]] = fadd <8 x float> zeroinitializer, [[TMP36]]
	; MAX256-NEXT: [[TMP41:%.*]] = extractelement <8 x float> [[TMP14]], i32 5			; MAX256-NEXT: [[TMP38:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0
	; MAX256-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP41]], i32 5			; MAX256-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[I9]], i32 1
	; MAX256-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[FVAL]], i32 6			; MAX256-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[I9]], i32 2
	; MAX256-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[FVAL]], i32 7			; MAX256-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[I9]], i32 3
	; MAX256-NEXT: [[TMP45:%.*]] = extractelement <8 x float> [[TMP28]], i32 2			; MAX256-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[I9]], i32 4
	; MAX256-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP45]], i32 2			; MAX256-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[I9]], i32 5
	; MAX256-NEXT: [[TMP47:%.*]] = extractelement <8 x float> [[TMP28]], i32 3			; MAX256-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[I9]], i32 6
	; MAX256-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP47]], i32 3			; MAX256-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[I9]], i32 7
	; MAX256-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[FVAL]], i32 4			; MAX256-NEXT: [[TMP46:%.*]] = fmul <8 x float> [[TMP45]], [[TMP15]]
	; MAX256-NEXT: [[TMP50:%.*]] = insertelement <8 x float> [[TMP49]], float [[FVAL]], i32 5			; MAX256-NEXT: [[TMP47:%.*]] = fadd <8 x float> zeroinitializer, [[TMP46]]
	; MAX256-NEXT: [[TMP51:%.*]] = extractelement <8 x float> [[TMP28]], i32 6
	; MAX256-NEXT: [[TMP52:%.*]] = insertelement <8 x float> [[TMP50]], float [[TMP51]], i32 6
	; MAX256-NEXT: [[TMP53:%.*]] = extractelement <8 x float> [[TMP28]], i32 7
	; MAX256-NEXT: [[TMP54:%.*]] = insertelement <8 x float> [[TMP52]], float [[TMP53]], i32 7
	; MAX256-NEXT: [[TMP55:%.*]] = extractelement <8 x float> [[TMP30]], i32 2
	; MAX256-NEXT: [[TMP56:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP55]], i32 2
	; MAX256-NEXT: [[TMP57:%.*]] = extractelement <8 x float> [[TMP30]], i32 3
	; MAX256-NEXT: [[TMP58:%.*]] = insertelement <8 x float> [[TMP56]], float [[TMP57]], i32 3
	; MAX256-NEXT: [[TMP59:%.*]] = insertelement <8 x float> [[TMP58]], float [[FVAL]], i32 4
	; MAX256-NEXT: [[TMP60:%.*]] = insertelement <8 x float> [[TMP59]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP61:%.*]] = extractelement <8 x float> [[TMP30]], i32 6
	; MAX256-NEXT: [[TMP62:%.*]] = insertelement <8 x float> [[TMP60]], float [[TMP61]], i32 6
	; MAX256-NEXT: [[TMP63:%.*]] = extractelement <8 x float> [[TMP30]], i32 7
	; MAX256-NEXT: [[TMP64:%.*]] = insertelement <8 x float> [[TMP62]], float [[TMP63]], i32 7
	; MAX256-NEXT: [[TMP65:%.*]] = extractelement <8 x float> [[TMP32]], i32 2
	; MAX256-NEXT: [[TMP66:%.*]] = insertelement <8 x float> [[TMP6]], float [[TMP65]], i32 2
	; MAX256-NEXT: [[TMP67:%.*]] = extractelement <8 x float> [[TMP32]], i32 3
	; MAX256-NEXT: [[TMP68:%.*]] = insertelement <8 x float> [[TMP66]], float [[TMP67]], i32 3
	; MAX256-NEXT: [[TMP69:%.*]] = insertelement <8 x float> [[TMP68]], float [[FVAL]], i32 4
	; MAX256-NEXT: [[TMP70:%.*]] = insertelement <8 x float> [[TMP69]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP71:%.*]] = extractelement <8 x float> [[TMP32]], i32 6
	; MAX256-NEXT: [[TMP72:%.*]] = insertelement <8 x float> [[TMP70]], float [[TMP71]], i32 6
	; MAX256-NEXT: [[TMP73:%.*]] = extractelement <8 x float> [[TMP32]], i32 7
	; MAX256-NEXT: [[TMP74:%.*]] = insertelement <8 x float> [[TMP72]], float [[TMP73]], i32 7
	; MAX256-NEXT: switch i32 undef, label [[BB5:%.*]] [			; MAX256-NEXT: switch i32 undef, label [[BB5:%.*]] [
	; MAX256-NEXT: i32 0, label [[BB2:%.*]]			; MAX256-NEXT: i32 0, label [[BB2:%.*]]
	; MAX256-NEXT: i32 1, label [[BB3:%.*]]			; MAX256-NEXT: i32 1, label [[BB3:%.*]]
	; MAX256-NEXT: i32 2, label [[BB4:%.*]]			; MAX256-NEXT: i32 2, label [[BB4:%.*]]
	; MAX256-NEXT: ]			; MAX256-NEXT: ]
	; MAX256: bb3:			; MAX256: bb3:
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb4:			; MAX256: bb4:
	; MAX256-NEXT: [[TMP75:%.*]] = insertelement <8 x float> [[TMP34]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP76:%.*]] = insertelement <8 x float> [[TMP75]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP77:%.*]] = extractelement <8 x float> [[TMP14]], i32 3
	; MAX256-NEXT: [[TMP78:%.*]] = insertelement <8 x float> [[TMP76]], float [[TMP77]], i32 3
	; MAX256-NEXT: [[TMP79:%.*]] = insertelement <8 x float> [[TMP78]], float [[TMP39]], i32 4
	; MAX256-NEXT: [[TMP80:%.*]] = insertelement <8 x float> [[TMP79]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP81:%.*]] = insertelement <8 x float> [[TMP80]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP82:%.*]] = extractelement <8 x float> [[TMP14]], i32 7
	; MAX256-NEXT: [[TMP83:%.*]] = insertelement <8 x float> [[TMP81]], float [[TMP82]], i32 7
	; MAX256-NEXT: [[TMP84:%.*]] = extractelement <8 x float> [[TMP28]], i32 0
	; MAX256-NEXT: [[TMP85:%.*]] = insertelement <8 x float> poison, float [[TMP84]], i32 0
	; MAX256-NEXT: [[TMP86:%.*]] = insertelement <8 x float> [[TMP85]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP87:%.*]] = insertelement <8 x float> [[TMP86]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP88:%.*]] = insertelement <8 x float> [[TMP87]], float [[TMP47]], i32 3
	; MAX256-NEXT: [[TMP89:%.*]] = extractelement <8 x float> [[TMP28]], i32 4
	; MAX256-NEXT: [[TMP90:%.*]] = insertelement <8 x float> [[TMP88]], float [[TMP89]], i32 4
	; MAX256-NEXT: [[TMP91:%.*]] = insertelement <8 x float> [[TMP90]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP92:%.*]] = insertelement <8 x float> [[TMP91]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP93:%.*]] = insertelement <8 x float> [[TMP92]], float [[TMP53]], i32 7
	; MAX256-NEXT: [[TMP94:%.*]] = extractelement <8 x float> [[TMP30]], i32 0
	; MAX256-NEXT: [[TMP95:%.*]] = insertelement <8 x float> poison, float [[TMP94]], i32 0
	; MAX256-NEXT: [[TMP96:%.*]] = insertelement <8 x float> [[TMP95]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP97:%.*]] = insertelement <8 x float> [[TMP96]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP98:%.*]] = insertelement <8 x float> [[TMP97]], float [[TMP57]], i32 3
	; MAX256-NEXT: [[TMP99:%.*]] = extractelement <8 x float> [[TMP30]], i32 4
	; MAX256-NEXT: [[TMP100:%.*]] = insertelement <8 x float> [[TMP98]], float [[TMP99]], i32 4
	; MAX256-NEXT: [[TMP101:%.*]] = insertelement <8 x float> [[TMP100]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP102:%.*]] = insertelement <8 x float> [[TMP101]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP103:%.*]] = insertelement <8 x float> [[TMP102]], float [[TMP63]], i32 7
	; MAX256-NEXT: [[TMP104:%.*]] = extractelement <8 x float> [[TMP32]], i32 0
	; MAX256-NEXT: [[TMP105:%.*]] = insertelement <8 x float> poison, float [[TMP104]], i32 0
	; MAX256-NEXT: [[TMP106:%.*]] = insertelement <8 x float> [[TMP105]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP107:%.*]] = insertelement <8 x float> [[TMP106]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP108:%.*]] = insertelement <8 x float> [[TMP107]], float [[TMP67]], i32 3
	; MAX256-NEXT: [[TMP109:%.*]] = extractelement <8 x float> [[TMP32]], i32 4
	; MAX256-NEXT: [[TMP110:%.*]] = insertelement <8 x float> [[TMP108]], float [[TMP109]], i32 4
	; MAX256-NEXT: [[TMP111:%.*]] = insertelement <8 x float> [[TMP110]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP112:%.*]] = insertelement <8 x float> [[TMP111]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP113:%.*]] = insertelement <8 x float> [[TMP112]], float [[TMP73]], i32 7
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb5:			; MAX256: bb5:
	; MAX256-NEXT: [[TMP114:%.*]] = insertelement <8 x float> [[TMP5]], float [[TMP35]], i32 1
	; MAX256-NEXT: [[TMP115:%.*]] = insertelement <8 x float> [[TMP114]], float [[FVAL]], i32 2
	; MAX256-NEXT: [[TMP116:%.*]] = extractelement <8 x float> [[TMP14]], i32 3
	; MAX256-NEXT: [[TMP117:%.*]] = insertelement <8 x float> [[TMP115]], float [[TMP116]], i32 3
	; MAX256-NEXT: [[TMP118:%.*]] = insertelement <8 x float> [[TMP117]], float [[FVAL]], i32 4
	; MAX256-NEXT: [[TMP119:%.*]] = insertelement <8 x float> [[TMP118]], float [[TMP41]], i32 5
	; MAX256-NEXT: [[TMP120:%.*]] = insertelement <8 x float> [[TMP119]], float [[FVAL]], i32 6
	; MAX256-NEXT: [[TMP121:%.*]] = extractelement <8 x float> [[TMP14]], i32 7
	; MAX256-NEXT: [[TMP122:%.*]] = insertelement <8 x float> [[TMP120]], float [[TMP121]], i32 7
	; MAX256-NEXT: [[TMP123:%.*]] = extractelement <8 x float> [[TMP28]], i32 0
	; MAX256-NEXT: [[TMP124:%.*]] = insertelement <8 x float> poison, float [[TMP123]], i32 0
	; MAX256-NEXT: [[TMP125:%.*]] = insertelement <8 x float> [[TMP124]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP126:%.*]] = insertelement <8 x float> [[TMP125]], float [[TMP45]], i32 2
	; MAX256-NEXT: [[TMP127:%.*]] = insertelement <8 x float> [[TMP126]], float [[FVAL]], i32 3
	; MAX256-NEXT: [[TMP128:%.*]] = extractelement <8 x float> [[TMP28]], i32 4
	; MAX256-NEXT: [[TMP129:%.*]] = insertelement <8 x float> [[TMP127]], float [[TMP128]], i32 4
	; MAX256-NEXT: [[TMP130:%.*]] = insertelement <8 x float> [[TMP129]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP131:%.*]] = insertelement <8 x float> [[TMP130]], float [[TMP51]], i32 6
	; MAX256-NEXT: [[TMP132:%.*]] = insertelement <8 x float> [[TMP131]], float [[FVAL]], i32 7
	; MAX256-NEXT: [[TMP133:%.*]] = extractelement <8 x float> [[TMP30]], i32 0
	; MAX256-NEXT: [[TMP134:%.*]] = insertelement <8 x float> poison, float [[TMP133]], i32 0
	; MAX256-NEXT: [[TMP135:%.*]] = insertelement <8 x float> [[TMP134]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP136:%.*]] = insertelement <8 x float> [[TMP135]], float [[TMP55]], i32 2
	; MAX256-NEXT: [[TMP137:%.*]] = insertelement <8 x float> [[TMP136]], float [[FVAL]], i32 3
	; MAX256-NEXT: [[TMP138:%.*]] = extractelement <8 x float> [[TMP30]], i32 4
	; MAX256-NEXT: [[TMP139:%.*]] = insertelement <8 x float> [[TMP137]], float [[TMP138]], i32 4
	; MAX256-NEXT: [[TMP140:%.*]] = insertelement <8 x float> [[TMP139]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP141:%.*]] = insertelement <8 x float> [[TMP140]], float [[TMP61]], i32 6
	; MAX256-NEXT: [[TMP142:%.*]] = insertelement <8 x float> [[TMP141]], float [[FVAL]], i32 7
	; MAX256-NEXT: [[TMP143:%.*]] = extractelement <8 x float> [[TMP32]], i32 0
	; MAX256-NEXT: [[TMP144:%.*]] = insertelement <8 x float> poison, float [[TMP143]], i32 0
	; MAX256-NEXT: [[TMP145:%.*]] = insertelement <8 x float> [[TMP144]], float [[FVAL]], i32 1
	; MAX256-NEXT: [[TMP146:%.*]] = insertelement <8 x float> [[TMP145]], float [[TMP65]], i32 2
	; MAX256-NEXT: [[TMP147:%.*]] = insertelement <8 x float> [[TMP146]], float [[FVAL]], i32 3
	; MAX256-NEXT: [[TMP148:%.*]] = extractelement <8 x float> [[TMP32]], i32 4
	; MAX256-NEXT: [[TMP149:%.*]] = insertelement <8 x float> [[TMP147]], float [[TMP148]], i32 4
	; MAX256-NEXT: [[TMP150:%.*]] = insertelement <8 x float> [[TMP149]], float [[FVAL]], i32 5
	; MAX256-NEXT: [[TMP151:%.*]] = insertelement <8 x float> [[TMP150]], float [[TMP71]], i32 6
	; MAX256-NEXT: [[TMP152:%.*]] = insertelement <8 x float> [[TMP151]], float [[FVAL]], i32 7
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb2:			; MAX256: bb2:
	; MAX256-NEXT: [[TMP153:%.*]] = phi <8 x float> [ [[TMP14]], [[BB3]] ], [ [[TMP83]], [[BB4]] ], [ [[TMP122]], [[BB5]] ], [ [[TMP44]], [[BB1]] ]			; MAX256-NEXT: [[TMP48:%.*]] = phi <8 x float> [ [[TMP27]], [[BB3]] ], [ [[TMP15]], [[BB4]] ], [ [[TMP15]], [[BB5]] ], [ [[TMP15]], [[BB1]] ]
	; MAX256-NEXT: [[TMP154:%.*]] = phi <8 x float> [ [[TMP28]], [[BB3]] ], [ [[TMP93]], [[BB4]] ], [ [[TMP132]], [[BB5]] ], [ [[TMP54]], [[BB1]] ]			; MAX256-NEXT: [[TMP49:%.*]] = phi <8 x float> [ [[TMP37]], [[BB3]] ], [ [[TMP15]], [[BB4]] ], [ [[TMP37]], [[BB5]] ], [ [[TMP37]], [[BB1]] ]
	; MAX256-NEXT: [[TMP155:%.*]] = phi <8 x float> [ [[TMP30]], [[BB3]] ], [ [[TMP103]], [[BB4]] ], [ [[TMP142]], [[BB5]] ], [ [[TMP64]], [[BB1]] ]			; MAX256-NEXT: [[TMP50:%.*]] = phi <8 x float> [ [[TMP47]], [[BB3]] ], [ [[TMP47]], [[BB4]] ], [ [[TMP15]], [[BB5]] ], [ [[TMP47]], [[BB1]] ]
	; MAX256-NEXT: [[TMP156:%.*]] = phi <8 x float> [ [[TMP32]], [[BB3]] ], [ [[TMP113]], [[BB4]] ], [ [[TMP152]], [[BB5]] ], [ [[TMP74]], [[BB1]] ]			; MAX256-NEXT: [[TMP51:%.*]] = phi <8 x float> [ [[TMP17]], [[BB3]] ], [ [[TMP17]], [[BB4]] ], [ [[TMP17]], [[BB5]] ], [ [[TMP15]], [[BB1]] ]
	; MAX256-NEXT: [[TMP157:%.*]] = extractelement <8 x float> [[TMP156]], i32 6			; MAX256-NEXT: [[TMP52:%.*]] = extractelement <8 x float> [[TMP49]], i32 7
	; MAX256-NEXT: store float [[TMP157]], float* undef, align 4			; MAX256-NEXT: store float [[TMP52]], float* undef, align 4
	; MAX256-NEXT: ret void			; MAX256-NEXT: ret void
	;			;
	; MAX1024-LABEL: @phi_float32(			; MAX1024-LABEL: @phi_float32(
	; MAX1024-NEXT: bb:			; MAX1024-NEXT: bb:
	; MAX1024-NEXT: br label [[BB1:%.*]]			; MAX1024-NEXT: br label [[BB1:%.*]]
	; MAX1024: bb1:			; MAX1024: bb1:
	; MAX1024-NEXT: [[TMP0:%.*]] = insertelement <4 x half> poison, half [[HVAL:%.*]], i32 0			; MAX1024-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float
	; MAX1024-NEXT: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half [[HVAL]], i32 1			; MAX1024-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half [[HVAL]], i32 2			; MAX1024-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half [[HVAL]], i32 3			; MAX1024-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP4:%.*]] = fpext <4 x half> [[TMP3]] to <4 x float>			; MAX1024-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0
	; MAX1024-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0, i32 3, i32 2, i32 1, i32 0>			; MAX1024-NEXT: [[TMP1:%.*]] = insertelement <8 x float> [[TMP0]], float [[I]], i32 1
	; MAX1024-NEXT: [[TMP5:%.*]] = insertelement <32 x float> poison, float [[FVAL:%.*]], i32 0			; MAX1024-NEXT: [[TMP2:%.*]] = insertelement <8 x float> [[TMP1]], float [[I]], i32 2
	; MAX1024-NEXT: [[TMP6:%.*]] = insertelement <32 x float> [[TMP5]], float [[FVAL]], i32 1			; MAX1024-NEXT: [[TMP3:%.*]] = insertelement <8 x float> [[TMP2]], float [[I]], i32 3
	; MAX1024-NEXT: [[TMP7:%.*]] = insertelement <32 x float> [[TMP6]], float [[FVAL]], i32 2			; MAX1024-NEXT: [[TMP4:%.*]] = insertelement <8 x float> [[TMP3]], float [[I]], i32 4
	; MAX1024-NEXT: [[TMP8:%.*]] = insertelement <32 x float> [[TMP7]], float [[FVAL]], i32 3			; MAX1024-NEXT: [[TMP5:%.*]] = insertelement <8 x float> [[TMP4]], float [[I]], i32 5
	; MAX1024-NEXT: [[TMP9:%.*]] = insertelement <32 x float> [[TMP8]], float [[FVAL]], i32 4			; MAX1024-NEXT: [[TMP6:%.*]] = insertelement <8 x float> [[TMP5]], float [[I]], i32 6
	; MAX1024-NEXT: [[TMP10:%.*]] = insertelement <32 x float> [[TMP9]], float [[FVAL]], i32 5			; MAX1024-NEXT: [[TMP7:%.*]] = insertelement <8 x float> [[TMP6]], float [[I]], i32 7
	; MAX1024-NEXT: [[TMP11:%.*]] = insertelement <32 x float> [[TMP10]], float [[FVAL]], i32 6			; MAX1024-NEXT: [[TMP8:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0
	; MAX1024-NEXT: [[TMP12:%.*]] = insertelement <32 x float> [[TMP11]], float [[FVAL]], i32 7			; MAX1024-NEXT: [[TMP9:%.*]] = insertelement <8 x float> [[TMP8]], float [[FVAL]], i32 1
	; MAX1024-NEXT: [[TMP13:%.*]] = insertelement <32 x float> [[TMP12]], float [[FVAL]], i32 8			; MAX1024-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP9]], float [[FVAL]], i32 2
	; MAX1024-NEXT: [[TMP14:%.*]] = insertelement <32 x float> [[TMP13]], float [[FVAL]], i32 9			; MAX1024-NEXT: [[TMP11:%.*]] = insertelement <8 x float> [[TMP10]], float [[FVAL]], i32 3
	; MAX1024-NEXT: [[TMP15:%.*]] = insertelement <32 x float> [[TMP14]], float [[FVAL]], i32 10			; MAX1024-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[FVAL]], i32 4
	; MAX1024-NEXT: [[TMP16:%.*]] = insertelement <32 x float> [[TMP15]], float [[FVAL]], i32 11			; MAX1024-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[FVAL]], i32 5
	; MAX1024-NEXT: [[TMP17:%.*]] = insertelement <32 x float> [[TMP16]], float [[FVAL]], i32 12			; MAX1024-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[FVAL]], i32 6
	; MAX1024-NEXT: [[TMP18:%.*]] = insertelement <32 x float> [[TMP17]], float [[FVAL]], i32 13			; MAX1024-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[FVAL]], i32 7
	; MAX1024-NEXT: [[TMP19:%.*]] = insertelement <32 x float> [[TMP18]], float [[FVAL]], i32 14			; MAX1024-NEXT: [[TMP16:%.*]] = fmul <8 x float> [[TMP7]], [[TMP15]]
	; MAX1024-NEXT: [[TMP20:%.*]] = insertelement <32 x float> [[TMP19]], float [[FVAL]], i32 15			; MAX1024-NEXT: [[TMP17:%.*]] = fadd <8 x float> zeroinitializer, [[TMP16]]
	; MAX1024-NEXT: [[TMP21:%.*]] = insertelement <32 x float> [[TMP20]], float [[FVAL]], i32 16			; MAX1024-NEXT: [[TMP18:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0
	; MAX1024-NEXT: [[TMP22:%.*]] = insertelement <32 x float> [[TMP21]], float [[FVAL]], i32 17			; MAX1024-NEXT: [[TMP19:%.*]] = insertelement <8 x float> [[TMP18]], float [[I3]], i32 1
	; MAX1024-NEXT: [[TMP23:%.*]] = insertelement <32 x float> [[TMP22]], float [[FVAL]], i32 18			; MAX1024-NEXT: [[TMP20:%.*]] = insertelement <8 x float> [[TMP19]], float [[I3]], i32 2
	; MAX1024-NEXT: [[TMP24:%.*]] = insertelement <32 x float> [[TMP23]], float [[FVAL]], i32 19			; MAX1024-NEXT: [[TMP21:%.*]] = insertelement <8 x float> [[TMP20]], float [[I3]], i32 3
	; MAX1024-NEXT: [[TMP25:%.*]] = insertelement <32 x float> [[TMP24]], float [[FVAL]], i32 20			; MAX1024-NEXT: [[TMP22:%.*]] = insertelement <8 x float> [[TMP21]], float [[I3]], i32 4
	; MAX1024-NEXT: [[TMP26:%.*]] = insertelement <32 x float> [[TMP25]], float [[FVAL]], i32 21			; MAX1024-NEXT: [[TMP23:%.*]] = insertelement <8 x float> [[TMP22]], float [[I3]], i32 5
	; MAX1024-NEXT: [[TMP27:%.*]] = insertelement <32 x float> [[TMP26]], float [[FVAL]], i32 22			; MAX1024-NEXT: [[TMP24:%.*]] = insertelement <8 x float> [[TMP23]], float [[I3]], i32 6
	; MAX1024-NEXT: [[TMP28:%.*]] = insertelement <32 x float> [[TMP27]], float [[FVAL]], i32 23			; MAX1024-NEXT: [[TMP25:%.*]] = insertelement <8 x float> [[TMP24]], float [[I3]], i32 7
	; MAX1024-NEXT: [[TMP29:%.*]] = insertelement <32 x float> [[TMP28]], float [[FVAL]], i32 24			; MAX1024-NEXT: [[TMP26:%.*]] = fmul <8 x float> [[TMP25]], [[TMP15]]
	; MAX1024-NEXT: [[TMP30:%.*]] = insertelement <32 x float> [[TMP29]], float [[FVAL]], i32 25			; MAX1024-NEXT: [[TMP27:%.*]] = fadd <8 x float> zeroinitializer, [[TMP26]]
	; MAX1024-NEXT: [[TMP31:%.*]] = insertelement <32 x float> [[TMP30]], float [[FVAL]], i32 26			; MAX1024-NEXT: [[TMP28:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0
	; MAX1024-NEXT: [[TMP32:%.*]] = insertelement <32 x float> [[TMP31]], float [[FVAL]], i32 27			; MAX1024-NEXT: [[TMP29:%.*]] = insertelement <8 x float> [[TMP28]], float [[I6]], i32 1
	; MAX1024-NEXT: [[TMP33:%.*]] = insertelement <32 x float> [[TMP32]], float [[FVAL]], i32 28			; MAX1024-NEXT: [[TMP30:%.*]] = insertelement <8 x float> [[TMP29]], float [[I6]], i32 2
	; MAX1024-NEXT: [[TMP34:%.*]] = insertelement <32 x float> [[TMP33]], float [[FVAL]], i32 29			; MAX1024-NEXT: [[TMP31:%.*]] = insertelement <8 x float> [[TMP30]], float [[I6]], i32 3
	; MAX1024-NEXT: [[TMP35:%.*]] = insertelement <32 x float> [[TMP34]], float [[FVAL]], i32 30			; MAX1024-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[I6]], i32 4
	; MAX1024-NEXT: [[TMP36:%.*]] = insertelement <32 x float> [[TMP35]], float [[FVAL]], i32 31			; MAX1024-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[I6]], i32 5
	; MAX1024-NEXT: [[TMP37:%.*]] = fmul <32 x float> [[SHUFFLE]], [[TMP36]]			; MAX1024-NEXT: [[TMP34:%.*]] = insertelement <8 x float> [[TMP33]], float [[I6]], i32 6
	; MAX1024-NEXT: [[TMP38:%.*]] = fadd <32 x float> zeroinitializer, [[TMP37]]			; MAX1024-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[I6]], i32 7
	; MAX1024-NEXT: [[TMP39:%.*]] = extractelement <32 x float> [[TMP38]], i32 0			; MAX1024-NEXT: [[TMP36:%.*]] = fmul <8 x float> [[TMP35]], [[TMP15]]
	; MAX1024-NEXT: [[TMP40:%.*]] = insertelement <32 x float> poison, float [[TMP39]], i32 0			; MAX1024-NEXT: [[TMP37:%.*]] = fadd <8 x float> zeroinitializer, [[TMP36]]
	; MAX1024-NEXT: [[TMP41:%.*]] = extractelement <32 x float> [[TMP38]], i32 1			; MAX1024-NEXT: [[TMP38:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0
	; MAX1024-NEXT: [[TMP42:%.*]] = insertelement <32 x float> [[TMP40]], float [[TMP41]], i32 1			; MAX1024-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[I9]], i32 1
	; MAX1024-NEXT: [[TMP43:%.*]] = insertelement <32 x float> [[TMP42]], float [[FVAL]], i32 2			; MAX1024-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[I9]], i32 2
	; MAX1024-NEXT: [[TMP44:%.*]] = insertelement <32 x float> [[TMP43]], float [[FVAL]], i32 3			; MAX1024-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[I9]], i32 3
	; MAX1024-NEXT: [[TMP45:%.*]] = extractelement <32 x float> [[TMP38]], i32 4			; MAX1024-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[I9]], i32 4
	; MAX1024-NEXT: [[TMP46:%.*]] = insertelement <32 x float> [[TMP44]], float [[TMP45]], i32 4			; MAX1024-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[I9]], i32 5
	; MAX1024-NEXT: [[TMP47:%.*]] = extractelement <32 x float> [[TMP38]], i32 5			; MAX1024-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[I9]], i32 6
	; MAX1024-NEXT: [[TMP48:%.*]] = insertelement <32 x float> [[TMP46]], float [[TMP47]], i32 5			; MAX1024-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[I9]], i32 7
	; MAX1024-NEXT: [[TMP49:%.*]] = insertelement <32 x float> [[TMP48]], float [[FVAL]], i32 6			; MAX1024-NEXT: [[TMP46:%.*]] = fmul <8 x float> [[TMP45]], [[TMP15]]
	; MAX1024-NEXT: [[TMP50:%.*]] = insertelement <32 x float> [[TMP49]], float [[FVAL]], i32 7			; MAX1024-NEXT: [[TMP47:%.*]] = fadd <8 x float> zeroinitializer, [[TMP46]]
	; MAX1024-NEXT: [[TMP51:%.*]] = insertelement <32 x float> [[TMP50]], float [[FVAL]], i32 8
	; MAX1024-NEXT: [[TMP52:%.*]] = insertelement <32 x float> [[TMP51]], float [[FVAL]], i32 9
	; MAX1024-NEXT: [[TMP53:%.*]] = extractelement <32 x float> [[TMP38]], i32 10
	; MAX1024-NEXT: [[TMP54:%.*]] = insertelement <32 x float> [[TMP52]], float [[TMP53]], i32 10
	; MAX1024-NEXT: [[TMP55:%.*]] = extractelement <32 x float> [[TMP38]], i32 11
	; MAX1024-NEXT: [[TMP56:%.*]] = insertelement <32 x float> [[TMP54]], float [[TMP55]], i32 11
	; MAX1024-NEXT: [[TMP57:%.*]] = insertelement <32 x float> [[TMP56]], float [[FVAL]], i32 12
	; MAX1024-NEXT: [[TMP58:%.*]] = insertelement <32 x float> [[TMP57]], float [[FVAL]], i32 13
	; MAX1024-NEXT: [[TMP59:%.*]] = extractelement <32 x float> [[TMP38]], i32 14
	; MAX1024-NEXT: [[TMP60:%.*]] = insertelement <32 x float> [[TMP58]], float [[TMP59]], i32 14
	; MAX1024-NEXT: [[TMP61:%.*]] = extractelement <32 x float> [[TMP38]], i32 15
	; MAX1024-NEXT: [[TMP62:%.*]] = insertelement <32 x float> [[TMP60]], float [[TMP61]], i32 15
	; MAX1024-NEXT: [[TMP63:%.*]] = insertelement <32 x float> [[TMP62]], float [[FVAL]], i32 16
	; MAX1024-NEXT: [[TMP64:%.*]] = insertelement <32 x float> [[TMP63]], float [[FVAL]], i32 17
	; MAX1024-NEXT: [[TMP65:%.*]] = extractelement <32 x float> [[TMP38]], i32 18
	; MAX1024-NEXT: [[TMP66:%.*]] = insertelement <32 x float> [[TMP64]], float [[TMP65]], i32 18
	; MAX1024-NEXT: [[TMP67:%.*]] = extractelement <32 x float> [[TMP38]], i32 19
	; MAX1024-NEXT: [[TMP68:%.*]] = insertelement <32 x float> [[TMP66]], float [[TMP67]], i32 19
	; MAX1024-NEXT: [[TMP69:%.*]] = insertelement <32 x float> [[TMP68]], float [[FVAL]], i32 20
	; MAX1024-NEXT: [[TMP70:%.*]] = insertelement <32 x float> [[TMP69]], float [[FVAL]], i32 21
	; MAX1024-NEXT: [[TMP71:%.*]] = extractelement <32 x float> [[TMP38]], i32 22
	; MAX1024-NEXT: [[TMP72:%.*]] = insertelement <32 x float> [[TMP70]], float [[TMP71]], i32 22
	; MAX1024-NEXT: [[TMP73:%.*]] = extractelement <32 x float> [[TMP38]], i32 23
	; MAX1024-NEXT: [[TMP74:%.*]] = insertelement <32 x float> [[TMP72]], float [[TMP73]], i32 23
	; MAX1024-NEXT: [[TMP75:%.*]] = insertelement <32 x float> [[TMP74]], float [[FVAL]], i32 24
	; MAX1024-NEXT: [[TMP76:%.*]] = insertelement <32 x float> [[TMP75]], float [[FVAL]], i32 25
	; MAX1024-NEXT: [[TMP77:%.*]] = extractelement <32 x float> [[TMP38]], i32 26
	; MAX1024-NEXT: [[TMP78:%.*]] = insertelement <32 x float> [[TMP76]], float [[TMP77]], i32 26
	; MAX1024-NEXT: [[TMP79:%.*]] = extractelement <32 x float> [[TMP38]], i32 27
	; MAX1024-NEXT: [[TMP80:%.*]] = insertelement <32 x float> [[TMP78]], float [[TMP79]], i32 27
	; MAX1024-NEXT: [[TMP81:%.*]] = insertelement <32 x float> [[TMP80]], float [[FVAL]], i32 28
	; MAX1024-NEXT: [[TMP82:%.*]] = insertelement <32 x float> [[TMP81]], float [[FVAL]], i32 29
	; MAX1024-NEXT: [[TMP83:%.*]] = extractelement <32 x float> [[TMP38]], i32 30
	; MAX1024-NEXT: [[TMP84:%.*]] = insertelement <32 x float> [[TMP82]], float [[TMP83]], i32 30
	; MAX1024-NEXT: [[TMP85:%.*]] = extractelement <32 x float> [[TMP38]], i32 31
	; MAX1024-NEXT: [[TMP86:%.*]] = insertelement <32 x float> [[TMP84]], float [[TMP85]], i32 31
	; MAX1024-NEXT: switch i32 undef, label [[BB5:%.*]] [			; MAX1024-NEXT: switch i32 undef, label [[BB5:%.*]] [
	; MAX1024-NEXT: i32 0, label [[BB2:%.*]]			; MAX1024-NEXT: i32 0, label [[BB2:%.*]]
	; MAX1024-NEXT: i32 1, label [[BB3:%.*]]			; MAX1024-NEXT: i32 1, label [[BB3:%.*]]
	; MAX1024-NEXT: i32 2, label [[BB4:%.*]]			; MAX1024-NEXT: i32 2, label [[BB4:%.*]]
	; MAX1024-NEXT: ]			; MAX1024-NEXT: ]
	; MAX1024: bb3:			; MAX1024: bb3:
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb4:			; MAX1024: bb4:
	; MAX1024-NEXT: [[TMP87:%.*]] = insertelement <32 x float> [[TMP40]], float [[FVAL]], i32 1
	; MAX1024-NEXT: [[TMP88:%.*]] = insertelement <32 x float> [[TMP87]], float [[FVAL]], i32 2
	; MAX1024-NEXT: [[TMP89:%.*]] = extractelement <32 x float> [[TMP38]], i32 3
	; MAX1024-NEXT: [[TMP90:%.*]] = insertelement <32 x float> [[TMP88]], float [[TMP89]], i32 3
	; MAX1024-NEXT: [[TMP91:%.*]] = insertelement <32 x float> [[TMP90]], float [[TMP45]], i32 4
	; MAX1024-NEXT: [[TMP92:%.*]] = insertelement <32 x float> [[TMP91]], float [[FVAL]], i32 5
	; MAX1024-NEXT: [[TMP93:%.*]] = insertelement <32 x float> [[TMP92]], float [[FVAL]], i32 6
	; MAX1024-NEXT: [[TMP94:%.*]] = extractelement <32 x float> [[TMP38]], i32 7
	; MAX1024-NEXT: [[TMP95:%.*]] = insertelement <32 x float> [[TMP93]], float [[TMP94]], i32 7
	; MAX1024-NEXT: [[TMP96:%.*]] = extractelement <32 x float> [[TMP38]], i32 8
	; MAX1024-NEXT: [[TMP97:%.*]] = insertelement <32 x float> [[TMP95]], float [[TMP96]], i32 8
	; MAX1024-NEXT: [[TMP98:%.*]] = insertelement <32 x float> [[TMP97]], float [[FVAL]], i32 9
	; MAX1024-NEXT: [[TMP99:%.*]] = insertelement <32 x float> [[TMP98]], float [[FVAL]], i32 10
	; MAX1024-NEXT: [[TMP100:%.*]] = insertelement <32 x float> [[TMP99]], float [[TMP55]], i32 11
	; MAX1024-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[TMP38]], i32 12
	; MAX1024-NEXT: [[TMP102:%.*]] = insertelement <32 x float> [[TMP100]], float [[TMP101]], i32 12
	; MAX1024-NEXT: [[TMP103:%.*]] = insertelement <32 x float> [[TMP102]], float [[FVAL]], i32 13
	; MAX1024-NEXT: [[TMP104:%.*]] = insertelement <32 x float> [[TMP103]], float [[FVAL]], i32 14
	; MAX1024-NEXT: [[TMP105:%.*]] = insertelement <32 x float> [[TMP104]], float [[TMP61]], i32 15
	; MAX1024-NEXT: [[TMP106:%.*]] = extractelement <32 x float> [[TMP38]], i32 16
	; MAX1024-NEXT: [[TMP107:%.*]] = insertelement <32 x float> [[TMP105]], float [[TMP106]], i32 16
	; MAX1024-NEXT: [[TMP108:%.*]] = insertelement <32 x float> [[TMP107]], float [[FVAL]], i32 17
	; MAX1024-NEXT: [[TMP109:%.*]] = insertelement <32 x float> [[TMP108]], float [[FVAL]], i32 18
	; MAX1024-NEXT: [[TMP110:%.*]] = insertelement <32 x float> [[TMP109]], float [[TMP67]], i32 19
	; MAX1024-NEXT: [[TMP111:%.*]] = extractelement <32 x float> [[TMP38]], i32 20
	; MAX1024-NEXT: [[TMP112:%.*]] = insertelement <32 x float> [[TMP110]], float [[TMP111]], i32 20
	; MAX1024-NEXT: [[TMP113:%.*]] = insertelement <32 x float> [[TMP112]], float [[FVAL]], i32 21
	; MAX1024-NEXT: [[TMP114:%.*]] = insertelement <32 x float> [[TMP113]], float [[FVAL]], i32 22
	; MAX1024-NEXT: [[TMP115:%.*]] = insertelement <32 x float> [[TMP114]], float [[TMP73]], i32 23
	; MAX1024-NEXT: [[TMP116:%.*]] = extractelement <32 x float> [[TMP38]], i32 24
	; MAX1024-NEXT: [[TMP117:%.*]] = insertelement <32 x float> [[TMP115]], float [[TMP116]], i32 24
	; MAX1024-NEXT: [[TMP118:%.*]] = insertelement <32 x float> [[TMP117]], float [[FVAL]], i32 25
	; MAX1024-NEXT: [[TMP119:%.*]] = insertelement <32 x float> [[TMP118]], float [[FVAL]], i32 26
	; MAX1024-NEXT: [[TMP120:%.*]] = insertelement <32 x float> [[TMP119]], float [[TMP79]], i32 27
	; MAX1024-NEXT: [[TMP121:%.*]] = extractelement <32 x float> [[TMP38]], i32 28
	; MAX1024-NEXT: [[TMP122:%.*]] = insertelement <32 x float> [[TMP120]], float [[TMP121]], i32 28
	; MAX1024-NEXT: [[TMP123:%.*]] = insertelement <32 x float> [[TMP122]], float [[FVAL]], i32 29
	; MAX1024-NEXT: [[TMP124:%.*]] = insertelement <32 x float> [[TMP123]], float [[FVAL]], i32 30
	; MAX1024-NEXT: [[TMP125:%.*]] = insertelement <32 x float> [[TMP124]], float [[TMP85]], i32 31
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb5:			; MAX1024: bb5:
	; MAX1024-NEXT: [[TMP126:%.*]] = insertelement <32 x float> [[TMP5]], float [[TMP41]], i32 1
	; MAX1024-NEXT: [[TMP127:%.*]] = insertelement <32 x float> [[TMP126]], float [[FVAL]], i32 2
	; MAX1024-NEXT: [[TMP128:%.*]] = extractelement <32 x float> [[TMP38]], i32 3
	; MAX1024-NEXT: [[TMP129:%.*]] = insertelement <32 x float> [[TMP127]], float [[TMP128]], i32 3
	; MAX1024-NEXT: [[TMP130:%.*]] = insertelement <32 x float> [[TMP129]], float [[FVAL]], i32 4
	; MAX1024-NEXT: [[TMP131:%.*]] = insertelement <32 x float> [[TMP130]], float [[TMP47]], i32 5
	; MAX1024-NEXT: [[TMP132:%.*]] = insertelement <32 x float> [[TMP131]], float [[FVAL]], i32 6
	; MAX1024-NEXT: [[TMP133:%.*]] = extractelement <32 x float> [[TMP38]], i32 7
	; MAX1024-NEXT: [[TMP134:%.*]] = insertelement <32 x float> [[TMP132]], float [[TMP133]], i32 7
	; MAX1024-NEXT: [[TMP135:%.*]] = extractelement <32 x float> [[TMP38]], i32 8
	; MAX1024-NEXT: [[TMP136:%.*]] = insertelement <32 x float> [[TMP134]], float [[TMP135]], i32 8
	; MAX1024-NEXT: [[TMP137:%.*]] = insertelement <32 x float> [[TMP136]], float [[FVAL]], i32 9
	; MAX1024-NEXT: [[TMP138:%.*]] = insertelement <32 x float> [[TMP137]], float [[TMP53]], i32 10
	; MAX1024-NEXT: [[TMP139:%.*]] = insertelement <32 x float> [[TMP138]], float [[FVAL]], i32 11
	; MAX1024-NEXT: [[TMP140:%.*]] = extractelement <32 x float> [[TMP38]], i32 12
	; MAX1024-NEXT: [[TMP141:%.*]] = insertelement <32 x float> [[TMP139]], float [[TMP140]], i32 12
	; MAX1024-NEXT: [[TMP142:%.*]] = insertelement <32 x float> [[TMP141]], float [[FVAL]], i32 13
	; MAX1024-NEXT: [[TMP143:%.*]] = insertelement <32 x float> [[TMP142]], float [[TMP59]], i32 14
	; MAX1024-NEXT: [[TMP144:%.*]] = insertelement <32 x float> [[TMP143]], float [[FVAL]], i32 15
	; MAX1024-NEXT: [[TMP145:%.*]] = extractelement <32 x float> [[TMP38]], i32 16
	; MAX1024-NEXT: [[TMP146:%.*]] = insertelement <32 x float> [[TMP144]], float [[TMP145]], i32 16
	; MAX1024-NEXT: [[TMP147:%.*]] = insertelement <32 x float> [[TMP146]], float [[FVAL]], i32 17
	; MAX1024-NEXT: [[TMP148:%.*]] = insertelement <32 x float> [[TMP147]], float [[TMP65]], i32 18
	; MAX1024-NEXT: [[TMP149:%.*]] = insertelement <32 x float> [[TMP148]], float [[FVAL]], i32 19
	; MAX1024-NEXT: [[TMP150:%.*]] = extractelement <32 x float> [[TMP38]], i32 20
	; MAX1024-NEXT: [[TMP151:%.*]] = insertelement <32 x float> [[TMP149]], float [[TMP150]], i32 20
	; MAX1024-NEXT: [[TMP152:%.*]] = insertelement <32 x float> [[TMP151]], float [[FVAL]], i32 21
	; MAX1024-NEXT: [[TMP153:%.*]] = insertelement <32 x float> [[TMP152]], float [[TMP71]], i32 22
	; MAX1024-NEXT: [[TMP154:%.*]] = insertelement <32 x float> [[TMP153]], float [[FVAL]], i32 23
	; MAX1024-NEXT: [[TMP155:%.*]] = extractelement <32 x float> [[TMP38]], i32 24
	; MAX1024-NEXT: [[TMP156:%.*]] = insertelement <32 x float> [[TMP154]], float [[TMP155]], i32 24
	; MAX1024-NEXT: [[TMP157:%.*]] = insertelement <32 x float> [[TMP156]], float [[FVAL]], i32 25
	; MAX1024-NEXT: [[TMP158:%.*]] = insertelement <32 x float> [[TMP157]], float [[TMP77]], i32 26
	; MAX1024-NEXT: [[TMP159:%.*]] = insertelement <32 x float> [[TMP158]], float [[FVAL]], i32 27
	; MAX1024-NEXT: [[TMP160:%.*]] = extractelement <32 x float> [[TMP38]], i32 28
	; MAX1024-NEXT: [[TMP161:%.*]] = insertelement <32 x float> [[TMP159]], float [[TMP160]], i32 28
	; MAX1024-NEXT: [[TMP162:%.*]] = insertelement <32 x float> [[TMP161]], float [[FVAL]], i32 29
	; MAX1024-NEXT: [[TMP163:%.*]] = insertelement <32 x float> [[TMP162]], float [[TMP83]], i32 30
	; MAX1024-NEXT: [[TMP164:%.*]] = insertelement <32 x float> [[TMP163]], float [[FVAL]], i32 31
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb2:			; MAX1024: bb2:
	; MAX1024-NEXT: [[TMP165:%.*]] = phi <32 x float> [ [[TMP38]], [[BB3]] ], [ [[TMP125]], [[BB4]] ], [ [[TMP164]], [[BB5]] ], [ [[TMP86]], [[BB1]] ]			; MAX1024-NEXT: [[TMP48:%.*]] = phi <8 x float> [ [[TMP27]], [[BB3]] ], [ [[TMP15]], [[BB4]] ], [ [[TMP15]], [[BB5]] ], [ [[TMP15]], [[BB1]] ]
	; MAX1024-NEXT: [[TMP166:%.*]] = extractelement <32 x float> [[TMP165]], i32 30			; MAX1024-NEXT: [[TMP49:%.*]] = phi <8 x float> [ [[TMP37]], [[BB3]] ], [ [[TMP15]], [[BB4]] ], [ [[TMP37]], [[BB5]] ], [ [[TMP37]], [[BB1]] ]
	; MAX1024-NEXT: store float [[TMP166]], float* undef, align 4			; MAX1024-NEXT: [[TMP50:%.*]] = phi <8 x float> [ [[TMP47]], [[BB3]] ], [ [[TMP47]], [[BB4]] ], [ [[TMP15]], [[BB5]] ], [ [[TMP47]], [[BB1]] ]
				; MAX1024-NEXT: [[TMP51:%.*]] = phi <8 x float> [ [[TMP17]], [[BB3]] ], [ [[TMP17]], [[BB4]] ], [ [[TMP17]], [[BB5]] ], [ [[TMP15]], [[BB1]] ]
				; MAX1024-NEXT: [[TMP52:%.*]] = extractelement <8 x float> [[TMP49]], i32 7
				; MAX1024-NEXT: store float [[TMP52]], float* undef, align 4
	; MAX1024-NEXT: ret void			; MAX1024-NEXT: ret void
	;			;
	bb:			bb:
	br label %bb1			br label %bb1

	bb1:			bb1:
	%i = fpext half %hval to float			%i = fpext half %hval to float
	%i1 = fmul float %i, %fval			%i1 = fmul float %i, %fval


[SLP] Initial support for the vectorization of the non-power-of-2 vectors.
Needs Review · Public

Revision Contents

Diff 331650

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/lib/Transforms/Utils/LoopUtils.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/AArch64/insertelement-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/insertelement.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/AArch64/trunc-insertion.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_mandeltext.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

llvm/test/Transforms/SLPVectorizer/X86/hoist.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/phi3.ll

llvm/test/Transforms/SLPVectorizer/X86/phi_landingpad.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr42022-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll
