This is an archive of the discontinued LLVM Phabricator instance.

Differential D57059

[SLP] Initial support for the vectorization of the non-power-of-2 vectors.
Needs Review · Public

Authored by ABataev on Jan 22 2019, 8:30 AM.

Details

Reviewers

RKSimon
spatel
hfinkel
craig.topper
dtemirbulatov
anton-afanasyev

Summary

Operations that could be vectorized are padded with UndefValues up to a power-of-2 number of elements, so that regular vector operations can be used.
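
The padding idea from the summary can be sketched roughly as follows. This is only an illustration, not the patch's code: `Lane`, `nextPowerOf2` and `padToPowerOf2` are hypothetical names, and `std::nullopt` stands in for an `UndefValue` lane.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a scalar IR value; std::nullopt plays the
// role of an UndefValue lane.
using Lane = std::optional<int>;

// Smallest power of two that is >= N.
std::size_t nextPowerOf2(std::size_t N) {
  assert(N > 0 && "expected at least one scalar");
  std::size_t P = 1;
  while (P < N)
    P *= 2;
  return P;
}

// Pads the scalar list with "undef" lanes up to a power-of-2 width.
std::vector<Lane> padToPowerOf2(std::vector<Lane> Scalars) {
  if (Scalars.empty())
    return Scalars;
  Scalars.resize(nextPowerOf2(Scalars.size()), std::nullopt);
  return Scalars;
}
```

With three scalars the result has four lanes, the last of which is the "undef" placeholder; a list that is already a power of 2 wide is returned unchanged.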

For SPEC CPU2017 it gives ~7% perf gain for 526.blender_r (AVX512,
O3+LTO, -march=native), ~2% gain for 538.imagick_r and 638.imagick_s,
~2% gain for 525.x264_r and 625.x264_s, ~2% gain for 526.blender_r (AVX2,
O3+LTO, -march=native), ~11% gain for 526.blender_r, ~3% gain for
544.nab_r and 644.nab_s (AVX512, O3+LTO), ~3% gain
for 526.blender_r, ~2% gain for 544.nab_r and 644.nab_s (AVX2, O3+LTO).

Compile and link times are roughly the same:

AVX512, O3+LTO, -march=native
Metric: compile_time
Geomean difference -0.1% (-1.85 sec)

Metric: link_time
Geomean difference +1.2% (+14.46 sec)

AVX512, O3+LTO
Metric: compile_time
Geomean difference -0.2% (-4.71 sec)

Metric: link_time
Geomean difference -3.6% (-54.53 sec)

AVX2, O3+LTO, -march=native
Metric: compile_time
Geomean difference +0.3% (+10.56 sec)

Metric: link_time
Geomean difference -0.1% (-2.18 sec)

AVX2, O3+LTO
Metric: compile_time
Geomean difference +0.2% (+5.73 sec)

Metric: link_time
Geomean difference -3.4% (-67.45 sec)

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	2,530 ms	x64 debian > libarcher.critical::critical.c
	2,560 ms	x64 debian > libarcher.parallel::parallel-simple2.c
	2,630 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,850 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
	2,480 ms	x64 debian > libarcher.races::lock-unrelated.c
		View Full Test Results (18 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden.

Rebase

Harbormaster completed remote builds in B72530: Diff 293480. Sep 22 2020, 9:38 AM

some very minor style comments - a general comment would be to try and pre-commit the style/NFC refactor/cleanup changes so the size of this patch is smaller

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3181	Do these trivial style refactors separately now to reduce the size of the patch?
3197–3203	Do these trivial style refactors separately now to reduce the size of the patch?
4394	duplicate cast
4502	duplicate cast
4517	duplicate cast
4563	duplicate cast
5520–5521	trivial style refactor - pull out of patch?

Rebase

Harbormaster completed remote builds in B72586: Diff 293576. Sep 22 2020, 4:09 PM

RKSimon added inline comments. Sep 23 2020, 4:24 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
218–219	Are we going to have a problem if VL[0] is UndefValue?

ABataev added inline comments. Sep 23 2020, 4:30 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
218–219	Yeah, will fix it.

Rebase + fix

Harbormaster completed remote builds in B72644: Diff 293701. Sep 23 2020, 5:42 AM

RKSimon added inline comments. Sep 23 2020, 5:52 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16 ↗	(On Diff #293701)	These "feel" like regressions to me - any idea what's going on?

ABataev added inline comments. Sep 23 2020, 5:55 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16 ↗	(On Diff #293701)	The cost model problem, if I recall it correctly. I investigated it before and found out that the cost model for AArch64 is not defined for long vectors in some cases and we fall back to the generic cost model evaluation which is not quite correct in many cases. Need to tweak the cost model for AArch64.

RKSimon added inline comments. Sep 23 2020, 6:15 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16 ↗	(On Diff #293701)	Any instruction cost type (extract/shuffle/store?) in particular that needs better costs? It'd be good to at least raise a specific bug report to the aarch64 team

ABataev added inline comments. Sep 23 2020, 6:31 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16 ↗	(On Diff #293701)	I do not remember anymore; I need some time to investigate it again. I hope to do it by the end of this week. PS. There was already a question about this test.

spatel added inline comments. Sep 23 2020, 7:18 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9008	Is it necessary to copy these? If so, it would be better to name this function something like "getCopyOfExtraArgValues" to make that explicit. If not, we can just make this a standard 'get' method: `const MapVector<Instruction *, Value *> &getExtraArgs() const { return ExtraArgs; }` And then access the 'second' data in the user code?
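
The two options discussed in this comment can be compared in a small standalone sketch. It does not use LLVM headers: `Instruction`, `Value` and `ReductionInfo` below are hypothetical stand-ins, and a vector of pairs replaces `MapVector` purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction and llvm::Value.
struct Instruction { std::string Name; };
struct Value { std::string Name; };

class ReductionInfo {
  // Insertion-ordered (key, value) pairs; a simplified stand-in for MapVector.
  std::vector<std::pair<Instruction *, Value *>> ExtraArgs;

public:
  void addExtraArg(Instruction *I, Value *V) {
    assert(I && V && "expected non-null key and value");
    ExtraArgs.emplace_back(I, V);
  }

  // Option 1: a plain getter. No copy is made, but callers see the keys too.
  const std::vector<std::pair<Instruction *, Value *>> &getExtraArgs() const {
    return ExtraArgs;
  }

  // Option 2: return a copy of the values only; the name makes the copy explicit.
  std::vector<Value *> getCopyOfExtraArgValues() const {
    std::vector<Value *> Values;
    Values.reserve(ExtraArgs.size());
    for (const auto &KV : ExtraArgs)
      Values.push_back(KV.second);
    return Values;
  }
};
```

Option 1 avoids the copy at the price of exposing the `first` element; option 2 hides the keys but allocates on every call.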

ABataev added inline comments. Sep 24 2020, 6:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9008	We don't need to expose the `first` element of the `MapVector` here, it is not good from the general design point of view. I'll rename the member function.

Rebase + rename

Harbormaster completed remote builds in B72809: Diff 294040. Sep 24 2020, 6:59 AM

ABataev added inline comments. Sep 24 2020, 7:19 AM

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16 ↗	(On Diff #293701)	Found the reason. It is the cost of the shuffle of kind `TTI::SK_PermuteSingleSrc`. Before this patch, the test operated on the vector `<2 x i16>`, which the type legalization function transforms to `llvm::MVT::v2i32`; the cost table gives this shuffle a tweaked cost of `1` (see llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp, `AArch64TTIImpl::getShuffleCost`). With this patch, the original vector type is `<4 x i16>`, which is transformed to `llvm::MVT::v4i16`. There is no optimized value for `TTI::SK_PermuteSingleSrc` in the table for this type, so the function falls back to the pessimistic cost model and returns `18`. There are already several TODOs in the file about fixing the cost model for different shuffle operations.
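
The table-lookup-with-fallback behaviour described above can be illustrated with a standalone sketch. It is not the AArch64 implementation: `ShuffleKind`, `CostEntry`, `lookupCost` and this `getShuffleCost` are hypothetical, and the fallback cost is passed in as a parameter rather than computed by a generic model.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified shuffle kinds for illustration only.
enum class ShuffleKind { Broadcast, PermuteSingleSrc };

// One row of a hypothetical per-target cost table.
struct CostEntry {
  ShuffleKind Kind;
  std::string LegalType; // legalized type name, e.g. "v2i32"
  unsigned Cost;
};

// Returns the tuned cost if the table has an entry for (Kind, Ty).
std::optional<unsigned> lookupCost(const std::vector<CostEntry> &Table,
                                   ShuffleKind Kind, const std::string &Ty) {
  for (const CostEntry &E : Table)
    if (E.Kind == Kind && E.LegalType == Ty)
      return E.Cost;
  return std::nullopt;
}

// Uses the tuned table entry when present, otherwise the pessimistic fallback.
unsigned getShuffleCost(const std::vector<CostEntry> &Table, ShuffleKind Kind,
                        const std::string &Ty, unsigned FallbackCost) {
  assert(!Ty.empty() && "expected a legalized type name");
  if (std::optional<unsigned> Cost = lookupCost(Table, Kind, Ty))
    return *Cost;
  return FallbackCost;
}
```

With a table that only contains a `PermuteSingleSrc` entry for `v2i32`, a query for `v4i16` misses the table and gets the fallback value, which mirrors the jump from `1` to `18` reported for this test.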

Does anyone have any more comments?

spatel mentioned this in D88505: [InstCombine] ease alignment restriction for converting masked load to normal load. Sep 30 2020, 6:10 AM

spatel added inline comments. Sep 30 2020, 6:23 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3758–3760	Use isValidElementType() or check for undef directly? I still can't tell from the debug statement exactly what we are guarding against. Should the type check already be here even without this patch?
4802	Are we always creating a masked load for a vector with 2 elements? This logic needs a code comment to explain the cases.
6230	Please add code comment/example to explain what the difference is between these 2 clauses.
6261	Is Passthrough a full vector of undef elements? If so, it should be created/named that way (or directly in the call to CreateMaskedLoad()) rather than in the loop.
6321–6322	Similar to above (so can we add a helper function to avoid duplicating the code?): Please add code comment/example to explain what the difference is between these 2 clauses.

reverse ping

Herald added a subscriber: pengfei. Oct 14 2020, 5:56 AM

In D57059#2329971, @RKSimon wrote:

reverse ping

Will update the patch as soon as I'm back to work, in 2-3 weeks.

reverse ping?

In D57059#2379030, @RKSimon wrote:

reverse ping?

Need some time to set up my dev environment, will update ASAP

ABataev added inline comments. Nov 9 2020, 10:37 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3758–3760	I was just trying to protect the code and support it only for simple types at first. There are some doubts that the cost model for masked loads/stores is complete, so I protected it to make it work only for simple types. I can remove this check if the cost model for masked ops is good enough.
4802	No, no need to do it for 2 elements, removed it.
6230	Fixed it, thanks.
6261	Fixed
6321–6322	Fixed, thanks!

RKSimon added inline comments. Nov 9 2020, 11:14 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3758–3760	masked load/store costs for constant masks should be good enough now (getScalarizationOverhead should now provide us with a reasonable fallback)

Rebase, updates and fixes

Harbormaster completed remote builds in B78299: Diff 304196. Nov 10 2020, 8:24 AM

All of my comments were addressed, so LGTM. But please wait for an official 'accept' from at least 1 other reviewer.

RKSimon added inline comments. Nov 12 2020, 9:35 AM

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
338	There isn't an ANY check-prefix atm (it was cleaned out in rG119e4550ddedc75e4 as part of the unused prefix cleanup) - please can you review?

ABataev added inline comments. Nov 12 2020, 9:39 AM

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
338	Yes, need to remove it, I think. Most probably caused by a not quite clean merge.

Rebase, test cleanup + small code improvements.

Harbormaster completed remote builds in B78688: Diff 304968. Nov 12 2020, 2:10 PM

Rebase

Harbormaster completed remote builds in B79634: Diff 306735. Nov 20 2020, 10:42 AM

xbolva00 added a subscriber: xbolva00. Nov 20 2020, 10:53 AM

xbolva00 added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Regression on avx?

ABataev added inline comments. Nov 20 2020, 11:01 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Yes, looks like the issue with the cost of `@llvm.masked.gather` for masked gather with some undefs in the mask

craig.topper added inline comments. Nov 20 2020, 11:07 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	Gather is slow on CPUs prior to AVX512. And its cost is proportional to the number of elements. I don't think the value of the mask should be a factor.

ABataev added inline comments. Nov 20 2020, 11:18 AM

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
21–23	True, but in some cases it can be optimized into `gather + shuffle` instead of wide `gather`, if there are undefs in mask.

Rebase + improve handling of masked gathers.

Harbormaster completed remote builds in B79808: Diff 307098. Nov 23 2020, 9:22 AM

anton-afanasyev added inline comments. Nov 23 2020, 1:01 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2963	Could there actually be no `Instruction` in `UseEntry`, or should this be an assert?
2967	"Lane 0" seems outdated here, but not sure about better description.
3151	`assert(NumberOfInstructions != 0 && "...")` and `if (NumberOfInstructions == 1)`?
4079	Comment typo: `aggrgate`.
6253–6254	`emplace_back()`
7703	typo: "indeces"

Fixed according to comments

Harbormaster completed remote builds in B79873: Diff 307204. Nov 23 2020, 2:59 PM

Fixed function name.

Harbormaster completed remote builds in B79874: Diff 307205. Nov 23 2020, 3:13 PM

Rebase

Harbormaster completed remote builds in B80547: Diff 308408. Nov 30 2020, 10:09 AM

Btw, I've observed significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for awesome service). This could be justified in case of comparable performance improvements but have you done any benchmarking?

In D57059#2426996, @anton-afanasyev wrote:

Btw, I've observed significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for awesome service). This could be justified in case of comparable performance improvements but have you done any benchmarking?

I did some a while back with SPECINT 2006 and, as I remember, the results were good, but I am not sure I could find them now. Yes, for me, having this new functionality with the presented compile-time regression looks OK.

I don't think a (geomean) 0.20% regression is a significant compile-time problem. TBH, I expected bigger CT regressions - up to 0.5% is fine IMHO.

Rebase

Harbormaster completed remote builds in B81116: Diff 309568. Dec 4 2020, 10:40 AM

Rebase

Harbormaster completed remote builds in B81501: Diff 310293. Dec 8 2020, 11:41 AM

AFAICT the only outstanding question is whether the compile time increase is acceptable?

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3764	Well, the term "unsortable" or "unprocessable" would be more precise. But why did we change `if (sortPtrAccesses)...` to the opposite condition? This change just duplicates the debug output, since we didn't differentiate it. Also I'd prefer to see the same `if-else` structure as for the load case.

In D57059#2442527, @anton-afanasyev wrote:

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

Could the summary of the revision be updated with the performance data? The thread is very long and I didn't spot where the measurements are, so it's hard to say what we're trading off here...

Generally though this level of geomean regression is fine, the issue is usually in the outliers. For example, I spot this 4-5% regression:

CMakeFiles/lencod.dir/transform8x8.c.o 	3053M 	3188M (+4.41%)

It may be worthwhile to briefly check it in case something can be improved.

In D57059#2442542, @nikic wrote:

In D57059#2442527, @anton-afanasyev wrote:

In D57059#2442286, @RKSimon wrote:

AFAICT the only outstanding question is whether the compile time increase is acceptable?

I'd agree that geomean = 0.2% is acceptable for a change with such an awesome performance impact; I just noted that the compile-time change is significant in comparison with other changes. Generally it looks good to me apart from one minor unaddressed comment.

Could the summary of the revision be updated with the performance data? The thread is very long and I didn't spot where the measurements are, so it's hard to say what we're trading off here...

Will try to run the benchmarks and get fresh data.

Generally though this level of geomean regression is fine, the issue is usually in the outliers. For example, I spot this 4-5% regression:
CMakeFiles/lencod.dir/transform8x8.c.o 	3053M 	3188M (+4.41%)
It may be worthwhile to briefly check it in case something can be improved.

I'll check what can be improved in terms of compile time. Not sure that I will be able to improve it significantly, since the patch itself does not add extensive analysis/transformations; it just adds one extra iteration for wider vector analysis. But I'll check it anyway and will try to improve things where possible.

While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c, here is the reduced testcase to reproduce:
source_filename = "/home/dtemirbulatov/llvm/test-suite/SingleSource/Benchmarks/Misc/oourafft.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local fastcc void @cft1st(double* %a) unnamed_addr #0 {
entry:

%0 = or i64 16, 2
%arrayidx107 = getelementptr inbounds double, double* %a, i64 %0
%1 = or i64 16, 3
%arrayidx114 = getelementptr inbounds double, double* %a, i64 %1
%2 = or i64 16, 4
%arrayidx131 = getelementptr inbounds double, double* %a, i64 %2
%3 = or i64 16, 6
%arrayidx134 = getelementptr inbounds double, double* %a, i64 %3
%4 = load double, double* %arrayidx134, align 8
%5 = or i64 16, 5
%arrayidx138 = getelementptr inbounds double, double* %a, i64 %5
%6 = or i64 16, 7
%arrayidx141 = getelementptr inbounds double, double* %a, i64 %6
%7 = load double, double* %arrayidx141, align 8
%sub149 = fsub double undef, %4
%sub156 = fsub double undef, %7
store double undef, double* %arrayidx131, align 8
store double undef, double* %arrayidx138, align 8
%sub178 = fsub double undef, %sub156
%add179 = fadd double undef, %sub149
%mul180 = fmul double undef, %sub178
%sub182 = fsub double %mul180, undef
store double %sub182, double* %arrayidx107, align 8
%mul186 = fmul double undef, %add179
%add188 = fadd double %mul186, undef
store double %add188, double* %arrayidx114, align 8
unreachable

}

attributes #0 = { "target-features"="+avx,+avx2,+bmi,+bmi2,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" }

!llvm.ident = !{!0}

!0 = !{!"clang version 12.0.0 (https://github.com/llvm/llvm-project.git aaa925795f93c389a96ee01bab73bc2b6b771cbb)"}

In D57059#2443350, @dtemirbulatov wrote:
While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c, here is the reduced testcase to reproduce:

...

Do you mean compile time increasing? With this patch?

Do you mean compile time increasing? With this patch?

no, just compile-time error.

In D57059#2443492, @dtemirbulatov wrote:

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Crash or incorrect code?

In D57059#2443496, @ABataev wrote:

In D57059#2443492, @dtemirbulatov wrote:

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Crash or incorrect code?

Crash.

wxiao3 added a subscriber: wxiao3. Dec 14 2020, 7:09 AM

ABataev edited the summary of this revision. Feb 10 2021, 6:01 AM

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   146.00   148.00   1.4%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   146.00   148.00   1.4%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    34.00    34.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    34.00    34.00   0.0%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  5587.00  5560.00  -0.5%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  5587.00  5560.00  -0.5%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  7384.00  7341.00  -0.6%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9607.00  9359.00  -2.6%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5340.00  5178.00  -3.0%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1053.00  1006.00  -4.5%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1053.00  1006.00  -4.5%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   141.00   134.00  -5.0%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   141.00   134.00  -5.0%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   524.00   463.00 -11.6%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   426.00   370.00 -13.1%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   426.00   370.00 -13.1%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 15945.00 12573.00 -21.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test      NaN    16.00   nan%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test      NaN    16.00   nan%
                                                             Geomean difference                     nan%
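
One way to read the `nan%` geomean in the table above: if the aggregate is a geometric mean of per-program `rhs/lhs` ratios, a single `NaN` input makes the whole result `NaN`. The sketch below shows that effect; the exact formula the reporting tool uses is an assumption here, and `geomeanDifference` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative geomean difference between two result columns, computed as
// exp(mean(log(rhs / lhs))) - 1. Assumed formula, for illustration only.
double geomeanDifference(const std::vector<double> &Lhs,
                         const std::vector<double> &Rhs) {
  assert(Lhs.size() == Rhs.size() && !Lhs.empty());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Lhs.size(); ++I)
    LogSum += std::log(Rhs[I] / Lhs[I]); // a NaN ratio poisons the sum
  return std::exp(LogSum / static_cast<double>(Lhs.size())) - 1.0;
}
```

Under this assumption, the two 519.lbm_r/619.lbm_s rows with a `NaN` baseline are enough to turn the aggregate into `nan%`, even though every other row has a finite difference.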

AVX512, O3+LTO
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    22.00    60.00 172.7%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    22.00    60.00 172.7%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test    68.00    72.00   5.9%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    68.00    72.00   5.9%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    10.00    10.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  3396.00  3396.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    10.00    10.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  3396.00  3396.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   499.00   497.00  -0.4%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   499.00   497.00  -0.4%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   838.00   826.00  -1.4%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   838.00   826.00  -1.4%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  6090.00  5906.00  -3.0%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   131.00   127.00  -3.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   131.00   127.00  -3.1%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  8815.00  8452.00  -4.1%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  2864.00  2712.00  -5.3%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  2864.00  2712.00  -5.3%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 16049.00 14753.00  -8.1%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   686.00   621.00  -9.5%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   686.00   621.00  -9.5%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   551.00   473.00 -14.2%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   551.00   473.00 -14.2%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 16240.00 13941.00 -14.2%
                                                             Geomean difference                     4.3%

AVX2, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  7309.00  7341.00   0.4%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    34.00    34.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  5490.00  5490.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    34.00    34.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  5490.00  5490.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   462.00   455.00  -1.5%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   462.00   455.00  -1.5%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  9508.00  9347.00  -1.7%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5393.00  5190.00  -3.8%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1066.00   968.00  -9.2%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1066.00   968.00  -9.2%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test   151.00   134.00 -11.3%
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test   151.00   134.00 -11.3%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   160.00   141.00 -11.9%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   160.00   141.00 -11.9%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   820.00   722.00 -12.0%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   820.00   722.00 -12.0%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3605.00  3173.00 -12.0%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3605.00  3173.00 -12.0%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   438.00   370.00 -15.5%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   438.00   370.00 -15.5%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 14842.00 12463.00 -16.0%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   106.00    79.00 -25.5%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   106.00    79.00 -25.5%
                                                             Geomean difference                    -8.7%

AVX2, O3+LTO
Metric: SLP.NumVectorInstructions

Program                                                                         lhs      rhs      diff
         test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    22.00    60.00 172.7%
          test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    22.00    60.00 172.7%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test    68.00    72.00   5.9%
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    68.00    72.00   5.9%
              test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test    11.00    11.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    10.00    10.00   0.0%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  3396.00  3396.00   0.0%
             test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test    11.00    11.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    10.00    10.00   0.0%
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  3396.00  3396.00   0.0%
     test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   499.00   497.00  -0.4%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   499.00   497.00  -0.4%
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   838.00   826.00  -1.4%
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   838.00   826.00  -1.4%
              test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test   131.00   127.00  -3.1%
               test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test   131.00   127.00  -3.1%
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  6094.00  5906.00  -3.1%
             test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  8734.00  8452.00  -3.2%
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  2867.00  2712.00  -5.4%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  2867.00  2712.00  -5.4%
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 16026.00 14753.00  -7.9%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   686.00   621.00  -9.5%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   686.00   621.00  -9.5%
         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 16241.00 13941.00 -14.2%
              test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   559.00   473.00 -15.4%
               test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   559.00   473.00 -15.4%
                                                             Geomean difference                     4.2%

Will update the patch soon.

Rework, bug fixes, rebase

This is an integral patch, going to split it into several smaller patches.

Harbormaster completed remote builds in B88714: Diff 322816. · Feb 10 2021, 8:03 PM

Btw, how can the reduction in the NumVectorInstructions stat after this patch be explained?

In D57059#2553914, @ABataev wrote:

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

...
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
...

In D57059#2555233, @ABataev wrote:

This is an integral patch, going to split it into several smaller patches.

Are you planning to send all of these patches for review, or just to commit them after this integral review? I.e., should we go on with the review here?

In D57059#2578657, @anton-afanasyev wrote:

Btw, how can the reduction in the NumVectorInstructions stat after this patch be explained?

In D57059#2553914, @ABataev wrote:

Extra numbers:

AVX512, O3+LTO, -march=native
Metric: SLP.NumVectorInstructions

...
      test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3996.00  3563.00 -10.8%
       test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3996.00  3563.00 -10.8%
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   862.00   767.00 -11.0%
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   862.00   767.00 -11.0%
      test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   524.00   463.00 -11.6%
...

Actually, it is not decreasing. This is how the test-suite Python script works: here lhs is the number of instructions after this patch and rhs is the number before. So the lower the relative number, the more vector instructions we actually generate.
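The percentages in the tables above are consistent with diff = (rhs - lhs) / lhs. A minimal sketch of that arithmetic follows; `relativeDiffPercent` is a hypothetical helper written for illustration and is not taken from the test-suite scripts.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the LLVM test-suite): relative difference
// in percent, computed as (rhs - lhs) / lhs, which matches the table rows.
double relativeDiffPercent(double Lhs, double Rhs) {
  assert(Lhs != 0.0 && "lhs must be non-zero");
  return (Rhs - Lhs) / Lhs * 100.0;
}
```

For the leela rows (lhs 22, rhs 60) this gives about 172.7%, and for the gcc rows in the AVX512 run (lhs 862, rhs 767) about -11.0%. With lhs being the post-patch count, a negative value means the patched compiler emitted more vector instructions than the baseline.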

In D57059#2578661, @anton-afanasyev wrote:

In D57059#2555233, @ABataev wrote:

This is an integral patch, going to split it into several smaller patches.

Are you planning to send all of these patches for review, or just to commit them after this integral review? I.e., should we go on with the review here?

You can publish your comments here, no need to wait for small patches.

Actually, it is not decreasing. This is how the test-suite Python script works: here lhs is the number of instructions after this patch and rhs is the number before. So the lower the relative number, the more vector instructions we actually generate.

Oh, I see. There are still several reducing cases though.

In D57059#2578770, @anton-afanasyev wrote:

Actually, it is not decreasing. This is how the test-suite Python script works: here lhs is the number of instructions after this patch and rhs is the number before. So the lower the relative number, the more vector instructions we actually generate.

Oh, I see. There are still several reducing cases though.

Actually, no. I compared the resulting IR files for these cases, and they are exactly the same as before. We are just generating fewer ExtractElement/InsertElement instructions and directly emitting Shuffle instructions in some cases. That's why it seems to generate fewer vector instructions, though it does not.

ABataev added a child revision: D97406: [Instcombiner]Improve emission of logical or/and reductions. · Feb 24 2021, 1:06 PM

ABataev mentioned this in rG04ba80ca4dee: [Instcombiner]Improve emission of logical or/and reductions. · Mar 4 2021, 8:02 AM

Rebase

Harbormaster completed remote builds in B94525: Diff 331650. · Mar 18 2021, 1:08 PM

Removed logical reductions conversion code.

Harbormaster completed remote builds in B94702: Diff 331873. · Mar 19 2021, 8:30 AM

ABataev mentioned this in D98967: [Analysis]Add getPointersDiff function to improve compile time. · Mar 19 2021, 10:33 AM

ABataev mentioned this in rG065a14a12d26: [Analysis]Add getPointersDiff function to improve compile time. · Mar 23 2021, 12:59 PM

ABataev mentioned this in rG99203f2004d0: [Analysis]Add getPointersDiff function to improve compile time. · Mar 23 2021, 2:26 PM

Rebase

Harbormaster completed remote builds in B95477: Diff 332970. · Mar 24 2021, 11:00 AM

Rebase

Harbormaster completed remote builds in B96712: Diff 334682. · Apr 1 2021, 7:42 AM

What is the status of this patch? Any blockers, or just a lack of reviewers?

In D57059#2666065, @xbolva00 wrote:

What is the status of this patch? Any blockers, or just a lack of reviewers?

The patch is just too big; I'm splitting it into smaller chunks and committing them step by step. The final chunk will be the largest one. But first I need to commit other improvements that can be separated from the patch.

ABataev mentioned this in D99796: [SLP]Improve vectorization of the CmpInst instructions. · Apr 2 2021, 8:25 AM

ABataev mentioned this in rG00a84f9a7f89: [SLP]Improve vectorization of the CmpInst instructions. · Apr 5 2021, 6:50 AM

Rebase

Harbormaster completed remote builds in B97157: Diff 335310. · Apr 5 2021, 12:59 PM

ABataev mentioned this in D99980: [SLP]Improve cost model for the vectorized extractelements. · Apr 6 2021, 11:09 AM

ABataev mentioned this in rGe99b98cb1bca: [SLP]Improve cost model for the vectorized extractelements. · Apr 22 2021, 7:41 AM

ABataev mentioned this in D101109: [SLP]Improve multinode analysis. · Apr 22 2021, 2:01 PM

ABataev mentioned this in D101297: [SLP]Allow masked gathers only if allowed by target. · Apr 26 2021, 7:41 AM

ABataev mentioned this in rGb5f64768cfee: [SLP]Allow masked gathers only if allowed by target. · May 3 2021, 7:05 AM

ABataev mentioned this in rGfd18547e0721: [SLP]Allow masked gathers only if allowed by target. · May 3 2021, 8:07 AM

Hi Alexey! Is this patch ready for review, or will other patches be split off from this one?

In D57059#2766508, @anton-afanasyev wrote:

Hi Alexey! Is this patch ready for review, or will other patches be split off from this one?

As I said before, it must be split into several smaller patches. I'm doing this step by step: there is D101109, which is part of this big patch. I want to commit it, then rebase this patch and split it again, since there are other parts that can be committed independently.

ABataev mentioned this in D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops. · May 20 2021, 6:31 AM

SjoerdMeijer added a subscriber: SjoerdMeijer. · May 20 2021, 6:35 AM

ABataev mentioned this in D103247: [SLP]Allow to reorder nodes with >2 scalar values. · May 27 2021, 6:48 AM

Is it worth rebasing this to show the remaining diffs that still need to go in?

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

ABataev mentioned this in D103458: [SLP]Improve gathering of scalar elements. · Jun 1 2021, 6:53 AM

ABataev mentioned this in rG89f3bc7698c5: [SLP]Allow to reorder nodes with >2 scalar values. · Jun 3 2021, 10:03 AM

ABataev mentioned this in D103638: [SLP]Improve vectorization of PHI instructions. · Jun 3 2021, 11:18 AM

ABataev mentioned this in rGa0086add2e52: [SLP]Improve gathering of scalar elements. · Jun 9 2021, 5:24 AM

ABataev mentioned this in D104122: [SLP]Improve vectorization of stores. · Jun 11 2021, 8:07 AM

ABataev mentioned this in rG908b7536615e: [SLP]Improve vectorization of PHI instructions. · Jun 21 2021, 12:27 PM

anton-afanasyev mentioned this in D105042: [SLP][COST][X86]Improve cost model for masked gather. · Jul 6 2021, 3:33 PM

ABataev mentioned this in rGc574d2fbaca4: [SLP]Improve vectorization of stores. · Jul 8 2021, 12:49 PM

ABataev mentioned this in D105986: [SLP]Improve vectorization of gathered loads. · Jul 14 2021, 7:41 AM

In D57059#2785076, @ABataev wrote:

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

Any chance that you could refresh this patch with your rebase please? I'm investigating a lot of 'float3' style performance issues at the moment (PR50920, PR51075, PR51091) and I'd like to get a better idea of how close the non-pow2 slp support will get us. Thanks.

In D57059#2877539, @RKSimon wrote:

In D57059#2785076, @ABataev wrote:

In D57059#2785073, @RKSimon wrote:

Is it worth rebasing this to show the remaining diffs that still need to go in?

There were not many commits for it, and I need to commit some extra patches to fix some regressions, but anyway I've started rebasing. We'll try to rebase it next week.

Any chance that you could refresh this patch with your rebase please? I'm investigating a lot of 'float3' style performance issues at the moment (PR50920, PR51075, PR51091) and I'd like to get a better idea of how close the non-pow2 slp support will get us. Thanks.

I will try to do it ASAP.

That's awesome thanks, it'll definitely help improve the feedback I can give on the patch series.

RKSimon mentioned this in D106399: [VectorCombine] Widening of partial vector loads. · Jul 21 2021, 4:59 AM

Rebase. Did not test it thoroughly, just rebased and fixed test cases.

Harbormaster completed remote builds in B115286: Diff 360420. · Jul 21 2021, 6:59 AM

Thank you!

Thank you. I checked this patch after the rebase while trying to fix PR49933. It works well for it; reported at https://bugs.llvm.org/show_bug.cgi?id=49933.

anton-afanasyev mentioned this in rGdd028c359e09: [SLP][Test] Add tests for PR47624 and PR49933. · Sep 4 2021, 3:18 PM

nick added a subscriber: nick. · Sep 4 2021, 4:30 PM

vporpo added a subscriber: vporpo. · Nov 11 2021, 8:01 PM

ABataev mentioned this in rGbd053769867f: [SLP]Improve multinode analysis. · Dec 14 2021, 6:18 AM

ABataev mentioned this in D123516: Fix SLP score for out of order contiguous loads. · Apr 12 2022, 11:38 AM

liaolucy added a subscriber: liaolucy. · May 17 2022, 7:18 PM

Herald added a project: Restricted Project. · May 17 2022, 7:18 PM

Herald added subscribers: kosarev, StephenFan.

Current status?

Herald added subscribers: pcwang-thead, nlopes. · Sep 7 2022, 12:53 PM

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

In D57059#3775511, @ABataev wrote:

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

Are those patches linked somewhere?

Herald added a subscriber: wangpc. · Aug 8 2023, 6:40 AM

In D57059#4569337, @danilaml wrote:

In D57059#3775511, @ABataev wrote:

In D57059#3775369, @xbolva00 wrote:

Current status?

Still requires several patches to commit.

Are those patches linked somewhere?

Almost all my current SLP patches are related to non-power-of-2 support. But even these patches are not the full list. Need to add several others after.

sunshaoce added a subscriber: sunshaoce. · Aug 17 2023, 2:19 AM

Revision Contents

Path                                              Size

llvm/lib/Transforms/Vectorize/
  SLPVectorizer.cpp                               2037 lines

llvm/test/Transforms/SLPVectorizer/AArch64/
  gather-root.ll                                  16 lines
  horizontal.ll                                   2 lines
  transpose-inseltpoison.ll                       137 lines
  transpose.ll                                    137 lines
  vectorize-free-extracts-inserts.ll              273 lines

llvm/test/Transforms/SLPVectorizer/AMDGPU/
  add_sub_sat-inseltpoison.ll                     30 lines
  add_sub_sat.ll                                  30 lines

llvm/test/Transforms/SLPVectorizer/SystemZ/
  pr34619.ll                                      23 lines

llvm/test/Transforms/SLPVectorizer/X86/
  PR35865-inseltpoison.ll                         8 lines
  PR35865.ll                                      8 lines
  PR39774.ll                                      31 lines
  alternate-calls-inseltpoison.ll                 104 lines
  alternate-calls.ll                              104 lines
  alternate-cast-inseltpoison.ll                  130 lines
  alternate-cast.ll                               130 lines
  alternate-fp-inseltpoison.ll                    8 lines
  alternate-fp.ll                                 8 lines
  alternate-int-inseltpoison.ll                   199 lines
  alternate-int.ll                                199 lines
  arith-fp-inseltpoison.ll                        60 lines
  arith-fp.ll                                     60 lines
  blending-shuffle-inseltpoison.ll                65 lines
  blending-shuffle.ll                             65 lines
  cmp_commute-inseltpoison.ll                     8 lines
  cmp_commute.ll                                  8 lines
  commutativity.ll                                42 lines
  compare-reduce.ll                               29 lines
  crash_exceed_scheduling.ll                      4 lines
  crash_reordering_undefs.ll                      26 lines
  crash_vectorizeTree.ll                          27 lines
  (file name missing)                             18 lines
  (file name missing)                             91 lines
  (file name missing)                             11 lines
  (file name missing)                             4 lines
  fptosi-inseltpoison.ll                          159 lines
  (file name missing)                             159 lines
  (file name missing)                             186 lines
  (file name missing)                             43 lines
  (file name missing)                             191 lines
  insert-element-build-vector-inseltpoison.ll     51 lines
  insert-element-build-vector.ll                  51 lines
  load-merge-inseltpoison.ll                      26 lines
  load-merge.ll                                   26 lines
  lookahead.ll                                    54 lines
  minimum-sizes.ll                                23 lines
  no_alternate_divrem.ll                          56 lines
  (file name missing)                             78 lines
  (file name missing)                             2 lines
  (file name missing)                             66 lines
  (file name missing)                             35 lines
  (file name missing)                             40 lines
  (file name missing)                             22 lines
  pr47629-inseltpoison.ll                         505 lines
  pr47629.ll                                      489 lines
  pr49081.ll                                      13 lines
  reduction-logical.ll                            29 lines
  reorder_repeated_ops.ll                         8 lines
  resched.ll                                      62 lines
  revectorized_rdx_crash.ll                       17 lines
  rgb_phi.ll                                      49 lines
  schedule-bundle.ll                              16 lines
  shrink_after_reorder.ll                         7 lines
  supernode.ll                                    2 lines
  value-bug-inseltpoison.ll                       2 lines
  value-bug.ll                                    2 lines
  vec_list_bias-inseltpoison.ll                   31 lines
  vec_list_bias.ll                                31 lines
  vectorize-reorder-reuse.ll                      52 lines

Diff 360420

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[169 lines collapsed]	static cl::opt<unsigned> LookAheadUsersBudget(
"slp-look-ahead-users-budget", cl::init(2), cl::Hidden,		"slp-look-ahead-users-budget", cl::init(2), cl::Hidden,
cl::desc("The maximum number of users to visit while visiting the "		cl::desc("The maximum number of users to visit while visiting the "
"predecessors. This prevents compilation time increase."));		"predecessors. This prevents compilation time increase."));

static cl::opt<bool>		static cl::opt<bool>
ViewSLPTree("view-slp-tree", cl::Hidden,		ViewSLPTree("view-slp-tree", cl::Hidden,
cl::desc("Display the SLP trees with Graphviz"));		cl::desc("Display the SLP trees with Graphviz"));

		// FIXME: These 2 options are required to avoid regressions in O3+LTO because of
		// too early optimizations at compile time.
		static cl::opt<unsigned>
		MinNonPow2StoresSize("slp-min-non-power2-stores-size", cl::init(6),
		cl::Hidden,
		cl::desc("The minimum number of non-power-2 stores to "
		"vectorize to try to use masked stores."));

		static cl::opt<unsigned>
		MinNonPow2ValuesSize("slp-min-non-power2-values-size", cl::init(4),
		cl::Hidden,
		cl::desc("The minimum number of non-power-2 non-store "
		"values to try the vectorization."));

// Limit the number of alias checks. The limit is chosen so that		// Limit the number of alias checks. The limit is chosen so that
// it has no negative effect on the llvm benchmarks.		// it has no negative effect on the llvm benchmarks.
static const unsigned AliasedCheckLimit = 10;		static const unsigned AliasedCheckLimit = 10;

// Another limit for the alias checks: The maximum distance between load/store		// Another limit for the alias checks: The maximum distance between load/store
// instructions where alias checks are done.		// instructions where alias checks are done.
// This limit is useful for very large basic blocks.		// This limit is useful for very large basic blocks.
static const unsigned MaxMemDepDistance = 160;		static const unsigned MaxMemDepDistance = 160;
[10 lines collapsed]
/// avoids spending time checking the cost model and realizing that they will		/// avoids spending time checking the cost model and realizing that they will
/// be inevitably scalarized.		/// be inevitably scalarized.
static bool isValidElementType(Type *Ty) {		static bool isValidElementType(Type *Ty) {
return VectorType::isValidElementType(Ty) && !Ty->isX86_FP80Ty() &&		return VectorType::isValidElementType(Ty) && !Ty->isX86_FP80Ty() &&
!Ty->isPPC_FP128Ty();		!Ty->isPPC_FP128Ty();
}		}

/// \returns true if all of the instructions in \p VL are in the same block or		/// \returns true if all of the instructions in \p VL are in the same block or
/// false otherwise.		/// false otherwise.
static bool allSameBlock(ArrayRef<Value *> VL) {		template <typename T> static bool allSameBlock(T &&VL) {
		RKSimon (inline comment, Not Done): Are we going to have a problem if VL[0] is UndefValue?
		ABataev (inline comment, Done): Yeah, will fix it.
Instruction *I0 = dyn_cast<Instruction>(VL[0]);		if (empty(VL))
if (!I0)
return false;
BasicBlock *BB = I0->getParent();
for (int I = 1, E = VL.size(); I < E; I++) {
auto *II = dyn_cast<Instruction>(VL[I]);
if (!II)
return false;

if (BB != II->getParent())
return false;
}
return true;		return true;
		auto I0 = cast<Instruction>(VL.begin());
		BasicBlock *BB = I0->getParent();
		return all_of(drop_begin(VL, 1), [BB](Value *V) {
		return BB == cast<Instruction>(V)->getParent();
		});
}		}

/// \returns True if the value is a constant (but not globals/constant		/// \returns True if the value is a constant (but not globals/constant
/// expressions).		/// expressions).
static bool isConstant(Value *V) {		static bool isConstant(Value *V) {
return isa<Constant>(V) && !isa<ConstantExpr>(V) && !isa<GlobalValue>(V);		return isa<Constant>(V) && !isa<ConstantExpr>(V) && !isa<GlobalValue>(V);
}		}

/// \returns True if all of the values in \p VL are constants (but not		/// \returns True if all of the values in \p VL are constants (but not
/// globals/constant expressions).		/// globals/constant expressions).
static bool allConstant(ArrayRef<Value *> VL) {		static bool allConstant(ArrayRef<Value *> VL) {
// Constant expressions and globals can't be vectorized like normal integer/FP		// Constant expressions and globals can't be vectorized like normal integer/FP
// constants.		// constants.
return all_of(VL, isConstant);		return all_of(VL, isConstant);
}		}

/// \returns True if all of the values in \p VL are identical.		/// \returns True if all defined values in \p VL are identical.
static bool isSplat(ArrayRef<Value *> VL) {		static bool isSplat(ArrayRef<Value *> VL) {
for (unsigned i = 1, e = VL.size(); i < e; ++i)		Value *VL0 = nullptr;
if (VL[i] != VL[0])		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
		if (!VL0) {
		VL0 = V;
		continue;
		}
		if (V != VL0)
return false;		return false;
		}
return true;		return true;
}		}
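The right-hand side of the `isSplat` hunk above skips undefined lanes and compares only the defined ones. A standalone sketch of that idea, outside LLVM, could look as follows; `kUndefLane` and `isSplatIgnoringUndefSketch` are hypothetical names, and plain integers stand in for `Value *` pointers.

```cpp
#include <cassert> // used by the checks in the usage note below
#include <vector>

// Hypothetical stand-in for an UndefValue lane in this model.
constexpr int kUndefLane = -1;

// Returns true if all defined lanes hold the same value. Undefined lanes are
// skipped, so an empty or all-undefined list is also treated as a splat.
bool isSplatIgnoringUndefSketch(const std::vector<int> &Lanes) {
  bool HaveFirst = false;
  int First = 0;
  for (int Lane : Lanes) {
    if (Lane == kUndefLane)
      continue; // undefined lanes never break a splat
    if (!HaveFirst) {
      First = Lane; // remember the first defined lane
      HaveFirst = true;
      continue;
    }
    if (Lane != First)
      return false;
  }
  return true;
}
```

Under this model, `assert(isSplatIgnoringUndefSketch({7, kUndefLane, 7}))` holds, while `{7, kUndefLane, 8}` is rejected.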

/// \returns True if \p I is commutative, handles CmpInst and BinaryOperator.		/// \returns True if \p I is commutative, handles CmpInst and BinaryOperator.
static bool isCommutative(Instruction *I) {		static bool isCommutative(Instruction *I) {
if (auto *Cmp = dyn_cast<CmpInst>(I))		if (auto *Cmp = dyn_cast<CmpInst>(I))
return Cmp->isCommutative();		return Cmp->isCommutative();
if (auto *BO = dyn_cast<BinaryOperator>(I))		if (auto *BO = dyn_cast<BinaryOperator>(I))
[158 lines collapsed]	static bool isValidForAlternation(unsigned Opcode) {
return true;		return true;
}		}

/// \returns analysis of the Instructions in \p VL described in		/// \returns analysis of the Instructions in \p VL described in
/// InstructionsState, the Opcode that we suppose the whole list		/// InstructionsState, the Opcode that we suppose the whole list
/// could be vectorized even if its structure is diverse.		/// could be vectorized even if its structure is diverse.
static InstructionsState getSameOpcode(ArrayRef<Value *> VL,		static InstructionsState getSameOpcode(ArrayRef<Value *> VL,
unsigned BaseIndex = 0) {		unsigned BaseIndex = 0) {
// Make sure these are all Instructions.		// Make sure these are all Instructions or UndefValues.
if (llvm::any_of(VL, [](Value *V) { return !isa<Instruction>(V); }))		auto &&IsNotInstructionOrAllUndefs = [](ArrayRef<Value *> VL) {
		bool AllUndefs = true;
		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
		if (isa<Instruction>(V)) {
		AllUndefs = false;
		continue;
		}
		return true;
		}
		return AllUndefs;
		};
		if (IsNotInstructionOrAllUndefs(VL))
return InstructionsState(VL[BaseIndex], nullptr, nullptr);		return InstructionsState(VL[BaseIndex], nullptr, nullptr);
		BaseIndex =
		std::distance(VL.begin(), llvm::find_if(llvm::drop_begin(VL, BaseIndex),
		Instruction::classof));

bool IsCastOp = isa<CastInst>(VL[BaseIndex]);		bool IsCastOp = isa<CastInst>(VL[BaseIndex]);
		RKSimon (inline comment, Done): Worth using find_if?
bool IsBinOp = isa<BinaryOperator>(VL[BaseIndex]);		bool IsBinOp = isa<BinaryOperator>(VL[BaseIndex]);
unsigned Opcode = cast<Instruction>(VL[BaseIndex])->getOpcode();		unsigned Opcode = cast<Instruction>(VL[BaseIndex])->getOpcode();
unsigned AltOpcode = Opcode;		unsigned AltOpcode = Opcode;
unsigned AltIndex = BaseIndex;		unsigned AltIndex = BaseIndex;

// Check for one alternate opcode from another BinaryOperator.		// Check for one alternate opcode from another BinaryOperator.
// TODO - generalize to support all operators (types, calls etc.).		// TODO - generalize to support all operators (types, calls etc.).
for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {		for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {
		if (isa<UndefValue>(VL[Cnt]))
		continue;
unsigned InstOpcode = cast<Instruction>(VL[Cnt])->getOpcode();		unsigned InstOpcode = cast<Instruction>(VL[Cnt])->getOpcode();
if (IsBinOp && isa<BinaryOperator>(VL[Cnt])) {		if (IsBinOp && isa<BinaryOperator>(VL[Cnt])) {
if (InstOpcode == Opcode || InstOpcode == AltOpcode)		if (InstOpcode == Opcode || InstOpcode == AltOpcode)
continue;		continue;
if (Opcode == AltOpcode && isValidForAlternation(InstOpcode) &&		if (Opcode == AltOpcode && isValidForAlternation(InstOpcode) &&
isValidForAlternation(Opcode)) {		isValidForAlternation(Opcode)) {
AltOpcode = InstOpcode;		AltOpcode = InstOpcode;
AltIndex = Cnt;		AltIndex = Cnt;
[101 lines collapsed]
}		}

namespace llvm {		namespace llvm {

static void inversePermutation(ArrayRef<unsigned> Indices,		static void inversePermutation(ArrayRef<unsigned> Indices,
SmallVectorImpl<int> &Mask) {		SmallVectorImpl<int> &Mask) {
Mask.clear();		Mask.clear();
const unsigned E = Indices.size();		const unsigned E = Indices.size();
Mask.resize(E, E + 1);		Mask.resize(E, UndefMaskElem);
for (unsigned I = 0; I < E; ++I)		for (unsigned I = 0; I < E; ++I)
		if (Indices[I] < E)
Mask[Indices[I]] = I;		Mask[Indices[I]] = I;
}		}
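The patched `inversePermutation` above fills the mask with `UndefMaskElem` and skips out-of-range indices, so lanes that no index maps to stay undefined. A self-contained sketch of the same behaviour, using `std::vector` instead of the LLVM containers, is shown here; `inversePermutationSketch` and `kUndefMaskElem` are illustrative names, with -1 assumed as the undefined marker.

```cpp
#include <cassert>
#include <vector>

// Assumed marker for an undefined mask element in this sketch.
constexpr int kUndefMaskElem = -1;

// Builds Mask such that Mask[Indices[I]] == I for every in-range index.
// Indices >= Indices.size() are treated as "undefined" and skipped, which
// leaves the corresponding mask slots at kUndefMaskElem.
std::vector<int> inversePermutationSketch(const std::vector<unsigned> &Indices) {
  const unsigned E = static_cast<unsigned>(Indices.size());
  std::vector<int> Mask(E, kUndefMaskElem);
  for (unsigned I = 0; I < E; ++I)
    if (Indices[I] < E)
      Mask[Indices[I]] = static_cast<int>(I);
  return Mask;
}
```

For `Indices = {2, 0, 1}` the result is `{1, 2, 0}`. For `Indices = {1, 3, 0}`, where 3 is out of range, the result is `{2, 0, -1}`: slot 2 is never written and stays undefined.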

/// \returns inserting index of InsertElement or InsertValue instruction,		/// \returns inserting index of InsertElement or InsertValue instruction,
/// using Offset as base offset for index.		/// using Offset as base offset for index.
static Optional<int> getInsertIndex(Value *InsertInst, unsigned Offset) {		static Optional<int> getInsertIndex(Value *InsertInst, unsigned Offset) {
int Index = Offset;		int Index = Offset;
if (auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {		if (auto *IE = dyn_cast<InsertElementInst>(InsertInst)) {
if (auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2))) {		if (auto *CI = dyn_cast<ConstantInt>(IE->getOperand(2))) {
[97 lines collapsed]	public:
void buildTree(ArrayRef<Value *> Roots,		void buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst = None);		ArrayRef<Value *> UserIgnoreLst = None);

/// Clear the internal data structures that are created by 'buildTree'.		/// Clear the internal data structures that are created by 'buildTree'.
void deleteTree() {		void deleteTree() {
VectorizableTree.clear();		VectorizableTree.clear();
ScalarToTreeEntry.clear();		ScalarToTreeEntry.clear();
		EntryVFs.clear();
MustGather.clear();		MustGather.clear();
		GatheredLoads.clear();
		GatheredLoadsEntriesFirst = -1;
ExternalUses.clear();		ExternalUses.clear();
NumOpsWantToKeepOrder.clear();		NumOpsWantToKeepOrder.clear();
NumOpsWantToKeepOriginalOrder = 0;		NumOpsWantToKeepOriginalOrder = 0;
for (auto &Iter : BlocksSchedules) {		for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();		BlockScheduling *BS = Iter.second.get();
BS->clear();		BS->clear();
}		}
MinBWs.clear();		MinBWs.clear();
[40 lines collapsed]	public:
/// be reordered, the best order will be \<1, 0\>. We need to extend this		/// be reordered, the best order will be \<1, 0\>. We need to extend this
/// order for the root node. For the root node this order should look like		/// order for the root node. For the root node this order should look like
/// \<3, 0, 1, 2\>. This function extends the order for the reused		/// \<3, 0, 1, 2\>. This function extends the order for the reused
/// instructions.		/// instructions.
void findRootOrder(OrdersType &Order) {		void findRootOrder(OrdersType &Order) {
// If the leaf has the same number of instructions to vectorize as the root		// If the leaf has the same number of instructions to vectorize as the root
// - order must be set already.		// - order must be set already.
unsigned RootSize = VectorizableTree[0]->Scalars.size();		unsigned RootSize = VectorizableTree[0]->Scalars.size();
if (Order.size() == RootSize)		// Checks if the order is normalized relatively the root node, i.e. it has
		// the same number of undef elements (undef element is equal to RootSize
		// value) as the root node scalars.
		auto &&IsNormalizedOrder = [this, RootSize](const OrdersType &Order) {
		return count(Order, RootSize) ==
		count_if(VectorizableTree[0]->Scalars, UndefValue::classof);
		};
		// Check if the current order has the same number of undefined elements as
		// the root node.
		if (IsNormalizedOrder(Order))
return;		return;
SmallVector<unsigned, 4> RealOrder(Order.size());
std::swap(Order, RealOrder);
SmallVector<int, 4> Mask;
inversePermutation(RealOrder, Mask);
Order.assign(Mask.begin(), Mask.end());
// The leaf has less number of instructions - need to find the true order of		// The leaf has less number of instructions - need to find the true order of
// the root.		// the root.
// Scan the nodes starting from the leaf back to the root.		// Scan the nodes starting from the leaf back to the root.
const TreeEntry *PNode = VectorizableTree.back().get();		const TreeEntry *PNode = VectorizableTree.back().get();
SmallVector<const TreeEntry *, 4> Nodes(1, PNode);		SmallVector<const TreeEntry *, 4> Nodes(1, PNode);
SmallPtrSet<const TreeEntry *, 4> Visited;		SmallPtrSet<const TreeEntry *, 4> Visited;
while (!Nodes.empty() && Order.size() != RootSize) {		while (!Nodes.empty() && !IsNormalizedOrder(Order)) {
const TreeEntry *PNode = Nodes.pop_back_val();		const TreeEntry *PNode = Nodes.pop_back_val();
if (!Visited.insert(PNode).second)		if (!Visited.insert(PNode).second)
continue;		continue;
const TreeEntry &Node = *PNode;		const TreeEntry &Node = *PNode;
for (const EdgeInfo &EI : Node.UserTreeIndices)		for (const EdgeInfo &EI : Node.UserTreeIndices)
if (EI.UserTE)		if (EI.UserTE)
Nodes.push_back(EI.UserTE);		Nodes.push_back(EI.UserTE);
if (Node.ReuseShuffleIndices.empty())		if (Node.ReuseShuffleIndices.empty())
continue;		continue;
// Build the order for the parent node.		// Build the order for the parent node.
OrdersType NewOrder(Node.ReuseShuffleIndices.size(), RootSize);		SmallVector<int, 4> Mask;
SmallVector<unsigned, 4> OrderCounter(Order.size(), 0);		inversePermutation(Order, Mask);
		Order.assign(RootSize, RootSize);
		SmallVector<unsigned, 4> OrderCounter(RootSize + 1, 0);
// The algorithm of the order extension is:		// The algorithm of the order extension is:
// 1. Calculate the number of the same instructions for the order.		// 1. Calculate the number of the same instructions for the order.
// 2. Calculate the index of the new order: total number of instructions		// 2. Calculate the index of the new order: total number of instructions
// with order less than the order of the current instruction + reuse		// with order less than the order of the current instruction + reuse
// number of the current instruction.		// number of the current instruction.
// 3. The new order is just the index of the instruction in the original		// 3. The new order is just the index of the instruction in the original
// vector of the instructions.		// vector of the instructions.
for (unsigned I : Node.ReuseShuffleIndices)		for (unsigned I : Node.ReuseShuffleIndices)
++OrderCounter[Order[I]];		if (I != RootSize && Mask[I] != UndefMaskElem)
SmallVector<unsigned, 4> CurrentCounter(Order.size(), 0);		++OrderCounter[Mask[I]];
		SmallVector<unsigned, 4> CurrentCounter(Order.size() + 1, 0);
for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {		for (unsigned I = 0, E = Node.ReuseShuffleIndices.size(); I < E; ++I) {
unsigned ReusedIdx = Node.ReuseShuffleIndices[I];		unsigned ReusedIdx = Node.ReuseShuffleIndices[I];
unsigned OrderIdx = Order[ReusedIdx];		if (ReusedIdx == RootSize)
		continue;
		int OrderIdx = Mask[ReusedIdx];
		if (OrderIdx == UndefMaskElem) {
		// Special case where the UndefValue is actually a real operand. Need
		// to expand the order taking this UndefValue into account.
		OrderIdx = RootSize;
		}
unsigned NewIdx = 0;		unsigned NewIdx = 0;
for (unsigned J = 0; J < OrderIdx; ++J)		for (int J = 0; J < OrderIdx; ++J)
NewIdx += OrderCounter[J];		NewIdx += OrderCounter[J];
NewIdx += CurrentCounter[OrderIdx];		NewIdx += CurrentCounter[OrderIdx];
++CurrentCounter[OrderIdx];		++CurrentCounter[OrderIdx];
assert(NewOrder[NewIdx] == RootSize &&		assert(Order[NewIdx] == RootSize &&
"The order index should not be written already.");		"The order index should not be written already.");
NewOrder[NewIdx] = I;		Order[NewIdx] = I;
}		}
std::swap(Order, NewOrder);
}		}
assert(Order.size() == RootSize &&		// The order must be normalized relatively the root node after the
"Root node is expected or the size of the order must be the same as "		// function.
"the number of elements in the root node.");		assert(IsNormalizedOrder(Order) &&
assert(llvm::all_of(Order,		"Indices for all non-undefs must be set.");
[RootSize](unsigned Val) { return Val != RootSize; }) &&
"All indices must be initialized");
}		}
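The three-step order extension described in the comments of `findRootOrder` above amounts to a stable counting sort of the reuse positions by the rank of the element each one reuses. The sketch below illustrates only that core step, without the undef handling added by the patch. `extendOrderSketch`, `Rank` and `Reuse` are hypothetical names, and prefix sums replace the inner summation loop of the original (same result, computed once).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rank[K] is the position of unique element K in the desired order (a
// permutation of 0..Rank.size()-1). Reuse[I] names the unique element used at
// position I of the wider node. Returns NewOrder, where NewOrder[NewIdx] is the
// original position placed at NewIdx.
std::vector<unsigned> extendOrderSketch(const std::vector<unsigned> &Rank,
                                        const std::vector<unsigned> &Reuse) {
  // Step 1: count how many reuse positions fall on each rank.
  std::vector<unsigned> Count(Rank.size(), 0);
  for (unsigned Elem : Reuse) {
    assert(Elem < Rank.size() && Rank[Elem] < Rank.size());
    ++Count[Rank[Elem]];
  }
  // Step 2: the first free slot for each rank is the number of positions with
  // a smaller rank.
  std::vector<unsigned> Next(Rank.size(), 0);
  for (std::size_t J = 1; J < Rank.size(); ++J)
    Next[J] = Next[J - 1] + Count[J - 1];
  // Step 3: place each original position into the next free slot of its rank.
  std::vector<unsigned> NewOrder(Reuse.size(), 0);
  for (unsigned I = 0; I < Reuse.size(); ++I)
    NewOrder[Next[Rank[Reuse[I]]]++] = I;
  return NewOrder;
}
```

With `Rank = {1, 0}` and `Reuse = {0, 1, 0, 1}`, the two positions that reuse element 1 (rank 0) come first and the two that reuse element 0 follow, giving `{1, 3, 0, 2}`.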

/// \return The vector element size in bits to use when vectorizing the		/// \return The vector element size in bits to use when vectorizing the
/// expression tree ending at \p V. If V is a store, the size is the width of		/// expression tree ending at \p V. If V is a store, the size is the width of
/// the stored value. Otherwise, the size is the width of the largest loaded		/// the stored value. Otherwise, the size is the width of the largest loaded
/// value reaching V. This method is used by the vectorizer to calculate		/// value reaching V. This method is used by the vectorizer to calculate
/// vectorization factors.		/// vectorization factors.
unsigned getVectorElementSize(Value *V);		unsigned getVectorElementSize(Value *V);
... (123 lines not shown) ...	class VLOperands {

/// During operand reordering, we are trying to select the operand at lane		/// During operand reordering, we are trying to select the operand at lane
/// that matches best with the operand at the neighboring lane. Our		/// that matches best with the operand at the neighboring lane. Our
/// selection is based on the type of value we are looking for. For example,		/// selection is based on the type of value we are looking for. For example,
/// if the neighboring lane has a load, we need to look for a load that is		/// if the neighboring lane has a load, we need to look for a load that is
/// accessing a consecutive address. These strategies are summarized in the		/// accessing a consecutive address. These strategies are summarized in the
/// 'ReorderingMode' enumerator.		/// 'ReorderingMode' enumerator.
enum class ReorderingMode {		enum class ReorderingMode {
		Unknown, ///< Mode is not defined yet
Load, ///< Matching loads to consecutive memory addresses		Load, ///< Matching loads to consecutive memory addresses
Opcode, ///< Matching instructions based on opcode (same or alternate)		Opcode, ///< Matching instructions based on opcode (same or alternate)
Constant, ///< Matching constants		Constant, ///< Matching constants
Splat, ///< Matching the same instruction multiple times (broadcast)		Splat, ///< Matching the same instruction multiple times (broadcast)
Failed, ///< We failed to create a vectorizable group		Failed, ///< We failed to create a vectorizable group
};		};

using OperandDataVec = SmallVector<OperandData, 2>;		using OperandDataVec = SmallVector<OperandData, 2>;

/// A vector of operand vectors.		/// A vector of operand vectors.
SmallVector<OperandDataVec, 4> OpsVec;		SmallVector<OperandDataVec, 4> OpsVec;

const DataLayout &DL;		const DataLayout &DL;
ScalarEvolution &SE;		ScalarEvolution &SE;
const BoUpSLP &R;		const BoUpSLP &R;
		/// Base instruction in the list of scalars, the first instruction with the
		/// main opcode.
		Instruction &VL0;
		/// Number of lanes in the node, i.e. PowerOf2Ceil(number of instructions in
		/// the node).
		unsigned NumLanes = 0;

/// \returns the operand data at \p OpIdx and \p Lane.		/// \returns the operand data at \p OpIdx and \p Lane.
OperandData &getData(unsigned OpIdx, unsigned Lane) {		OperandData &getData(unsigned OpIdx, unsigned Lane) {
return OpsVec[OpIdx][Lane];		return OpsVec[OpIdx][Lane];
}		}

/// \returns the operand data at \p OpIdx and \p Lane. Const version.		/// \returns the operand data at \p OpIdx and \p Lane. Const version.
const OperandData &getData(unsigned OpIdx, unsigned Lane) const {		const OperandData &getData(unsigned OpIdx, unsigned Lane) const {
... (9 lines not shown) ...	void clearUsed() {
OpsVec[OpIdx][Lane].IsUsed = false;		OpsVec[OpIdx][Lane].IsUsed = false;
}		}

/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.		/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.
void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {		void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {
std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);		std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);
}		}

// The hard-coded scores listed here are not very important. When computing		// The hard-coded scores listed here are not very important, though they should
// the scores of matching one sub-tree with another, we are basically		// be higher for better matches to improve the resulting cost. When
// counting the number of values that are matching. So even if all scores		// computing the scores of matching one sub-tree with another, we are
// are set to 1, we would still get a decent matching result.		// basically counting the number of values that are matching. So even if all
		// scores are set to 1, we would still get a decent matching result.
// However, sometimes we have to break ties. For example we may have to		// However, sometimes we have to break ties. For example we may have to
// choose between matching loads vs matching opcodes. This is what these		// choose between matching loads vs matching opcodes. This is what these
// scores are helping us with: they provide the order of preference.		// scores are helping us with: they provide the order of preference. Also,
		// this is important if the scalar is externally used or used in another
		// tree entry node in the different lane.

/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).		/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).
static const int ScoreConsecutiveLoads = 3;		static const int ScoreConsecutiveLoads = 4;
		/// Loads from reversed memory addresses, e.g. load(A[i+1]), load(A[i]).
		static const int ScoreReversedLoads = 3;
/// ExtractElementInst from same vector and consecutive indexes.		/// ExtractElementInst from same vector and consecutive indexes.
static const int ScoreConsecutiveExtracts = 3;		static const int ScoreConsecutiveExtracts = 4;
		/// ExtractElementInst from same vector and reversed indices.
		static const int ScoreReversedExtracts = 3;
/// Constants.		/// Constants.
static const int ScoreConstants = 2;		static const int ScoreConstants = 2;
/// Instructions with the same opcode.		/// Instructions with the same opcode.
static const int ScoreSameOpcode = 2;		static const int ScoreSameOpcode = 2;
/// Instructions with alt opcodes (e.g, add + sub).		/// Instructions with alt opcodes (e.g, add + sub).
static const int ScoreAltOpcodes = 1;		static const int ScoreAltOpcodes = 1;
/// Identical instructions (a.k.a. splat or broadcast).		/// Identical instructions (a.k.a. splat or broadcast).
static const int ScoreSplat = 1;		static const int ScoreSplat = 1;
/// Matching with an undef is preferable to failing.		/// Matching with an undef is preferable to failing.
static const int ScoreUndef = 1;		static const int ScoreUndef = 1;
/// Score for failing to find a decent match.		/// Score for failing to find a decent match.
static const int ScoreFail = 0;		static const int ScoreFail = 0;
/// User external to the vectorized code.		/// User external to the vectorized code.
static const int ExternalUseCost = 1;		static const int ExternalUseCost = 1;
/// The user is internal but in a different lane.		/// The user is internal but in a different lane.
static const int UserInDiffLaneCost = ExternalUseCost;		static const int UserInDiffLaneCost = ExternalUseCost;

/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.		/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.
static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,		static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,
ScalarEvolution &SE) {		ScalarEvolution &SE, int NumLanes) {
		if (V1 == V2)
		return VLOperands::ScoreSplat;

auto *LI1 = dyn_cast<LoadInst>(V1);		auto *LI1 = dyn_cast<LoadInst>(V1);
auto *LI2 = dyn_cast<LoadInst>(V2);		auto *LI2 = dyn_cast<LoadInst>(V2);
if (LI1 && LI2) {		if (LI1 && LI2) {
if (LI1->getParent() != LI2->getParent())		if (LI1->getParent() != LI2->getParent())
return VLOperands::ScoreFail;		return VLOperands::ScoreFail;

Optional<int> Dist = getPointersDiff(		Optional<int> Dist = getPointersDiff(
LI1->getType(), LI1->getPointerOperand(), LI2->getType(),		LI1->getType(), LI1->getPointerOperand(), LI2->getType(),
LI2->getPointerOperand(), DL, SE, /*StrictCheck=*/true);		LI2->getPointerOperand(), DL, SE, /*StrictCheck=*/true);
return (Dist && *Dist == 1) ? VLOperands::ScoreConsecutiveLoads		if (!Dist)
: VLOperands::ScoreFail;		return VLOperands::ScoreFail;
		// The distance is too large - still may be profitable to use masked
		// loads/gathers.
		if (std::abs(*Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		return (*Dist > 0) ? VLOperands::ScoreConsecutiveLoads
		: VLOperands::ScoreReversedLoads;
}		}

auto *C1 = dyn_cast<Constant>(V1);		auto *C1 = dyn_cast<Constant>(V1);
auto *C2 = dyn_cast<Constant>(V2);		auto *C2 = dyn_cast<Constant>(V2);
if (C1 && C2)		if (C1 && C2 && !isa<UndefValue>(V2))
return VLOperands::ScoreConstants;		return VLOperands::ScoreConstants;

// Extracts from consecutive indexes of the same vector get a better score as		// Extracts from consecutive indexes of the same vector get a better score as
// the extracts could be optimized away.		// the extracts could be optimized away.
Value *EV;		Value *EV;
ConstantInt *Ex1Idx, *Ex2Idx;		ConstantInt *Ex1Idx, *Ex2Idx;
if (match(V1, m_ExtractElt(m_Value(EV), m_ConstantInt(Ex1Idx))) &&		if (match(V2, m_ExtractElt(m_Value(EV), m_ConstantInt(Ex2Idx)))) {
match(V2, m_ExtractElt(m_Deferred(EV), m_ConstantInt(Ex2Idx))) &&		if (match(V1, m_ExtractElt(m_Deferred(EV), m_ConstantInt(Ex1Idx)))) {
Ex1Idx->getZExtValue() + 1 == Ex2Idx->getZExtValue())		int Idx1 = Ex1Idx->getZExtValue();
return VLOperands::ScoreConsecutiveExtracts;		int Idx2 = Ex2Idx->getZExtValue();
		int Dist = Idx2 - Idx1;
		// The distance is too large - still may be profitable to use
		// shuffles.
		if (std::abs(Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		return (Dist > 0) ? VLOperands::ScoreConsecutiveExtracts
		: VLOperands::ScoreReversedExtracts;
		}
		return VLOperands::ScoreFail;
		}

auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (I1 && I2) {		if (I1 && I2) {
if (I1 == I2)		if (I1->getParent() != I2->getParent())
return VLOperands::ScoreSplat;		return VLOperands::ScoreFail;
InstructionsState S = getSameOpcode({I1, I2});		InstructionsState S = getSameOpcode({I1, I2});
// Note: Only consider instructions with <= 2 operands to avoid		// Note: Only consider instructions with <= 2 operands to avoid
// complexity explosion.		// complexity explosion.
if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)		if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)
return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes		return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes
: VLOperands::ScoreSameOpcode;		: VLOperands::ScoreSameOpcode;
}		}

if (isa<UndefValue>(V2))		if (isa<UndefValue>(V2))
return VLOperands::ScoreUndef;		return VLOperands::ScoreUndef;

return VLOperands::ScoreFail;		return VLOperands::ScoreFail;
}		}
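The distance-based scoring on the new side can be illustrated outside of LLVM. The following is a minimal sketch, not the patch's code: `scoreByDistance` and its `Known`/`Dist` parameters are hypothetical stand-ins for the `Optional<int>` returned by `getPointersDiff` (or the difference of the two extract indices), and the score constants are copied from the new side of this diff.

```cpp
#include <cstdlib>

// Score values as they appear on the new side of the diff.
constexpr int ScoreConsecutive = 4; // ScoreConsecutiveLoads / ScoreConsecutiveExtracts
constexpr int ScoreReversed = 3;    // ScoreReversedLoads / ScoreReversedExtracts
constexpr int ScoreAltOpcodes = 1;
constexpr int ScoreFail = 0;

// Hypothetical helper (not part of the patch). Known is false when the
// distance could not be computed; Dist is the element distance from the
// first value to the second one.
int scoreByDistance(bool Known, int Dist, int NumLanes) {
  if (!Known)
    return ScoreFail;
  // Too far apart to be neighbours in one vector, but a masked load,
  // gather, or shuffle may still be profitable.
  if (std::abs(Dist) > NumLanes / 2)
    return ScoreAltOpcodes;
  // Forward order scores higher than reversed order.
  return (Dist > 0) ? ScoreConsecutive : ScoreReversed;
}
```

With four lanes, a distance of 1 or 2 scores as consecutive, -1 scores as reversed, and 3 falls back to the low `ScoreAltOpcodes` value. Note that, as in the diff, a distance of 0 takes the reversed branch.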

/// Holds the values and their lane that are taking part in the look-ahead		/// Holds the values and their lanes that are taking part in the look-ahead
/// score calculation. This is used in the external uses cost calculation.		/// score calculation. This is used in the external uses cost calculation.
SmallDenseMap<Value *, int> InLookAheadValues;		/// Need to hold all the lanes in case of splat/broadcast at least to
		/// correctly check for the use in the different lane.
		SmallDenseMap<Value *, SmallSet<int, 4>> InLookAheadValues;

/// \Returns the additional cost due to uses of \p LHS and \p RHS that are		/// \Returns the additional cost due to uses of \p LHS and \p RHS that are
/// either external to the vectorized code, or require shuffling.		/// either external to the vectorized code, or require shuffling.
int getExternalUsesCost(const std::pair<Value *, int> &LHS,		int getExternalUsesCost(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS) {		const std::pair<Value *, int> &RHS) {
int Cost = 0;		int Cost = 0;
std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};		std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};
for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {		for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {
... (13 lines not shown) ...	int getExternalUsesCost(const std::pair<Value *, int> &LHS,
unsigned UsersBudget = LookAheadUsersBudget;		unsigned UsersBudget = LookAheadUsersBudget;
for (User *U : V->users()) {		for (User *U : V->users()) {
if (const TreeEntry *UserTE = R.getTreeEntry(U)) {		if (const TreeEntry *UserTE = R.getTreeEntry(U)) {
// The user is in the VectorizableTree. Check if we need to insert.		// The user is in the VectorizableTree. Check if we need to insert.
auto It = llvm::find(UserTE->Scalars, U);		auto It = llvm::find(UserTE->Scalars, U);
assert(It != UserTE->Scalars.end() && "U is in UserTE");		assert(It != UserTE->Scalars.end() && "U is in UserTE");
int UserLn = std::distance(UserTE->Scalars.begin(), It);		int UserLn = std::distance(UserTE->Scalars.begin(), It);
assert(UserLn >= 0 && "Bad lane");		assert(UserLn >= 0 && "Bad lane");
if (UserLn != Ln)		// If the values are different, check just the lane of the current
		// value. If the values are the same, need to add UserInDiffLaneCost
		// only if UserLn matches neither of the two lane numbers.
		if ((LHS.first != RHS.first && UserLn != Ln) ||
		(LHS.first == RHS.first && UserLn != LHS.second &&
		UserLn != RHS.second)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// Check if the user is in the look-ahead code.		// Check if the user is in the look-ahead code.
auto It2 = InLookAheadValues.find(U);		auto It2 = InLookAheadValues.find(U);
if (It2 != InLookAheadValues.end()) {		if (It2 != InLookAheadValues.end()) {
// The user is in the look-ahead code. Check the lane.		// The user is in the look-ahead code. Check the lane.
if (It2->second != Ln)		if (!It2->getSecond().contains(Ln)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// The user is neither in SLP tree nor in the look-ahead code.		// The user is neither in SLP tree nor in the look-ahead code.
Cost += ExternalUseCost;		Cost += ExternalUseCost;
		break;
}		}
}		}
// Limit the number of visited uses to cap compilation time.		// Limit the number of visited uses to cap compilation time.
if (--UsersBudget == 0)		if (--UsersBudget == 0)
break;		break;
}		}
}		}
return Cost;		return Cost;
... (22 lines not shown) ...	class VLOperands {
/// Luís F. W. Góes		/// Luís F. W. Góes
int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,		int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS, int CurrLevel,		const std::pair<Value *, int> &RHS, int CurrLevel,
int MaxLevel) {		int MaxLevel) {

Value *V1 = LHS.first;		Value *V1 = LHS.first;
Value *V2 = RHS.first;		Value *V2 = RHS.first;
// Get the shallow score of V1 and V2.		// Get the shallow score of V1 and V2.
int ShallowScoreAtThisLevel =		int ShallowScoreAtThisLevel = std::max(
std::max((int)ScoreFail, getShallowScore(V1, V2, DL, SE) -		(int)ScoreFail, getShallowScore(V1, V2, DL, SE, getNumLanes()) -
getExternalUsesCost(LHS, RHS));		getExternalUsesCost(LHS, RHS));
int Lane1 = LHS.second;		int Lane1 = LHS.second;
int Lane2 = RHS.second;		int Lane2 = RHS.second;

// If reached MaxLevel,		// If reached MaxLevel,
// or if V1 and V2 are not instructions,		// or if V1 and V2 are not instructions,
// or if they are SPLAT,		// or if they are SPLAT,
// or if they are not consecutive, early return the current cost.		// or if they are not consecutive,
		// or if profitable to vectorize loads or extractelements, early return
		// the current cost.
auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||		if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||
ShallowScoreAtThisLevel == VLOperands::ScoreFail ||		ShallowScoreAtThisLevel == VLOperands::ScoreFail ||
(isa<LoadInst>(I1) && isa<LoadInst>(I2) && ShallowScoreAtThisLevel))		(((isa<LoadInst>(I1) && isa<LoadInst>(I2)) ||
		(isa<ExtractElementInst>(I1) && isa<ExtractElementInst>(I2))) &&
		ShallowScoreAtThisLevel))
return ShallowScoreAtThisLevel;		return ShallowScoreAtThisLevel;
assert(I1 && I2 && "Should have early exited.");		assert(I1 && I2 && "Should have early exited.");

// Keep track of in-tree values for determining the external-use cost.		// Keep track of in-tree values for determining the external-use cost.
InLookAheadValues[V1] = Lane1;		InLookAheadValues[V1].insert(Lane1);
InLookAheadValues[V2] = Lane2;		InLookAheadValues[V2].insert(Lane2);

// Contains the I2 operand indexes that got matched with I1 operands.		// Contains the I2 operand indexes that got matched with I1 operands.
SmallSet<unsigned, 4> Op2Used;		SmallSet<unsigned, 4> Op2Used;

// Recursion towards the operands of I1 and I2. We are trying all possible		// Recursion towards the operands of I1 and I2. We are trying all possible
// operand pairs, and keeping track of the best score.		// operand pairs, and keeping track of the best score.
for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();		for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();
OpIdx1 != NumOperands1; ++OpIdx1) {		OpIdx1 != NumOperands1; ++OpIdx1) {
... (63 lines not shown) ...	getBestOperand(unsigned OpIdx, int Lane, int LastLane,
// Sometimes we have more than one option (e.g., Opcode and Undefs), so we		// Sometimes we have more than one option (e.g., Opcode and Undefs), so we
// are using the score to differentiate between the two.		// are using the score to differentiate between the two.
struct BestOpData {		struct BestOpData {
Optional<unsigned> Idx = None;		Optional<unsigned> Idx = None;
unsigned Score = 0;		unsigned Score = 0;
} BestOp;		} BestOp;

// Iterate through all unused operands and look for the best.		// Iterate through all unused operands and look for the best.
		bool IsOpLastLaneUndef = isa<UndefValue>(OpLastLane);
for (unsigned Idx = 0; Idx != NumOperands; ++Idx) {		for (unsigned Idx = 0; Idx != NumOperands; ++Idx) {
// Get the operand at Idx and Lane.		// Get the operand at Idx and Lane.
OperandData &OpData = getData(Idx, Lane);		OperandData &OpData = getData(Idx, Lane);
Value *Op = OpData.V;		Value *Op = OpData.V;
bool OpAPO = OpData.APO;		bool OpAPO = OpData.APO;

// Skip already selected operands.		// Skip already selected operands.
if (OpData.IsUsed)		if (OpData.IsUsed)
continue;		continue;

// Skip if we are trying to move the operand to a position with a		// Skip if we are trying to move the operand to a position with a
// different opcode in the linearized tree form. This would break the		// different opcode in the linearized tree form. This would break the
// semantics.		// semantics.
if (OpAPO != OpIdxAPO)		if (OpAPO != OpIdxAPO)
continue;		continue;

		// Ignore two undefs.
		if (IsOpLastLaneUndef && isa<UndefValue>(Op)) {
		if (BestOp.Score < VLOperands::ScoreUndef) {
		BestOp.Idx = Idx;
		BestOp.Score = VLOperands::ScoreUndef;
		}
		continue;
		}

// Look for an operand that matches the current mode.		// Look for an operand that matches the current mode.
switch (RMode) {		switch (RMode) {
case ReorderingMode::Load:		case ReorderingMode::Load:
case ReorderingMode::Constant:		case ReorderingMode::Constant:
case ReorderingMode::Opcode: {		case ReorderingMode::Opcode: {
bool LeftToRight = Lane > LastLane;		bool LeftToRight = Lane > LastLane;
Value *OpLeft = (LeftToRight) ? OpLastLane : Op;		Value *OpLeft = (LeftToRight) ? OpLastLane : Op;
Value *OpRight = (LeftToRight) ? Op : OpLastLane;		Value *OpRight = (LeftToRight) ? Op : OpLastLane;
unsigned Score =		unsigned Score =
getLookAheadScore({OpLeft, LastLane}, {OpRight, Lane});		getLookAheadScore({OpLeft, LastLane}, {OpRight, Lane});
if (Score > BestOp.Score) {		if (Score > BestOp.Score) {
BestOp.Idx = Idx;		BestOp.Idx = Idx;
BestOp.Score = Score;		BestOp.Score = Score;
}		}
break;		break;
}		}
case ReorderingMode::Splat:		case ReorderingMode::Splat:
if (Op == OpLastLane)		// Undef can also be part of a splat/broadcast.
		if (Op == OpLastLane || IsOpLastLaneUndef || isa<UndefValue>(Op))
BestOp.Idx = Idx;		BestOp.Idx = Idx;
break;		break;
case ReorderingMode::Failed:		case ReorderingMode::Failed:
return None;		return None;
		case ReorderingMode::Unknown:
		llvm_unreachable("Unknown mode is not expected here.");
}		}
}		}

if (BestOp.Idx) {		if (BestOp.Idx) {
getData(BestOp.Idx.getValue(), Lane).IsUsed = true;		getData(BestOp.Idx.getValue(), Lane).IsUsed = true;
return BestOp.Idx;		return BestOp.Idx;
}		}
// If we could not find a good match return None.		// If we could not find a good match return None.
return None;		return None;
}		}

/// Helper for reorderOperandVecs. \Returns the lane that we should start		/// Helper for reorderOperandVecs. \Returns the lane that we should start
/// reordering from. This is the one which has the least number of operands		/// reordering from. This is the one which has the least number of operands
/// that can freely move about.		/// that can freely move about, or the one that is less profitable to
		/// reorder because it already has the most optimal set of operands.
unsigned getBestLaneToStartReordering() const {		unsigned getBestLaneToStartReordering() const {
unsigned BestLane = 0;		unsigned BestLane = 0;
unsigned Min = UINT_MAX;		unsigned Min = UINT_MAX;
for (unsigned Lane = 0, NumLanes = getNumLanes(); Lane != NumLanes;		unsigned SameOpNumber = 0;
++Lane) {		for (int I = getNumLanes(); I > 0; --I) {
unsigned NumFreeOps = getMaxNumOperandsThatCanBeReordered(Lane);		unsigned Lane = I - 1;
if (NumFreeOps < Min) {		std::pair<unsigned, unsigned> NumFreeOpsHash =
Min = NumFreeOps;		getMaxNumOperandsThatCanBeReordered(Lane);
		// Compare the number of operands that can move and choose the one with
		// the least number.
		if (NumFreeOpsHash.first < Min) {
		Min = NumFreeOpsHash.first;
		SameOpNumber = NumFreeOpsHash.second;
		BestLane = Lane;
		} else if (NumFreeOpsHash.first == Min &&
		NumFreeOpsHash.second < SameOpNumber) {
		// Select the most optimal lane in terms of number of operands that
		// should be moved around.
		SameOpNumber = NumFreeOpsHash.second;
BestLane = Lane;		BestLane = Lane;
}		}
}		}
return BestLane;		return BestLane;
}		}

/// \Returns the maximum number of operands that are allowed to be reordered		/// \Returns the maximum number of operands that are allowed to be reordered
/// for \p Lane. This is used as a heuristic for selecting the first lane to		/// for \p Lane and the number of compatible instructions (with the same
/// start operand reordering.		/// parent/opcode). This is used as a heuristic for selecting the first lane
unsigned getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {		/// to start operand reordering.
		std::pair<unsigned, unsigned>
		getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {
unsigned CntTrue = 0;		unsigned CntTrue = 0;
unsigned NumOperands = getNumOperands();		unsigned NumOperands = getNumOperands();
// Operands with the same APO can be reordered. We therefore need to count		// Operands with the same APO can be reordered. We therefore need to count
// how many of them we have for each APO, like this: Cnt[APO] = x.		// how many of them we have for each APO, like this: Cnt[APO] = x.
// Since we only have two APOs, namely true and false, we can avoid using		// Since we only have two APOs, namely true and false, we can avoid using
// a map. Instead we can simply count the number of operands that		// a map. Instead we can simply count the number of operands that
// correspond to one of them (in this case the 'true' APO), and calculate		// correspond to one of them (in this case the 'true' APO), and calculate
// the other by subtracting it from the total number of operands.		// the other by subtracting it from the total number of operands.
for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx)		// Operands with the same instruction opcode and parent are more
if (getData(OpIdx, Lane).APO)		// profitable since we don't need to move them in many cases.
		bool AllUndefs = true;
		unsigned SameCodeParentOps = 0;
		unsigned Opcode = 0;
		BasicBlock *Parent = nullptr;
		for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
		const OperandData &OpData = getData(OpIdx, Lane);
		if (OpData.APO)
++CntTrue;		++CntTrue;
		if (auto *I = dyn_cast<Instruction>(OpData.V)) {
		if (Opcode != I->getOpcode() || I->getParent() != Parent) {
		if (SameCodeParentOps == 0) {
		SameCodeParentOps = 1;
		Opcode = I->getOpcode();
		Parent = I->getParent();
		} else {
		--SameCodeParentOps;
		}
		} else {
		++SameCodeParentOps;
		}
		}
		AllUndefs = AllUndefs && isa<UndefValue>(OpData.V);
		}
		if (AllUndefs)
		return std::make_pair(UINT_MAX, 0);
unsigned CntFalse = NumOperands - CntTrue;		unsigned CntFalse = NumOperands - CntTrue;
return std::max(CntTrue, CntFalse);		return std::make_pair(std::max(CntTrue, CntFalse), SameCodeParentOps);
}		}
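The `SameCodeParentOps` counter on the new side follows a majority-vote style update: it is incremented when an operand matches the current (opcode, parent) candidate, decremented on a mismatch, and the candidate is replaced once the counter reaches zero. A minimal standalone sketch of that counting rule follows. `OpKey` and `countDominantKey` are hypothetical names, with an (opcode, block id) pair standing in for an `Instruction`.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction: (opcode, parent block id).
// Block ids are assumed to be non-negative so that no real operand matches
// the initial candidate below.
using OpKey = std::pair<unsigned, int>;

// Mirrors the counting rule in the loop above: vote for the current
// candidate on a match, vote against it on a mismatch, and adopt a new
// candidate once the counter has dropped to zero.
unsigned countDominantKey(const std::vector<OpKey> &Ops) {
  unsigned Count = 0;
  OpKey Candidate{0u, -1};
  for (const OpKey &Op : Ops) {
    if (Op != Candidate) {
      if (Count == 0) {
        Candidate = Op;
        Count = 1;
      } else {
        --Count;
      }
    } else {
      ++Count;
    }
  }
  return Count;
}
```

Two operands with the same key give 2, while two operands with different keys cancel out to 0. In the diff this value is only used to break ties between lanes that have the same number of freely movable operands.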

/// Go through the instructions in VL and append their operands.		/// Go through the instructions in VL and append their operands.
void appendOperandsOfVL(ArrayRef<Value *> VL) {		void appendOperandsOfVL(ArrayRef<Value *> VL) {
assert(!VL.empty() && "Bad VL");		assert(!VL.empty() && "Bad VL");
assert((empty() || VL.size() == getNumLanes()) &&		assert((empty() || VL.size() == getNumLanes()) &&
"Expected same number of lanes");		"Expected same number of lanes");
assert(isa<Instruction>(VL[0]) && "Expected instruction");		unsigned NumOperands = VL0.getNumOperands();
unsigned NumOperands = cast<Instruction>(VL[0])->getNumOperands();
OpsVec.resize(NumOperands);		OpsVec.resize(NumOperands);
unsigned NumLanes = VL.size();		unsigned NumLanes = VL.size();
for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {		for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
OpsVec[OpIdx].resize(NumLanes);		OpsVec[OpIdx].resize(NumLanes);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {		for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
		if (isa<PoisonValue>(VL[Lane])) {
		OpsVec[OpIdx][Lane] = {
		PoisonValue::get(VL0.getOperand(OpIdx)->getType()), false,
		false};
		continue;
		}
		if (isa<UndefValue>(VL[Lane])) {
		OpsVec[OpIdx][Lane] = {
		UndefValue::get(VL0.getOperand(OpIdx)->getType()), false,
		false};
		continue;
		}
assert(isa<Instruction>(VL[Lane]) && "Expected instruction");		assert(isa<Instruction>(VL[Lane]) && "Expected instruction");
// Our tree has just 3 nodes: the root and two operands.		// Our tree has just 3 nodes: the root and two operands.
// It is therefore trivial to get the APO. We only need to check the		// It is therefore trivial to get the APO. We only need to check the
// opcode of VL[Lane] and whether the operand at OpIdx is the LHS or		// opcode of VL[Lane] and whether the operand at OpIdx is the LHS or
// RHS operand. The LHS operand of both add and sub is never attached		// RHS operand. The LHS operand of both add and sub is never attached
// to an inverse operation in the linearized form, therefore its APO		// to an inverse operation in the linearized form, therefore its APO
// is false. The RHS is true only if VL[Lane] is an inverse operation.		// is false. The RHS is true only if VL[Lane] is an inverse operation.

// Since operand reordering is performed on groups of commutative		// Since operand reordering is performed on groups of commutative
// operations or alternating sequences (e.g., +, -), we can safely		// operations or alternating sequences (e.g., +, -), we can safely
// tell the inverse operations by checking commutativity.		// tell the inverse operations by checking commutativity.
bool IsInverseOperation = !isCommutative(cast<Instruction>(VL[Lane]));		bool IsInverseOperation = !isCommutative(cast<Instruction>(VL[Lane]));
bool APO = (OpIdx == 0) ? false : IsInverseOperation;		bool APO = (OpIdx == 0) ? false : IsInverseOperation;
OpsVec[OpIdx][Lane] = {cast<Instruction>(VL[Lane])->getOperand(OpIdx),		OpsVec[OpIdx][Lane] = {cast<Instruction>(VL[Lane])->getOperand(OpIdx),
APO, false};		APO, false};
}		}
}		}
}		}

/// \returns the number of operands.		/// \returns the number of operands.
unsigned getNumOperands() const { return OpsVec.size(); }		unsigned getNumOperands() const { return OpsVec.size(); }

/// \returns the number of lanes.		/// \returns the number of lanes.
unsigned getNumLanes() const { return OpsVec[0].size(); }		unsigned getNumLanes() const { return NumLanes; }

/// \returns the operand value at \p OpIdx and \p Lane.		/// \returns the operand value at \p OpIdx and \p Lane.
Value *getValue(unsigned OpIdx, unsigned Lane) const {		Value *getValue(unsigned OpIdx, unsigned Lane) const {
return getData(OpIdx, Lane).V;		return getData(OpIdx, Lane).V;
}		}

/// \returns true if the data structure is empty.		/// \returns true if the data structure is empty.
bool empty() const { return OpsVec.empty(); }		bool empty() const { return OpsVec.empty(); }
... (10 lines not shown) ...	bool shouldBroadcast(Value *Op, unsigned OpIdx, unsigned Lane) {
if (Ln == Lane)		if (Ln == Lane)
continue;		continue;
// This is set to true if we found a candidate for broadcast at Lane.		// This is set to true if we found a candidate for broadcast at Lane.
bool FoundCandidate = false;		bool FoundCandidate = false;
for (unsigned OpI = 0, OpE = getNumOperands(); OpI != OpE; ++OpI) {		for (unsigned OpI = 0, OpE = getNumOperands(); OpI != OpE; ++OpI) {
OperandData &Data = getData(OpI, Ln);		OperandData &Data = getData(OpI, Ln);
if (Data.APO != OpAPO || Data.IsUsed)		if (Data.APO != OpAPO || Data.IsUsed)
continue;		continue;
if (Data.V == Op) {		if (Data.V == Op || isa<UndefValue>(Op)) {
FoundCandidate = true;		FoundCandidate = true;
Data.IsUsed = true;		Data.IsUsed = true;
break;		break;
}		}
}		}
if (!FoundCandidate)		if (!FoundCandidate)
return false;		return false;
}		}
return true;		return true;
}		}

public:		public:
/// Initialize with all the operands of the instruction vector \p RootVL.		/// Initialize with all the operands of the instruction vector \p RootVL.
VLOperands(ArrayRef<Value *> RootVL, const DataLayout &DL,		VLOperands(Instruction &VL0, ArrayRef<Value *> RootVL, const DataLayout &DL,
ScalarEvolution &SE, const BoUpSLP &R)		ScalarEvolution &SE, const BoUpSLP &R)
: DL(DL), SE(SE), R(R) {		: DL(DL), SE(SE), R(R), VL0(VL0) {
// Append all the operands of RootVL.		// Append all the operands of RootVL.
appendOperandsOfVL(RootVL);		appendOperandsOfVL(RootVL);
		// PowerOf2Ceil(distance between the last instruction and the first
		// instruction in the array of scalars).
		NumLanes = PowerOf2Ceil(
		std::distance(RootVL.begin(), find_if(reverse(RootVL), [](Value *V) {
		return !isa<UndefValue>(V);
		}).base()));
}		}
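The `NumLanes` computation in the constructor can be read as: drop trailing undefs, count what remains, and round up to a power of two. A minimal sketch under stated assumptions: scalars are modelled as `int` with a hypothetical `Undef` sentinel, and `powerOf2Ceil` is a local helper written for this example rather than the LLVM function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for an UndefValue lane.
constexpr int Undef = -1;

// Local helper for this sketch: smallest power of two that is >= N,
// returning 0 for N == 0.
std::size_t powerOf2Ceil(std::size_t N) {
  if (N == 0)
    return 0;
  std::size_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Counts the lanes up to and including the last non-undef scalar, then
// rounds that count up to a power of two. Undefs in the middle still
// occupy a lane; only trailing undefs are dropped.
std::size_t numLanes(const std::vector<int> &Scalars) {
  std::size_t Used = Scalars.size();
  while (Used > 0 && Scalars[Used - 1] == Undef)
    --Used;
  return powerOf2Ceil(Used);
}
```

For example, three instructions followed by one undef give 4 lanes, and five instructions give 8.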

/// \Returns a value vector with the operands across all lanes for the		/// \Returns a value vector with the operands across all lanes for the
/// operand at \p OpIdx.		/// operand at \p OpIdx.
ValueList getVL(unsigned OpIdx) const {		ValueList getVL(unsigned OpIdx) const {
ValueList OpVL(OpsVec[OpIdx].size());		ValueList OpVL(OpsVec[OpIdx].size());
assert(OpsVec[OpIdx].size() == getNumLanes() &&		assert(std::all_of(std::next(OpsVec[OpIdx].begin(), getNumLanes()),
		OpsVec[OpIdx].end(),
		[](const OperandData &Data) {
		return isa<UndefValue>(Data.V);
		}) &&
"Expected same num of lanes across all operands");		"Expected same num of lanes across all operands");
for (unsigned Lane = 0, Lanes = getNumLanes(); Lane != Lanes; ++Lane)		for (unsigned Lane = 0, Lanes = OpsVec[OpIdx].size(); Lane != Lanes;
		++Lane)
OpVL[Lane] = OpsVec[OpIdx][Lane].V;		OpVL[Lane] = OpsVec[OpIdx][Lane].V;
return OpVL;		return OpVL;
}		}

// Performs operand reordering for 2 or more operands.		// Performs operand reordering for 2 or more operands.
// The original operands are in OrigOps[OpIdx][Lane].		// The original operands are in OrigOps[OpIdx][Lane].
// The reordered operands are returned in 'SortedOps[OpIdx][Lane]'.		// The reordered operands are returned in 'SortedOps[OpIdx][Lane]'.
void reorder() {		void reorder() {
unsigned NumOperands = getNumOperands();		unsigned NumOperands = getNumOperands();
unsigned NumLanes = getNumLanes();		unsigned NumLanes = getNumLanes();
// Each operand has its own mode. We are using this mode to help us select		// Each operand has its own mode. We are using this mode to help us select
// the instructions for each lane, so that they match best with the ones		// the instructions for each lane, so that they match best with the ones
// we have selected so far.		// we have selected so far.
SmallVector<ReorderingMode, 2> ReorderingModes(NumOperands);		SmallVector<ReorderingMode, 2> ReorderingModes(NumOperands,
		ReorderingMode::Unknown);

// This is a greedy single-pass algorithm. We are going over each lane		// This is a greedy single-pass algorithm. We are going over each lane
// once and deciding on the best order right away with no back-tracking.		// once and deciding on the best order right away with no back-tracking.
// However, in order to increase its effectiveness, we start with the lane		// However, in order to increase its effectiveness, we start with the lane
// that has operands that can move the least. For example, given the		// that has operands that can move the least. For example, given the
// following lanes:		// following lanes:
// Lane 0 : A[0] = B[0] + C[0] // Visited 3rd		// Lane 0 : A[0] = B[0] + C[0] // Visited 3rd
// Lane 1 : A[1] = C[1] - B[1] // Visited 1st		// Lane 1 : A[1] = C[1] - B[1] // Visited 1st
(77 lines not shown)	void reorder() {
if (!StrategyFailed)		if (!StrategyFailed)
break;		break;
}		}
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD static StringRef getModeStr(ReorderingMode RMode) {		LLVM_DUMP_METHOD static StringRef getModeStr(ReorderingMode RMode) {
switch (RMode) {		switch (RMode) {
		case ReorderingMode::Unknown:
		return "Unknown";
case ReorderingMode::Load:		case ReorderingMode::Load:
return "Load";		return "Load";
case ReorderingMode::Opcode:		case ReorderingMode::Opcode:
return "Opcode";		return "Opcode";
case ReorderingMode::Constant:		case ReorderingMode::Constant:
return "Constant";		return "Constant";
case ReorderingMode::Splat:		case ReorderingMode::Splat:
return "Splat";		return "Splat";
(48 lines not shown)	#endif

~BoUpSLP();		~BoUpSLP();

private:		private:
/// Checks if all users of \p I are the part of the vectorization tree.		/// Checks if all users of \p I are the part of the vectorization tree.
bool areAllUsersVectorized(Instruction *I,		bool areAllUsersVectorized(Instruction *I,
ArrayRef<Value *> VectorizedVals) const;		ArrayRef<Value *> VectorizedVals) const;

		/// Gets the optimal vectorization factor for the tree entry.
		/// \param UserVFs Vectorization factors of the user nodes.
		/// \param IE The starting node when trying to get the vectorization factor.
		/// Required to stop correctly inside of loops, if we have PHI instructions.
		unsigned getEntryVF(const TreeEntry *E, SmallSet<unsigned, 4> &UserVFs,
		const TreeEntry *IE);

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
InstructionCost getEntryCost(const TreeEntry *E,		InstructionCost getEntryCost(const TreeEntry *E,
ArrayRef<Value *> VectorizedVals);		ArrayRef<Value *> VectorizedVals);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,		void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth,
const EdgeInfo &EI);		const EdgeInfo &EI);

/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can		/// \returns true if the ExtractElement/ExtractValue instructions in \p VL can
/// be vectorized to use the original vector (or aggregate "bitcast" to a		/// be vectorized to use the original vector (or aggregate "bitcast" to a
/// vector) and sets \p CurrentOrder to the identity permutation; otherwise		/// vector) and sets \p CurrentOrder to the identity permutation; otherwise
/// returns false, setting \p CurrentOrder to either an empty vector or a		/// returns false, setting \p CurrentOrder to either an empty vector or a
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const;		SmallVectorImpl<unsigned> &CurrentOrder) const;

/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, starting in \p VL and for
Value *vectorizeTree(ArrayRef<Value *> VL);		/// vectorization factor \p VF.
		Value *vectorizeTree(ArrayRef<Value *> VL, unsigned VF);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars.		/// context means the creation of vectors from a group of scalars.
InstructionCost		InstructionCost
getGatherCost(FixedVectorType *Ty,		getGatherCost(FixedVectorType *Ty,
const DenseSet<unsigned> &ShuffledIndices) const;		const DenseSet<unsigned> &ShuffledIndices) const;

/// Checks if the gathered \p VL can be represented as shuffle(s) of previous		/// Checks if the gathered \p VL can be represented as shuffle(s) of previous
/// tree entries.		/// tree entries.
/// \returns ShuffleKind, if gathered values can be represented as shuffles of		/// \returns ShuffleKind, if gathered values can be represented as shuffles of
/// previous tree entries. \p Mask is filled with the shuffle mask.		/// previous tree entries. \p Mask is filled with the shuffle mask.
Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries);		SmallVectorImpl<const TreeEntry *> &Entries);

/// \returns the scalarization cost for this list of values. Assuming that		/// \returns the scalarization cost for this list of values. Assuming that
/// this subtree gets vectorized, we may need to extract the values from the		/// this subtree gets vectorized, we may need to extract the values from the
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
InstructionCost getGatherCost(ArrayRef<Value *> VL) const;		InstructionCost getGatherCost(ArrayRef<Value *> VL, unsigned VF) const;

/// Set the Builder insert point to one after the last instruction in		/// Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(const TreeEntry *E);		void setInsertPointAfterBundle(const TreeEntry *E);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *gather(ArrayRef<Value *> VL);		Value *gather(ArrayRef<Value *> VL);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even the tree height is tiny.		/// be beneficial even the tree height is tiny.
bool isFullyVectorizableTinyTree() const;		bool isFullyVectorizableTinyTree() const;

/// Reorder commutative or alt operands to get better probability of		/// Reorder commutative or alt operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		static void reorderInputsAccordingToOpcode(
SmallVectorImpl<Value *> &Left,		Instruction &VL0, ArrayRef<Value *> VL, SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right,		SmallVectorImpl<Value *> &Right, const DataLayout &DL,
const DataLayout &DL,		ScalarEvolution &SE, const BoUpSLP &R);
ScalarEvolution &SE,
const BoUpSLP &R);
struct TreeEntry {		struct TreeEntry {
using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;		using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;
TreeEntry(VecTreeTy &Container) : Container(Container) {}		TreeEntry(VecTreeTy &Container) : Container(Container) {}

/// \returns true if the scalars in VL are equal to this entry.		/// \returns true if the scalars in VL are equal to this entry. The scalars
		/// in VL are equal to this entry if it contains the same scalars (or undefs)
		/// in the same positions.
bool isSame(ArrayRef<Value *> VL) const {		bool isSame(ArrayRef<Value *> VL) const {
if (VL.size() == Scalars.size())		if (!ReuseShuffleIndices.empty()) {
return std::equal(VL.begin(), VL.end(), Scalars.begin());		for (int I = 0, E = VL.size(); I < E; ++I) {
return VL.size() == ReuseShuffleIndices.size() &&		int Idx = ReuseShuffleIndices[I];
std::equal(		if (Idx == UndefMaskElem) {
VL.begin(), VL.end(), ReuseShuffleIndices.begin(),		if (!isa<UndefValue>(VL[I]))
[this](Value *V, int Idx) { return V == Scalars[Idx]; });		return false;
		continue;
		}
		if (VL[I] != Scalars[Idx] &&
		(!isa<UndefValue>(VL[I]) || isa<PoisonValue>(Scalars[I])))
		return false;
		}
		return true;
		}
		for (int I = 0, E = VL.size(); I < E; ++I)
		if (VL[I] != Scalars[I] &&
		(!isa<UndefValue>(VL[I]) || isa<PoisonValue>(Scalars[I])))
		return false;
		return true;
}		}
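
The relaxed comparison in the new `isSame` (the path without reuse indices) treats an undef lane in `VL` as a wildcard unless the entry holds poison in that lane. The following is a minimal sketch under stated assumptions: `Scalar`, `sameValue`, `isUndefLike`, and `sameLanes` are hypothetical names, values are modeled by a kind plus an id instead of `Value *` identity, and the size guard is an addition of this sketch that the diff does not contain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a scalar: an instruction with an id, or an
// undef/poison placeholder (Id is only meaningful for Inst).
struct Scalar {
  enum Kind { Inst, Undef, Poison } K;
  int Id;
};

// Models pointer equality of uniqued LLVM values.
inline bool sameValue(const Scalar &A, const Scalar &B) {
  if (A.K != B.K)
    return false;
  return A.K != Scalar::Inst || A.Id == B.Id;
}

// In LLVM, PoisonValue derives from UndefValue, so isa<UndefValue> is true
// for both kinds of placeholder.
inline bool isUndefLike(const Scalar &S) { return S.K != Scalar::Inst; }

inline bool sameLanes(const std::vector<Scalar> &VL,
                      const std::vector<Scalar> &Scalars) {
  // Assumption of this sketch: reject mismatched sizes up front.
  if (VL.size() != Scalars.size())
    return false;
  for (std::size_t I = 0; I < VL.size(); ++I) {
    if (sameValue(VL[I], Scalars[I]))
      continue;
    // A mismatch is tolerated only when VL has an undef-like lane and the
    // entry does not hold poison in that lane.
    if (!isUndefLike(VL[I]) || Scalars[I].K == Scalar::Poison)
      return false;
  }
  return true;
}
```

With this model, `{A, undef}` matches an entry `{A, B}`, while `{A, undef}` does not match `{A, poison}`.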

/// A vector of scalars.		/// A vector of scalars.
ValueList Scalars;		ValueList Scalars;

/// The Scalars are vectorized into this value. It is initialized to Null.		/// The Scalars are vectorized into this value. It is initialized to Null.
Value *VectorizedValue = nullptr;		Value *VectorizedValue = nullptr;

(41 lines not shown)	void setOperand(unsigned OpIdx, ArrayRef<Value *> OpVL) {
Operands.resize(OpIdx + 1);		Operands.resize(OpIdx + 1);
assert(Operands[OpIdx].empty() && "Already resized?");		assert(Operands[OpIdx].empty() && "Already resized?");
Operands[OpIdx].resize(Scalars.size());		Operands[OpIdx].resize(Scalars.size());
for (unsigned Lane = 0, E = Scalars.size(); Lane != E; ++Lane)		for (unsigned Lane = 0, E = Scalars.size(); Lane != E; ++Lane)
Operands[OpIdx][Lane] = OpVL[Lane];		Operands[OpIdx][Lane] = OpVL[Lane];
}		}

/// Set the operands of this bundle in their original order.		/// Set the operands of this bundle in their original order.
void setOperandsInOrder() {		void setOperandsInOrder(Instruction *I0) {
assert(Operands.empty() && "Already initialized?");		assert(Operands.empty() && "Already initialized?");
auto *I0 = cast<Instruction>(Scalars[0]);
Operands.resize(I0->getNumOperands());		Operands.resize(I0->getNumOperands());
unsigned NumLanes = Scalars.size();		unsigned NumLanes = Scalars.size();
for (unsigned OpIdx = 0, NumOperands = I0->getNumOperands();		for (unsigned OpIdx = 0, NumOperands = I0->getNumOperands();
OpIdx != NumOperands; ++OpIdx) {		OpIdx != NumOperands; ++OpIdx) {
Operands[OpIdx].resize(NumLanes);		Operands[OpIdx].resize(NumLanes);
for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {		for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
		if (isa<PoisonValue>(Scalars[Lane])) {
		Operands[OpIdx][Lane] =
		PoisonValue::get(I0->getOperand(OpIdx)->getType());
		continue;
		}
		if (isa<UndefValue>(Scalars[Lane])) {
		Operands[OpIdx][Lane] =
		UndefValue::get(I0->getOperand(OpIdx)->getType());
		continue;
		}
auto *I = cast<Instruction>(Scalars[Lane]);		auto *I = cast<Instruction>(Scalars[Lane]);
assert(I->getNumOperands() == NumOperands &&		assert(I->getNumOperands() == NumOperands &&
"Expected same number of operands");		"Expected same number of operands");
Operands[OpIdx][Lane] = I->getOperand(OpIdx);		Operands[OpIdx][Lane] = I->getOperand(OpIdx);
}		}
}		}
}		}
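
The new `setOperandsInOrder(Instruction *I0)` takes the reference instruction explicitly because lane 0 may now be undef, and it fills undef lanes with placeholder operands. The following is a minimal sketch of that transposition, assuming a toy model: `LaneScalar` and `operandsInOrder` are hypothetical names, operands are plain integers, and `std::nullopt` stands in for the undef/poison placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Operand = std::optional<int>; // std::nullopt models an undef operand

// Hypothetical model of one lane: either an undef lane or an instruction
// with a list of operands.
struct LaneScalar {
  bool IsUndef;
  std::vector<int> Ops;
};

// Builds Operands[OpIdx][Lane]. The operand count comes from the reference
// instruction I0 rather than from lane 0, which may be undef.
inline std::vector<std::vector<Operand>>
operandsInOrder(const LaneScalar &I0, const std::vector<LaneScalar> &Scalars) {
  assert(!I0.IsUndef && "Reference must be a real instruction");
  const std::size_t NumOperands = I0.Ops.size();
  std::vector<std::vector<Operand>> Operands(
      NumOperands, std::vector<Operand>(Scalars.size()));
  for (std::size_t OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
    for (std::size_t Lane = 0; Lane != Scalars.size(); ++Lane) {
      if (Scalars[Lane].IsUndef)
        continue; // the operand stays std::nullopt for undef lanes
      assert(Scalars[Lane].Ops.size() == NumOperands &&
             "Expected same number of operands");
      Operands[OpIdx][Lane] = Scalars[Lane].Ops[OpIdx];
    }
  }
  return Operands;
}
```

In this model an undef lane contributes an undef operand to every operand vector, so all operand vectors keep the same width as the bundle.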

(55 lines not shown)	public:
unsigned getAltOpcode() const {		unsigned getAltOpcode() const {
return AltOp ? AltOp->getOpcode() : 0;		return AltOp ? AltOp->getOpcode() : 0;
}		}

/// Update operations state of this entry if reorder occurred.		/// Update operations state of this entry if reorder occurred.
bool updateStateIfReorder() {		bool updateStateIfReorder() {
if (ReorderIndices.empty())		if (ReorderIndices.empty())
return false;		return false;
InstructionsState S = getSameOpcode(Scalars, ReorderIndices.front());		unsigned Size = Scalars.size();
		InstructionsState S =
		getSameOpcode(Scalars, *find_if(ReorderIndices, [Size](unsigned Idx) {
		return Idx < Size;
		}));
setOperations(S);		setOperations(S);
return true;		return true;
}		}
/// When ReuseShuffleIndices is empty it just returns position of \p V		/// When ReuseShuffleIndices is empty it just returns position of \p V
/// within vector of Scalars. Otherwise, try to remap on its reuse index.		/// within vector of Scalars. Otherwise, try to remap on its reuse index.
int findLaneForValue(Value *V) const {		int findLaneForValue(Value *V) const {
unsigned FoundLane = std::distance(Scalars.begin(), find(Scalars, V));		unsigned FoundLane = std::distance(Scalars.begin(), find(Scalars, V));
assert(FoundLane < Scalars.size() && "Couldn't find extract lane");		assert(FoundLane < Scalars.size() && "Couldn't find extract lane");
(75 lines not shown)	dbgs() << "SLP: ReuseShuffleCost + VecCost - ScalarCost = " <<
ReuseShuffleCost + VecCost - ScalarCost << "\n";		ReuseShuffleCost + VecCost - ScalarCost << "\n";
}		}
#endif		#endif

/// Create a new VectorizableTree entry.		/// Create a new VectorizableTree entry.
TreeEntry *newTreeEntry(ArrayRef<Value *> VL, Optional<ScheduleData *> Bundle,		TreeEntry *newTreeEntry(ArrayRef<Value *> VL, Optional<ScheduleData *> Bundle,
const InstructionsState &S,		const InstructionsState &S,
const EdgeInfo &UserTreeIdx,		const EdgeInfo &UserTreeIdx,
ArrayRef<unsigned> ReuseShuffleIndices = None,		ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {		ArrayRef<unsigned> ReorderIndices = None) {
TreeEntry::EntryState EntryState =		TreeEntry::EntryState EntryState =
Bundle ? TreeEntry::Vectorize : TreeEntry::NeedToGather;		Bundle ? TreeEntry::Vectorize : TreeEntry::NeedToGather;
return newTreeEntry(VL, EntryState, Bundle, S, UserTreeIdx,		return newTreeEntry(VL, EntryState, Bundle, S, UserTreeIdx,
ReuseShuffleIndices, ReorderIndices);		ReuseShuffleIndices, ReorderIndices);
}		}

TreeEntry *newTreeEntry(ArrayRef<Value *> VL,		TreeEntry *newTreeEntry(ArrayRef<Value *> VL,
TreeEntry::EntryState EntryState,		TreeEntry::EntryState EntryState,
Optional<ScheduleData *> Bundle,		Optional<ScheduleData *> Bundle,
const InstructionsState &S,		const InstructionsState &S,
const EdgeInfo &UserTreeIdx,		const EdgeInfo &UserTreeIdx,
ArrayRef<unsigned> ReuseShuffleIndices = None,		ArrayRef<int> ReuseShuffleIndices = None,
ArrayRef<unsigned> ReorderIndices = None) {		ArrayRef<unsigned> ReorderIndices = None) {
assert(((!Bundle && EntryState == TreeEntry::NeedToGather) ||		assert(((!Bundle && EntryState == TreeEntry::NeedToGather) ||
(Bundle && EntryState != TreeEntry::NeedToGather)) &&		(Bundle && EntryState != TreeEntry::NeedToGather)) &&
"Need to vectorize gather entry?");		"Need to vectorize gather entry?");
VectorizableTree.push_back(std::make_unique<TreeEntry>(VectorizableTree));		VectorizableTree.push_back(std::make_unique<TreeEntry>(VectorizableTree));
TreeEntry *Last = VectorizableTree.back().get();		TreeEntry *Last = VectorizableTree.back().get();
Last->Idx = VectorizableTree.size() - 1;		Last->Idx = VectorizableTree.size() - 1;
Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());		Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->State = EntryState;		Last->State = EntryState;
Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),		Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
ReuseShuffleIndices.end());		ReuseShuffleIndices.end());
Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());		Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
Last->setOperations(S);		Last->setOperations(S);
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
if (Last->State != TreeEntry::NeedToGather) {		if (Last->State != TreeEntry::NeedToGather) {
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
assert(!getTreeEntry(V) && "Scalar already in tree!");		assert(!getTreeEntry(V) && "Scalar already in tree!");
ScalarToTreeEntry[V] = Last;		ScalarToTreeEntry[V] = Last;
}		}
// Update the scheduler bundle to point to this TreeEntry.		// Update the scheduler bundle to point to this TreeEntry.
unsigned Lane = 0;		unsigned Lane = 0;
for (ScheduleData *BundleMember = Bundle.getValue(); BundleMember;		for (ScheduleData *BundleMember = Bundle.getValue(); BundleMember;
BundleMember = BundleMember->NextInBundle) {		BundleMember = BundleMember->NextInBundle) {
BundleMember->TE = Last;		BundleMember->TE = Last;
BundleMember->Lane = Lane;		BundleMember->Lane = Lane;
++Lane;		++Lane;
}		}
assert((!Bundle.getValue() || Lane == VL.size()) &&		assert((!Bundle.getValue() ||
		Lane == std::distance(InstructionsOnly.begin(),
		InstructionsOnly.end())) &&
"Bundle and VL out of sync");		"Bundle and VL out of sync");
} else {		} else {
MustGather.insert(VL.begin(), VL.end());		MustGather.insert(InstructionsOnly.begin(), InstructionsOnly.end());
}		}

if (UserTreeIdx.UserTE)		if (UserTreeIdx.UserTE)
Last->UserTreeIndices.push_back(UserTreeIdx);		Last->UserTreeIndices.push_back(UserTreeIdx);

return Last;		return Last;
}		}

(18 lines not shown)	#endif
}		}

/// Maps a specific scalar to its tree entry.		/// Maps a specific scalar to its tree entry.
SmallDenseMap<Value *, TreeEntry *> ScalarToTreeEntry;		SmallDenseMap<Value *, TreeEntry *> ScalarToTreeEntry;

/// Maps a value to the proposed vectorizable size.		/// Maps a value to the proposed vectorizable size.
SmallDenseMap<Value *, unsigned> InstrElementSize;		SmallDenseMap<Value *, unsigned> InstrElementSize;

		/// Vectorization factors for tree entries.
		SmallDenseMap<const TreeEntry *, unsigned> EntryVFs;

/// A list of scalars that we found that we need to keep as scalars.		/// A list of scalars that we found that we need to keep as scalars.
ValueSet MustGather;		ValueSet MustGather;

		/// A list of loads to be gathered during the vectorization process. We can
		/// try to vectorize them at the end, if profitable.
		SmallVector<LoadInst *, 4> GatheredLoads;
		/// The index of the first gathered load entry in the VectorizableTree.
		int GatheredLoadsEntriesFirst = -1;

/// This POD struct describes one external user in the vectorized tree.		/// This POD struct describes one external user in the vectorized tree.
struct ExternalUser {		struct ExternalUser {
ExternalUser(Value *S, llvm::User *U, int L)		ExternalUser(Value *S, llvm::User *U, int L)
: Scalar(S), User(U), Lane(L) {}		: Scalar(S), User(U), Lane(L) {}

// Which scalar in our function.		// Which scalar in our function.
Value *Scalar;		Value *Scalar;

(655 lines not shown)
void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst) {
deleteTree();		deleteTree();
UserIgnoreList = UserIgnoreLst;		UserIgnoreList = UserIgnoreLst;
if (!allSameType(Roots))		if (!allSameType(Roots))
return;		return;
buildTree_rec(Roots, 0, EdgeInfo());		buildTree_rec(Roots, 0, EdgeInfo());
		// Try to vectorize gathered loads.
		if (!GatheredLoads.empty() && !isTreeTinyAndNotFullyVectorizable()) {
		GatheredLoadsEntriesFirst = VectorizableTree.size();
		SmallDenseMap<LoadInst *, Value *, 8> GatherPointers;
		for (LoadInst *LI : GatheredLoads)
		GatherPointers.try_emplace(LI,
		getUnderlyingObject(LI->getPointerOperand()));

		// Sort by type, base pointers and parents.
		auto &&LoadSorter = [&GatherPointers](LoadInst *V, LoadInst *V2) {
		return V->getParent() < V2->getParent() ||
		(V->getParent() == V2->getParent() &&
		V->getPointerOperand()->getType() <
		V2->getPointerOperand()->getType()) ||
		(V->getParent() == V2->getParent() &&
		V->getPointerOperand()->getType() ==
		V2->getPointerOperand()->getType() &&
		GatherPointers[V] < GatherPointers[V2]);
		};

		llvm::stable_sort(GatheredLoads, LoadSorter);

		// Try to vectorize elements based on their types, bases and parents.
		for (auto IncIt = GatheredLoads.begin(), E = GatheredLoads.end();
		IncIt != E;) {

		// Look for the next elements with the same type.
		auto *SameTypeIt = IncIt;
		Type *EltTy = (*IncIt)->getPointerOperand()->getType();
		Value *Ptr = GatherPointers[*IncIt];

		SetVector<LoadInst *> Set(IncIt, SameTypeIt);
		while (SameTypeIt != E &&
		(*SameTypeIt)->getParent() == (*IncIt)->getParent() &&
		(*SameTypeIt)->getPointerOperand()->getType() == EltTy &&
		Ptr == GatherPointers[*SameTypeIt]) {
		if (!getTreeEntry(*SameTypeIt))
		Set.insert(*SameTypeIt);
		++SameTypeIt;
		}

		ArrayRef<LoadInst *> Loads = Set.getArrayRef();
		int NumElts = Loads.size();
		if (NumElts >= 3 || (NumElts == 2 && all_of(Loads, [](LoadInst *LI) {
		return LI->hasOneUse();
		}))) {
		SmallVector<Value *, 4> Pointers(NumElts);
		for (int I = 0; I < NumElts; ++I)
		Pointers[I] = Loads[I]->getPointerOperand();
		SmallVector<unsigned, 4> SortedIndicies;
		Type *ScalarTy = Loads.front()->getType();
		if (sortPtrAccesses(Pointers, ScalarTy, DL, SE, SortedIndicies)) {
		if (SortedIndicies.empty()) {
		SortedIndicies.assign(NumElts, 0);
		std::iota(SortedIndicies.begin(), SortedIndicies.end(), 0);
		}
		Optional<int> Diff = getPointersDiff(
		ScalarTy, Pointers[SortedIndicies.front()], ScalarTy,
		Pointers[SortedIndicies.back()], DL, SE);
		int MaxLoads = std::max(getMaxVecRegSize() / DL->getTypeSizeInBits(
		Loads[0]->getType()),
		Roots.size()) *
		(NumElts >= 4 ? 1 : 2);
		if (Diff && *Diff < MaxLoads) {
		SmallVector<Value *, 4> Values(
		PowerOf2Ceil(*Diff + 1), UndefValue::get((*IncIt)->getType()));
		// Sort loads.
		Values[0] = Loads[SortedIndicies.front()];
		for (int I = 1; I < NumElts; ++I) {
		Optional<int> Diff = getPointersDiff(
		ScalarTy, Pointers[SortedIndicies.front()], ScalarTy,
		Pointers[SortedIndicies[I]], DL, SE);
		Values[*Diff] = Loads[SortedIndicies[I]];
		}
		LLVM_DEBUG(dbgs() << "SLP: Trying to vectorize gathered loads ("
		<< NumElts << ")\n");

		buildTree_rec(Values, 0, EdgeInfo());
		}
		}
		}

		// Start over at the next instruction of a different type (or the end).
		IncIt = SameTypeIt;
		}
		}

// Collect the values that we need to extract from the tree.		// Collect the values that we need to extract from the tree.
for (auto &TEPtr : VectorizableTree) {		for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();		TreeEntry *Entry = TEPtr.get();

// No need to handle users of gathered values.		// No need to handle users of gathered values.
if (Entry->State == TreeEntry::NeedToGather)		if (Entry->State == TreeEntry::NeedToGather)
continue;		continue;

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];
		if (isa<UndefValue>(Scalar))
		continue;
int FoundLane = Entry->findLaneForValue(Scalar);		int FoundLane = Entry->findLaneForValue(Scalar);

// Check if the scalar is externally used as an extra arg.		// Check if the scalar is externally used as an extra arg.
auto ExtI = ExternallyUsedValues.find(Scalar);		auto ExtI = ExternallyUsedValues.find(Scalar);
if (ExtI != ExternallyUsedValues.end()) {		if (ExtI != ExternallyUsedValues.end()) {
LLVM_DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane "		LLVM_DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane "
<< Lane << " from " << *Scalar << ".\n");		<< Lane << " from " << *Scalar << ".\n");
ExternalUses.emplace_back(Scalar, nullptr, FoundLane);		ExternalUses.emplace_back(Scalar, nullptr, FoundLane);
}		}
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");

Instruction *UserInst = dyn_cast<Instruction>(U);		Instruction *UserInst = dyn_cast<Instruction>(U);
if (!UserInst)		if (!UserInst)
continue;		continue;

// Skip in-tree scalars that become vectors		// Skip in-tree scalars that become vectors
if (TreeEntry *UseEntry = getTreeEntry(U)) {		if (TreeEntry *UseEntry = getTreeEntry(U)) {
Value *UseScalar = UseEntry->Scalars[0];		auto *It = llvm::find_if(UseEntry->Scalars, Instruction::classof);
		assert(It != UseEntry->Scalars.end() &&
		"At least single instruction is expected.");
		Value *UseScalar = *It;
// Some in-tree scalars will remain as scalar in vectorized		// Some in-tree scalars will remain as scalar in vectorized
// instructions. If that is the case, the one in Lane 0 will		// instructions. If that is the case, the one in the first lane will
// be used.		// be used.
if (UseScalar != U ||		if (UseScalar != U ||
UseEntry->State == TreeEntry::ScatterVectorize ||		UseEntry->State == TreeEntry::ScatterVectorize ||
!InTreeUserNeedToExtract(Scalar, UserInst, TLI)) {		!InTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U		LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
<< ".\n");		<< ".\n");
assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");		assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
continue;		continue;
}		}
		anton-afanasyev (inline comment, Not Done): Could there actually be no `Instruction` in `UseEntry`, or should this be an assert?
}		}

// Ignore users in the user ignore list.		// Ignore users in the user ignore list.
if (is_contained(UserIgnoreList, UserInst))		if (is_contained(UserIgnoreList, UserInst))
		anton-afanasyev (inline comment, Not Done): "Lane 0" seems outdated here, but I am not sure what a better description would be.
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "		LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "
<< Lane << " from " << *Scalar << ".\n");		<< Lane << " from " << *Scalar << ".\n");
ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));		ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
}		}
}		}
}		}
}		}
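
The gathered-loads step added to `buildTree` sorts loads that share a base pointer, then places each load in the slot given by its distance from the first load and pads the gaps with undef. The following is a minimal sketch of that placement, assuming a toy model: `Load`, `nextPow2`, and `placeByOffset` are hypothetical names, offsets are given directly in elements (the patch derives them with `sortPtrAccesses` and `getPointersDiff`), and `std::nullopt` stands in for `UndefValue`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a load: an id plus its offset, in elements, from a
// common base pointer.
struct Load {
  int Id;
  int Offset;
};

// Smallest power of two >= N (0 maps to 0); stands in for PowerOf2Ceil.
inline std::size_t nextPow2(std::size_t N) {
  if (N == 0)
    return 0;
  std::size_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Returns the padded lane vector, or std::nullopt when there is nothing to
// place or the span between the first and last load reaches MaxLoads.
inline std::optional<std::vector<std::optional<int>>>
placeByOffset(std::vector<Load> Loads, int MaxLoads) {
  if (Loads.empty())
    return std::nullopt;
  std::stable_sort(Loads.begin(), Loads.end(),
                   [](const Load &A, const Load &B) {
                     return A.Offset < B.Offset;
                   });
  const int Diff = Loads.back().Offset - Loads.front().Offset;
  if (Diff >= MaxLoads)
    return std::nullopt;
  // All lanes start as std::nullopt, the stand-in for undef padding.
  std::vector<std::optional<int>> Values(
      nextPow2(static_cast<std::size_t>(Diff) + 1));
  for (const Load &L : Loads)
    Values[static_cast<std::size_t>(L.Offset - Loads.front().Offset)] = L.Id;
  return Values;
}
```

Under these assumptions, loads at offsets 0, 1, and 3 produce a four-lane vector whose third lane is undef, which is the non-power-of-2 shape the patch then hands to `buildTree_rec`.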

		/// Tries to find subvector of loads and builds new vector of only loads if can
		/// be profitable.
		static void
		gatherPossiblyVectorizableLoads(const BoUpSLP &R, ArrayRef<Value *> VL,
		SmallVectorImpl<LoadInst *> &GatheredLoads) {
		for (Value *V : VL) {
		if (auto *LI = dyn_cast<LoadInst>(V))
		if (!R.isDeleted(LI))
		GatheredLoads.push_back(LI);
		}
		}

void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,		void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
const EdgeInfo &UserTreeIdx) {		const EdgeInfo &UserTreeIdx) {
assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");		assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");

InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (Depth == RecursionMaxDepth) {		if (Depth == RecursionMaxDepth) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
(19 lines not shown)	void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,

if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))		if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))
if (SI->getValueOperand()->getType()->isVectorTy()) {		if (SI->getValueOperand()->getType()->isVectorTy()) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

		auto InitialInstructionsOnly = make_filter_range(VL, Instruction::classof);
// If all of the operands are identical or constant we have a simple solution.		// If all of the operands are identical or constant we have a simple solution.
if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !S.getOpcode()) {		if (allConstant(VL) || isSplat(VL) ||
		!allSameBlock(InitialInstructionsOnly) || !S.getOpcode()) {
		gatherPossiblyVectorizableLoads(*this, VL, GatheredLoads);
LLVM_DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

// We now know that this is a vector of instructions of the same type from		// We now know that this is a vector of instructions of the same type from
// the same block.		// the same block.

// Don't vectorize ephemeral values.		// Don't vectorize ephemeral values.
for (Value *V : VL) {		for (Value *V : InitialInstructionsOnly) {
if (EphValues.count(V)) {		if (EphValues.count(V)) {
LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V		LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V
<< ") is ephemeral.\n");		<< ") is ephemeral.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// Check if this is a duplicate of another entry.		// Check if this is a duplicate of another entry.
if (TreeEntry *E = getTreeEntry(S.OpValue)) {		if (TreeEntry *E = getTreeEntry(S.OpValue)) {
LLVM_DEBUG(dbgs() << "SLP: \tChecking bundle: " << *S.OpValue << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tChecking bundle: " << *S.OpValue << ".\n");
if (!E->isSame(VL)) {		if (!E->isSame(VL)) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
// Record the reuse of the tree node. FIXME, currently this is only used to		// Record the reuse of the tree node. FIXME, currently this is only used to
// properly draw the graph rather than for the actual vectorization.		// properly draw the graph rather than for the actual vectorization.
E->UserTreeIndices.push_back(UserTreeIdx);		E->UserTreeIndices.push_back(UserTreeIdx);
LLVM_DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *S.OpValue		LLVM_DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *S.OpValue
<< ".\n");		<< ".\n");
return;		return;
}		}

// Check that none of the instructions in the bundle are already in the tree.		// Check that none of the instructions in the bundle are already in the tree.
for (Value *V : VL) {		for (Value *V : InitialInstructionsOnly) {
		RKSimon (inline comment, Done): Can we use `for (Value *V : make_filter_range(VL, Instruction::classof))`?
auto *I = dyn_cast<Instruction>(V);		if (getTreeEntry(V)) {
if (!I)
continue;
if (getTreeEntry(I)) {
LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V		LLVM_DEBUG(dbgs() << "SLP: The instruction (" << *V
<< ") is already in tree.\n");		<< ") is already in tree.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// If any of the scalars is marked as a value that needs to stay scalar, then		// The reduction nodes (stored in UserIgnoreList) should stay scalar.
		RKSimon (Done): `for (Value *V : make_filter_range(VL, Instruction::classof))`?
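The suggestion above swaps an explicit "test, then `continue`" loop for iteration over a pre-filtered range. A minimal standalone sketch of that equivalence follows. It does not use LLVM: the `Value` struct and the `forEachIf` helper are hypothetical stand-ins for `llvm::Value` and `make_filter_range`, introduced only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::Value: a flag says whether the element
// is an instruction, and Id identifies it.
struct Value {
  bool IsInstruction;
  int Id;
};

// Explicit form: test each element and skip non-instructions by hand.
std::vector<int> visitWithContinue(const std::vector<Value> &VL) {
  std::vector<int> Seen;
  for (const Value &V : VL) {
    if (!V.IsInstruction)
      continue;
    Seen.push_back(V.Id);
  }
  return Seen;
}

// Hypothetical helper playing the role of a filtered range: the predicate
// is stated once, and the body only ever sees matching elements.
template <typename Range, typename Pred, typename Fn>
void forEachIf(const Range &R, Pred P, Fn F) {
  for (const auto &Elt : R)
    if (P(Elt))
      F(Elt);
}

// Filtered form: same visit order and same result as visitWithContinue.
std::vector<int> visitFiltered(const std::vector<Value> &VL) {
  std::vector<int> Seen;
  forEachIf(
      VL, [](const Value &V) { return V.IsInstruction; },
      [&Seen](const Value &V) { Seen.push_back(V.Id); });
  return Seen;
}
```

Both functions visit the same elements in the same order; the filtered form simply keeps the skip logic out of the loop body.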
// we need to gather the scalars.		for (Value *V : InitialInstructionsOnly) {
// The reduction nodes (stored in UserIgnoreList) also should stay scalar.		if (is_contained(UserIgnoreList, V)) {
for (Value *V : VL) {
if (MustGather.count(V) || is_contained(UserIgnoreList, V)) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
}		}

// Check that all of the users of the scalars that we want to vectorize are		// Check that all of the users of the scalars that we want to vectorize are
// schedulable.		// schedulable.
auto *VL0 = cast<Instruction>(S.OpValue);		auto *VL0 = cast<Instruction>(S.OpValue);
BasicBlock *BB = VL0->getParent();		BasicBlock *BB = VL0->getParent();

if (!DT->isReachableFromEntry(BB)) {		if (!DT->isReachableFromEntry(BB)) {
// Don't go into unreachable blocks. They may contain instructions with		// Don't go into unreachable blocks. They may contain instructions with
// dependency cycles which confuse the final scheduling.		// dependency cycles which confuse the final scheduling.
LLVM_DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");		LLVM_DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}

// Check that every instruction appears once in this bundle.		// Check that every instruction appears once in this bundle.
SmallVector<unsigned, 4> ReuseShuffleIndicies;		SmallVector<int> ReuseShuffleIndicies;
		RKSimon (Not Done): Should ReuseShuffleIndicies be SmallVector<int, 4>, so that we then tag undefs with -1 (llvm::UndefMaskElem)?
		ABataev (Author, Done): No, it won't work. We need to register the actual positions in `ReuseShuffleIndicies`; `-1` does not work here.
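For context on the exchange above, here is a standalone sketch of how a reuse mask of this kind can be built by deduplicating a bundle. It is a simplified model, not the patch's code: scalars are modelled as strings, the empty string marks a poison lane, `UndefLane` stands in for `llvm::UndefMaskElem`, and `buildReuseMask` is a hypothetical name. The patch's separate treatment of undef lanes relative to the user node is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr int UndefLane = -1; // stands in for llvm::UndefMaskElem

// Returns the unique scalars plus a mask that maps every original lane to
// the index of its scalar among the unique ones. Poison lanes (modelled as
// empty strings) get UndefLane. The mask is cleared when nothing repeats.
std::pair<std::vector<std::string>, std::vector<int>>
buildReuseMask(const std::vector<std::string> &VL) {
  std::vector<std::string> Unique;
  std::vector<int> Mask;
  std::unordered_map<std::string, int> Position;
  Unique.reserve(VL.size());
  Mask.reserve(VL.size());
  for (const std::string &V : VL) {
    if (V.empty()) {
      Mask.push_back(UndefLane);
      continue;
    }
    // try_emplace keeps the first position seen for a repeated scalar.
    auto Res = Position.try_emplace(V, static_cast<int>(Unique.size()));
    Mask.push_back(Res.first->second);
    if (Res.second)
      Unique.push_back(V);
  }
  // Every lane is a distinct scalar: no shuffle is needed.
  if (Unique.size() == VL.size())
    Mask.clear();
  return {Unique, Mask};
}
```

In this model a bundle `{a, b, a, b}` yields the unique scalars `{a, b}` and the mask `{0, 1, 0, 1}`, while a bundle with no repeats yields an empty mask.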
SmallVector<Value *, 4> UniqueValues;		SmallVector<Value *, 4> UniqueValues;
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
		UniqueValues.reserve(VL.size());
		ReuseShuffleIndicies.reserve(VL.size());
		unsigned NumberOfInstructions = 0;
		unsigned UserNumberOfInstructions = 0;
		if (const TreeEntry *UserTE = UserTreeIdx.UserTE)
		UserNumberOfInstructions =
		count_if(UserTE->Scalars, [](Value *V) { return !isa<UndefValue>(V); });
		unsigned Pos = 0;
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<PoisonValue>(V)) {
		ReuseShuffleIndicies.emplace_back(UndefMaskElem);
		++Pos;
		continue;
		}
		if (isa<UndefValue>(V)) {
		ReuseShuffleIndicies.emplace_back(
		Pos < UserNumberOfInstructions ? Pos : UndefMaskElem);
		++Pos;
		continue;
		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
ReuseShuffleIndicies.emplace_back(Res.first->second);		ReuseShuffleIndicies.emplace_back(Res.first->second);
if (Res.second)		if (Res.second) {
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
		++NumberOfInstructions;
		}
		++Pos;
}		}
size_t NumUniqueScalarValues = UniqueValues.size();		if (NumberOfInstructions == VL.size()) {
if (NumUniqueScalarValues == VL.size()) {
ReuseShuffleIndicies.clear();		ReuseShuffleIndicies.clear();
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");		LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");
if (NumUniqueScalarValues <= 1 ||		if (NumberOfInstructions <= 1) {
!llvm::isPowerOf2_32(NumUniqueScalarValues)) {		gatherPossiblyVectorizableLoads(*this, VL, GatheredLoads);
LLVM_DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: Single scalar in bundle"
		<< *UniqueValues.front() << ".\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);
return;		return;
}		}
		// Check if the reuse shuffle mask is uniform and no need to count undefs
		// as real operands.
		if ((UserNumberOfInstructions == 0 ||
		UserNumberOfInstructions == NumberOfInstructions) &&
		ShuffleVectorInst::isIdentityMask(ReuseShuffleIndicies))
		ReuseShuffleIndicies.clear();
		UniqueValues.append(VL.size() - UniqueValues.size(),
		UndefValue::get(VL0->getType()));
VL = UniqueValues;		VL = UniqueValues;
}		}
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);

auto &BSRef = BlocksSchedules[BB];		auto &BSRef = BlocksSchedules[BB];
		anton-afanasyev (Not Done): `assert(NumberOfInstructions != 0 && "...")` and `if (NumberOfInstructions == 1)`?
if (!BSRef)		if (!BSRef)
BSRef = std::make_unique<BlockScheduling>(BB);		BSRef = std::make_unique<BlockScheduling>(BB);

BlockScheduling &BS = *BSRef.get();		BlockScheduling &BS = *BSRef.get();

Optional<ScheduleData *> Bundle = BS.tryScheduleBundle(VL, this, S);		Optional<ScheduleData *> Bundle = BS.tryScheduleBundle(VL, this, S);
if (!Bundle) {		if (!Bundle) {
LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");		LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");
		dtemirbulatov (Not Done): What do you think about defining InstructionsOnly in InstructionsState?
		ABataev (Author, Done): I don't think it is really required. `InstructionsOnly` is just a range, not a container.
assert((!BS.getScheduleData(VL0) ||		assert((!BS.getScheduleData(VL0) ||
!BS.getScheduleData(VL0)->isPartOfBundle()) &&		!BS.getScheduleData(VL0)->isPartOfBundle()) &&
"tryScheduleBundle should cancelScheduling on failure");		"tryScheduleBundle should cancelScheduling on failure");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");

unsigned ShuffleOrOp = S.isAltShuffle() ?		unsigned ShuffleOrOp = S.isAltShuffle() ?
(unsigned) Instruction::ShuffleVector : S.getOpcode();		(unsigned) Instruction::ShuffleVector : S.getOpcode();
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);

// Check for terminator values (e.g. invoke).		// Check for terminator values (e.g. invoke).
for (Value *V : VL)		for (Value *V : InstructionsOnly)
for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {		for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {
Instruction *Term = dyn_cast<Instruction>(		auto *Term =
cast<PHINode>(V)->getIncomingValueForBlock(		dyn_cast<Instruction>(cast<PHINode>(V)->getIncomingValueForBlock(
PH->getIncomingBlock(I)));		PH->getIncomingBlock(I)));
if (Term && Term->isTerminator()) {		if (Term && Term->isTerminator()) {
		RKSimon (Not Done): Do these trivial style refactors separately now to reduce the size of the patch?
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Need to swizzle PHINodes (terminator use).\n");		<< "SLP: Need to swizzle PHINodes (terminator use).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle, S, UserTreeIdx, ReuseShuffleIndicies);		newTreeEntry(VL, Bundle, S, UserTreeIdx, ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");

// Keeps the reordered operands to avoid code duplication.		// Keeps the reordered operands to avoid code duplication.
SmallVector<ValueList, 2> OperandsVec;		SmallVector<ValueList, 2> OperandsVec;
for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {		for (unsigned I = 0, E = PH->getNumIncomingValues(); I < E; ++I) {
if (!DT->isReachableFromEntry(PH->getIncomingBlock(I))) {		if (!DT->isReachableFromEntry(PH->getIncomingBlock(I))) {
ValueList Operands(VL.size(), PoisonValue::get(PH->getType()));		ValueList Operands(VL.size(), PoisonValue::get(PH->getType()));
TE->setOperand(I, Operands);		TE->setOperand(I, Operands);
OperandsVec.push_back(Operands);		OperandsVec.push_back(Operands);
continue;		continue;
}		}
		RKSimon (Not Done): Do these trivial style refactors separately now to reduce the size of the patch?
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<PHINode>(V)->getIncomingValueForBlock(		Operands.emplace_back(
		isa<PoisonValue>(V) ? PoisonValue::get(V->getType())
		: isa<UndefValue>(V) ? UndefValue::get(V->getType())
		: cast<PHINode>(V)->getIncomingValueForBlock(
PH->getIncomingBlock(I)));		PH->getIncomingBlock(I)));
TE->setOperand(I, Operands);		TE->setOperand(I, Operands);
OperandsVec.push_back(Operands);		OperandsVec.push_back(Operands);
}		}
for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)		for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)
buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});		buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});
return;		return;
}		}
case Instruction::ExtractValue:		case Instruction::ExtractValue:
[19 lines not shown]	case Instruction::ExtractElement: {
for (unsigned Idx : CurrentOrder)		for (unsigned Idx : CurrentOrder)
dbgs() << " " << Idx;		dbgs() << " " << Idx;
dbgs() << "\n";		dbgs() << "\n";
});		});
// Insert new order with initial value 0, if it does not exist,		// Insert new order with initial value 0, if it does not exist,
// otherwise return the iterator to the existing one.		// otherwise return the iterator to the existing one.
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, CurrentOrder);
		// No need to reorder if still need to shuffle reuses.
		if (ReuseShuffleIndicies.empty()) {
findRootOrder(CurrentOrder);		findRootOrder(CurrentOrder);
++NumOpsWantToKeepOrder[CurrentOrder];		++NumOpsWantToKeepOrder[CurrentOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
		}
// This is a special case, as it does not gather, but at the same time		// This is a special case, as it does not gather, but at the same time
// we are not extending buildTree_rec() towards the operands.		// we are not extending buildTree_rec() towards the operands.
ValueList Op0;		ValueList Op0;
Op0.assign(VL.size(), VL0->getOperand(0));		Op0.assign(VL.size(), VL0->getOperand(0));
VectorizableTree.back()->setOperand(0, Op0);		VectorizableTree.back()->setOperand(0, Op0);
return;		return;
}		}
LLVM_DEBUG(dbgs() << "SLP: Gather extract sequence.\n");		LLVM_DEBUG(dbgs() << "SLP: Gather extract sequence.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
return;		return;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
assert(ReuseShuffleIndicies.empty() && "All inserts should be unique");		assert(
		(ReuseShuffleIndicies.empty() || NumberOfInstructions != VL.size()) &&
		"All inserts should be unique");

// Check that we have a buildvector and not a shuffle of 2 or more		// Check that we have a buildvector and not a shuffle of 2 or more
// different vectors.		// different vectors.
ValueSet SourceVectors;		ValueSet SourceVectors;
for (Value *V : VL)		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
SourceVectors.insert(cast<Instruction>(V)->getOperand(0));		SourceVectors.insert(cast<Instruction>(V)->getOperand(0));
		}

if (count_if(VL, [&SourceVectors](Value *V) {		if (count_if(InstructionsOnly, [&SourceVectors](Value *V) {
return !SourceVectors.contains(V);		return !SourceVectors.contains(V);
}) >= 2) {		}) >= 2) {
// Found 2nd source vector - cancel.		// Found 2nd source vector - cancel.
LLVM_DEBUG(dbgs() << "SLP: Gather of insertelement vectors with "		LLVM_DEBUG(dbgs() << "SLP: Gather of insertelement vectors with "
"different source vectors.\n");		"different source vectors.\n");
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
return;		return;
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
LLVM_DEBUG(dbgs() << "SLP: added inserts bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: added inserts bundle.\n");

constexpr int NumOps = 2;		constexpr int NumOps = 2;
ValueList VectorOperands[NumOps];		ValueList VectorOperands[NumOps];
for (int I = 0; I < NumOps; ++I) {		for (int I = 0; I < NumOps; ++I) {
for (Value *V : VL)		for (Value *V : VL) {
		if (isa<PoisonValue>(V)) {
		VectorOperands[I].push_back(PoisonValue::get(
		cast<Instruction>(VL0)->getOperand(I)->getType()));
		continue;
		}
		if (isa<UndefValue>(V)) {
		VectorOperands[I].push_back(UndefValue::get(
		cast<Instruction>(VL0)->getOperand(I)->getType()));
		continue;
		}
VectorOperands[I].push_back(cast<Instruction>(V)->getOperand(I));		VectorOperands[I].push_back(cast<Instruction>(V)->getOperand(I));
		}

TE->setOperand(I, VectorOperands[I]);		TE->setOperand(I, VectorOperands[I]);
}		}
buildTree_rec(VectorOperands[NumOps - 1], Depth + 1, {TE, 0});		buildTree_rec(VectorOperands[NumOps - 1], Depth + 1, {TE, 0});
return;		return;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Check that a vectorized load would load the same memory as a scalar		// Check that a vectorized load would load the same memory as a scalar
[10 lines not shown]	case Instruction::Load: {
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
return;		return;
}		}

// Make sure all loads in the bundle are simple - we can't vectorize		// Make sure all loads in the bundle are simple - we can't vectorize
// atomic or volatile loads.		// atomic or volatile loads.
SmallVector<Value *, 4> PointerOps(VL.size());		SmallVector<Value *, 4> PointerOps(NumberOfInstructions);
auto POIter = PointerOps.begin();		OrdersType OriginalOrder(NumberOfInstructions, 0);
for (Value *V : VL) {		auto *POIter = PointerOps.begin();
auto *L = cast<LoadInst>(V);		auto *OOIter = OriginalOrder.begin();
		bool IsOOIdentity = true;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (isa<UndefValue>(VL[I]))
		continue;
		auto *L = cast<LoadInst>(VL[I]);
if (!L->isSimple()) {		if (!L->isSimple()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
return;		return;
}		}
*POIter = L->getPointerOperand();		*POIter = L->getPointerOperand();
++POIter;		++POIter;
		*OOIter = I;
		IsOOIdentity |= std::distance(OriginalOrder.begin(), OOIter) == I;
		++OOIter;
}		}

OrdersType CurrentOrder;		OrdersType CurrentOrder;
// Check the order of pointer operands.		// Check the order of pointer operands.
if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, CurrentOrder)) {		if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, CurrentOrder)) {
Value *Ptr0;		Value *Ptr0;
Value *PtrN;		Value *PtrN;
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
Ptr0 = PointerOps.front();		Ptr0 = PointerOps.front();
PtrN = PointerOps.back();		PtrN = PointerOps.back();
} else {		} else {
Ptr0 = PointerOps[CurrentOrder.front()];		Ptr0 = PointerOps[CurrentOrder.front()];
PtrN = PointerOps[CurrentOrder.back()];		PtrN = PointerOps[CurrentOrder.back()];
}		}
Optional<int> Diff = getPointersDiff(		Optional<int> Diff = getPointersDiff(
ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);		ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
// Check that the sorted loads are consecutive.		// Check that the sorted loads are consecutive.
if (static_cast<unsigned>(*Diff) == VL.size() - 1) {		int AcceptableDiff = NumberOfInstructions - 1;
		Align CommonAlign = cast<LoadInst>(VL0)->getAlign();
		vdmitrie (Not Done): This check is not quite complete. Suppose, for example, we have the following set of scalars (VL): 0: load i32 from p[0]; 1: load i32 from p[2]; 2: undef i32; 3: undef i32 (note that p[1] is not loaded). The pointer difference is 8, the number of instructions is 2, and the VL size is 4, so 8 <= (4 - 1) * 4 is true, but the pointers are not actually loaded consecutively (this is vectorizable via masked load + shuffle, but support for that does not seem to be implemented yet). A similar issue exists for stores.
		ABataev (Author, Done): Hmm, see lines 4574-4600 (masked load + shuffle) and 4643-4678 (shuffle + masked store).
		vdmitrie (Not Done): Note that two is a power of two. Thus at line 4569 it takes the path that creates a plain load and ends up loading p[0] + p[1]. Even if we went down the masked load + shuffle path, that would not be correct either: the mask and shuffle there are built from the undefs rather than from pointer analysis of the scalar loads. In order to end up loading p[0] and p[2], VL should look like: 0: load p[0]; 1: undef; 2: load p[2]; 3: undef.
		ABataev (Author, Done): Ah, yes. Will check this carefully.
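The counterexample in this thread can be checked with a small standalone model. The sketch below is an illustration under stated assumptions, not code from the patch: a lane is either an element offset from a common base pointer or `std::nullopt` for an undef lane, offsets are counted in elements rather than bytes, and `rangeCheckAccepts` and `lanesMatchOffsets` are hypothetical names. The first function mirrors only the range part of the condition under review; the second is one possible stricter test that also requires each load to sit in the lane matching its offset (it ignores reordering, so jumbled loads would be rejected).

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of a bundle lane: the element offset of a real load
// from a common base pointer, or std::nullopt for an undef lane.
using Lane = std::optional<int>;

// Range-only test: the distance between the lowest and highest loaded
// element must be at least (number of loads - 1) and at most (size - 1).
bool rangeCheckAccepts(const std::vector<Lane> &VL) {
  int NumLoads = 0, Lo = 0, Hi = 0;
  for (const Lane &L : VL) {
    if (!L)
      continue;
    Lo = NumLoads ? std::min(Lo, *L) : *L;
    Hi = NumLoads ? std::max(Hi, *L) : *L;
    ++NumLoads;
  }
  assert(NumLoads > 0 && "bundle must contain at least one load");
  int Diff = Hi - Lo;
  return Diff >= NumLoads - 1 && Diff <= static_cast<int>(VL.size()) - 1;
}

// Stricter test: every real load must occupy the lane equal to its offset
// from the lowest address, so the undef lanes line up with the holes.
bool lanesMatchOffsets(const std::vector<Lane> &VL) {
  std::optional<int> Lo;
  for (const Lane &L : VL)
    if (L)
      Lo = Lo ? std::min(*Lo, *L) : *L;
  if (!Lo)
    return false;
  for (int I = 0, E = static_cast<int>(VL.size()); I < E; ++I)
    if (VL[I] && *VL[I] - *Lo != I)
      return false;
  return true;
}
```

With the bundle `{p[0], p[2], undef, undef}` the range-only test passes while the lane test fails; with `{p[0], undef, p[2], undef}` both pass, which matches the layout the reviewer says is needed.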
		if (!CurrentOrder.empty())
		CommonAlign = cast<LoadInst>(VL[OriginalOrder[CurrentOrder.front()]])
		->getAlign();
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
		if (Diff && *Diff >= AcceptableDiff &&
		*Diff <= static_cast<int>(VL.size() - 1) &&
		(TTI->isLegalMaskedLoad(
		FixedVectorType::get(ScalarTy, PowerOf2Ceil(*Diff + 1)),
		CommonAlign) ||
		isPowerOf2_32(
		std::min(PowerOf2Ceil(*Diff + 1),
		alignTo((*Diff + 1) * Sz, CommonAlign) / Sz)))) {
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
// Original loads are consecutive and does not require reordering.		if (*Diff == AcceptableDiff && IsOOIdentity) {
++NumOpsWantToKeepOriginalOrder;		// Original loads are consecutive and do not require reordering.
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,
UserTreeIdx, ReuseShuffleIndicies);		UserTreeIdx, ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		for (int I = 0, E = OriginalOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(ScalarTy, Ptr0, ScalarTy,
		PointerOps[I], DL, SE)] =
		OriginalOrder[I];
		}
		// Need to extend.
		TreeEntry *TE =
		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies, NormalizedOrder);
		TE->setOperandsInOrder(VL0);
		}
		// Count orders of non-gathered loads only.
		if ((UserTreeIdx.UserTE || Depth == 0) &&
		!all_of(InstructionsOnly,
		[this](Value *V) { return MustGather.contains(V); }))
		++NumOpsWantToKeepOriginalOrder;
LLVM_DEBUG(dbgs() << "SLP: added a vector of loads.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of loads.\n");
} else {		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		SmallVector<int, 4> Orders(CurrentOrder.size());
		inversePermutation(CurrentOrder, Orders);
		for (int I = 0, E = CurrentOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(
		ScalarTy, Ptr0, ScalarTy, PointerOps[Orders[I]], DL, SE)] =
		OriginalOrder[Orders[I]];
		}
// Need to reorder.		// Need to reorder.
TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, NormalizedOrder);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");
findRootOrder(CurrentOrder);		// No need to reorder if still need to shuffle reuses.
++NumOpsWantToKeepOrder[CurrentOrder];		if (ReuseShuffleIndicies.empty()) {
		findRootOrder(NormalizedOrder);
		++NumOpsWantToKeepOrder[NormalizedOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
		}
}		}
return;		return;
}		}
Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();		Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();
for (Value *V : VL)		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
		}
if (TTI->isLegalMaskedGather(FixedVectorType::get(ScalarTy, VL.size()),		if (TTI->isLegalMaskedGather(FixedVectorType::get(ScalarTy, VL.size()),
CommonAlignment)) {		CommonAlignment)) {
// Vectorizing non-consecutive loads with `llvm.masked.gather`.		// Vectorizing non-consecutive loads with `llvm.masked.gather`.
TreeEntry *TE = newTreeEntry(VL, TreeEntry::ScatterVectorize, Bundle,		TreeEntry *TE = newTreeEntry(VL, TreeEntry::ScatterVectorize, Bundle,
S, UserTreeIdx, ReuseShuffleIndicies);		S, UserTreeIdx, ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
		PointerOps.append(
		VL.size() - NumberOfInstructions,
		UndefValue::get(cast<LoadInst>(VL0)->getPointerOperandType()));
buildTree_rec(PointerOps, Depth + 1, {TE, 0});		buildTree_rec(PointerOps, Depth + 1, {TE, 0});
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: added a vector of non-consecutive loads.\n");		<< "SLP: added a vector of non-consecutive loads.\n");
return;		return;
}		}
}		}

LLVM_DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
[10 lines not shown]	switch (ShuffleOrOp) {
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Type *Ty = cast<Instruction>(V)->getOperand(0)->getType();		Type *Ty = cast<Instruction>(V)->getOperand(0)->getType();
if (Ty != SrcTy || !isValidElementType(Ty)) {		if (Ty != SrcTy || !isValidElementType(Ty)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Gathering casts with different src types.\n");		<< "SLP: Gathering casts with different src types.\n");
return;		return;
}		}
}		}
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of casts.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of casts.\n");

TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(isa<PoisonValue>(V) ? PoisonValue::get(SrcTy)
		: isa<UndefValue>(V)
		? UndefValue::get(SrcTy)
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
// Check that all of the compares have the same predicate.		// Check that all of the compares have the same predicate.
CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
CmpInst::Predicate SwapP0 = CmpInst::getSwappedPredicate(P0);		CmpInst::Predicate SwapP0 = CmpInst::getSwappedPredicate(P0);
Type *ComparedTy = VL0->getOperand(0)->getType();		Type *ComparedTy = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
CmpInst *Cmp = cast<CmpInst>(V);		auto *Cmp = cast<CmpInst>(V);
if ((Cmp->getPredicate() != P0 && Cmp->getPredicate() != SwapP0) ||		if ((Cmp->getPredicate() != P0 && Cmp->getPredicate() != SwapP0) ||
Cmp->getOperand(0)->getType() != ComparedTy) {		Cmp->getOperand(0)->getType() != ComparedTy) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: Gathering cmp with different predicate.\n");		<< "SLP: Gathering cmp with different predicate.\n");
return;		return;
}		}
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of compares.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of compares.\n");

ValueList Left, Right;		ValueList Left, Right;
if (cast<CmpInst>(VL0)->isCommutative()) {		if (cast<CmpInst>(VL0)->isCommutative()) {
// Commutative predicate - collect + sort operands of the instructions		// Commutative predicate - collect + sort operands of the instructions
// so that each side is more likely to have the same opcode.		// so that each side is more likely to have the same opcode.
assert(P0 == SwapP0 && "Commutative Predicate mismatch");		assert(P0 == SwapP0 && "Commutative Predicate mismatch");
reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, *DL, *SE, this);
} else {		} else {
// Collect operands - commute if it uses the swapped predicate.		// Collect operands - commute if it uses the swapped predicate.
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<PoisonValue>(V)) {
		Left.push_back(PoisonValue::get(VL0->getOperand(0)->getType()));
		Right.push_back(PoisonValue::get(VL0->getOperand(1)->getType()));
		continue;
		}
		if (isa<UndefValue>(V)) {
		Left.push_back(UndefValue::get(VL0->getOperand(0)->getType()));
		Right.push_back(UndefValue::get(VL0->getOperand(1)->getType()));
		continue;
		}
auto *Cmp = cast<CmpInst>(V);		auto *Cmp = cast<CmpInst>(V);
Value *LHS = Cmp->getOperand(0);		Value *LHS = Cmp->getOperand(0);
Value *RHS = Cmp->getOperand(1);		Value *RHS = Cmp->getOperand(1);
if (Cmp->getPredicate() != P0)		if (Cmp->getPredicate() != P0)
std::swap(LHS, RHS);		std::swap(LHS, RHS);
Left.push_back(LHS);		Left.push_back(LHS);
Right.push_back(RHS);		Right.push_back(RHS);
}		}
[27 lines not shown]	case Instruction::Xor: {
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of un/bin op.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of un/bin op.\n");

// Sort operands of the instructions so that each side is more likely to		// Sort operands of the instructions so that each side is more likely to
// have the same opcode.		// have the same opcode.
if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {		if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, *DL, *SE, this);
TE->setOperand(0, Left);		TE->setOperand(0, Left);
TE->setOperand(1, Right);		TE->setOperand(1, Right);
buildTree_rec(Left, Depth + 1, {TE, 0});		buildTree_rec(Left, Depth + 1, {TE, 0});
buildTree_rec(Right, Depth + 1, {TE, 1});		buildTree_rec(Right, Depth + 1, {TE, 1});
return;		return;
}		}

TE->setOperandsInOrder();		SmallVector<ValueList, 2> OperandsVec;
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned I = 0, E = VL0->getNumOperands(); I < E; ++I) {
ValueList Operands;		ValueList Operands;
		Value *DefinedOp = nullptr;
		// Cannot use undef for int div/rem, use the last real value instead.
		if (BinaryOperator::isIntDivRem(ShuffleOrOp)) {
		const auto It = find_if(VL, [I](Value *V) {
		return isa<Instruction>(V) &&
		!isa<UndefValue>(cast<Instruction>(V)->getOperand(I));
		});
		if (It != VL.end())
		DefinedOp = cast<Instruction>(*It)->getOperand(I);
		}
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL.slice(
Operands.push_back(cast<Instruction>(V)->getOperand(i));		0, PowerOf2Ceil(std::distance(
		VL.begin(),
buildTree_rec(Operands, Depth + 1, {TE, i});		find_if(reverse(VL), Instruction::classof).base())))) {
		Value *OpV;
		if (isa<UndefValue>(V)) {
		if (BinaryOperator::isIntDivRem(ShuffleOrOp) && DefinedOp)
		OpV = DefinedOp;
		else
		OpV = isa<PoisonValue>(V)
		? PoisonValue::get(VL0->getOperand(I)->getType())
		: UndefValue::get(VL0->getOperand(I)->getType());
		} else {
		OpV = cast<Instruction>(V)->getOperand(I);
		if (isa<UndefValue>(OpV) &&
		BinaryOperator::isIntDivRem(ShuffleOrOp) && DefinedOp)
		OpV = DefinedOp;
		}
		Operands.push_back(OpV);
		}
		Operands.append(VL.size() - Operands.size(),
		UndefValue::get(VL0->getOperand(I)->getType()));
		TE->setOperand(I, Operands);
		OperandsVec.push_back(Operands);
}		}
		for (unsigned OpIdx = 0, OpE = OperandsVec.size(); OpIdx != OpE; ++OpIdx)
		vdmitrie (Not Done): Here is the case https://reviews.llvm.org/D75296 is trying to prevent.
		buildTree_rec(OperandsVec[OpIdx], Depth + 1, {TE, OpIdx});
return;		return;
}		}
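The padding rule in the hunk above can be sketched outside of LLVM. This is a minimal model, assuming "undef" is represented by `std::nullopt`; `Operand` and `padOperands` are hypothetical names, not part of the patch. Like the `find_if` call in the patch, the sketch picks the first defined operand it finds.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: a lane operand is either a concrete value or "undef"
// (std::nullopt). This is not the LLVM representation.
using Operand = std::optional<int>;

// Builds the operand vector for one operand index. For integer div/rem,
// undefined lanes are replaced with the first defined operand; for all other
// opcodes they stay undefined.
std::vector<Operand> padOperands(const std::vector<Operand> &Lanes,
                                 bool IsIntDivRem) {
  Operand DefinedOp;
  if (IsIntDivRem) {
    for (const Operand &Op : Lanes) {
      if (Op) {
        DefinedOp = Op;
        break;
      }
    }
  }
  std::vector<Operand> Result;
  Result.reserve(Lanes.size());
  for (const Operand &Op : Lanes)
    Result.push_back(Op ? Op : DefinedOp);
  return Result;
}
```

With lanes `{6, undef, 3}`, the div/rem case fills the middle lane with `6`, while any other opcode leaves it undefined.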
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// We don't combine GEPs with complicated (nested) indexing.		// We don't combine GEPs with complicated (nested) indexing.
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (cast<Instruction>(V)->getNumOperands() != 2) {		if (cast<Instruction>(V)->getNumOperands() != 2) {
LLVM_DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");		LLVM_DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

// We can't combine several GEPs into one vector if they operate on		// We can't combine several GEPs into one vector if they operate on
// different types.		// different types.
Type *Ty0 = VL0->getOperand(0)->getType();		Type *Ty0 = VL0->getOperand(0)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Type *CurTy = cast<Instruction>(V)->getOperand(0)->getType();		Type *CurTy = cast<Instruction>(V)->getOperand(0)->getType();
if (Ty0 != CurTy) {		if (Ty0 != CurTy) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: not-vectorizable GEP (different types).\n");		<< "SLP: not-vectorizable GEP (different types).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

// We don't combine GEPs with non-constant indexes.		// We don't combine GEPs with non-constant indexes.
Type *Ty1 = VL0->getOperand(1)->getType();		Type *Ty1 = VL0->getOperand(1)->getType();
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
auto Op = cast<Instruction>(V)->getOperand(1);		auto *Op = cast<Instruction>(V)->getOperand(1);
if (!isa<ConstantInt>(Op) ||		if (!isa<ConstantInt>(Op) ||
(Op->getType() != Ty1 &&		(Op->getType() != Ty1 &&
Op->getType()->getScalarSizeInBits() >		Op->getType()->getScalarSizeInBits() >
DL->getIndexSizeInBits(		DL->getIndexSizeInBits(
V->getType()->getPointerAddressSpace()))) {		V->getType()->getPointerAddressSpace()))) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: not-vectorizable GEP (non-constant indexes).\n");		<< "SLP: not-vectorizable GEP (non-constant indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = 2; i < e; ++i) {		for (unsigned i = 0, e = 2; i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(
		isa<PoisonValue>(V)
		? PoisonValue::get(VL0->getOperand(i)->getType())
		: isa<UndefValue>(V)
		? UndefValue::get(VL0->getOperand(i)->getType())
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::Store: {		case Instruction::Store: {
// Check if the stores are consecutive or if we need to swizzle them.		// Check if the stores are consecutive or if we need to swizzle them.
llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();		llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();
// Avoid types that are padded when being allocated as scalars, while		// Avoid types that are padded when being allocated as scalars, while
// being packed together in a vector (such as i1).		// being packed together in a vector (such as i1).
if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
DL->getTypeAllocSizeInBits(ScalarTy)) {		DL->getTypeAllocSizeInBits(ScalarTy)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering stores of non-packed type.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering stores of non-packed type.\n");
return;		return;
}		}
// Make sure all stores in the bundle are simple - we can't vectorize		// Make sure all stores in the bundle are simple - we can't vectorize
// atomic or volatile stores.		// atomic or volatile stores.
SmallVector<Value *, 4> PointerOps(VL.size());		SmallVector<Value *, 4> PointerOps(NumberOfInstructions);
		OrdersType OriginalOrder(NumberOfInstructions, 0);
ValueList Operands(VL.size());		ValueList Operands(VL.size());
auto POIter = PointerOps.begin();		auto POIter = PointerOps.begin();
auto OIter = Operands.begin();		auto OIter = Operands.begin();
for (Value *V : VL) {		auto *OOIter = OriginalOrder.begin();
auto *SI = cast<StoreInst>(V);		bool IsOOIdentity = true;
		for (int I = 0, E = VL.size(); I < E; ++I) {
		if (isa<UndefValue>(VL[I])) {
		*OIter = isa<PoisonValue>(VL[I])
		? PoisonValue::get(VL0->getOperand(0)->getType())
		: UndefValue::get(VL0->getOperand(0)->getType());
		++OIter;
		IsOOIdentity = I == 0;
		continue;
		}
		auto *SI = cast<StoreInst>(VL[I]);
if (!SI->isSimple()) {		if (!SI->isSimple()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple stores.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering non-simple stores.\n");
return;		return;
}		}
*POIter = SI->getPointerOperand();		*POIter = SI->getPointerOperand();
*OIter = SI->getValueOperand();		*OIter = SI->getValueOperand();
		*OOIter = I;
++POIter;		++POIter;
++OIter;		++OIter;
		++OOIter;
}		}

OrdersType CurrentOrder;		OrdersType CurrentOrder;
		if (!llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE,
		CurrentOrder)) {
		spatel: Use isValidElementType() or check for undef directly? I still can't tell from the debug statement exactly what we are guarding against. Should the type check already be here even without this patch?
		ABataev (author): I was just trying to protect the code and support it only for simple types at first. I had some doubts that the cost model for masked loads/stores was complete, so I restricted it to simple types. I can remove this check if the cost model for masked ops is good enough.
		RKSimon: Masked load/store costs for constant masks should be good enough now (getScalarizationOverhead should now provide us with a reasonable fallback).
		BS.cancelScheduling(VL, VL0);
		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies);
		LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
		vdmitrie: "Non-consecutive" here is not the actual reason.
		anton-afanasyev: Well, "unsortable" or "unprocessable" would be more precise. But why did we change `if (sortPtrAccesses)...` to the opposite condition? This change just duplicates the debug output, since we don't differentiate the two cases. Also, I'd prefer to see the same `if-else` structure as for the load case.
		return;
		}
// Check the order of pointer operands.		// Check the order of pointer operands.
if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, CurrentOrder)) {
Value *Ptr0;		Value *Ptr0;
		vdmitrie: If we have, for example, this sequence: store addr[2], store addr[0], store addr[1], undef, then we bypass sorting the pointers and end up vectorizing this store sequence in the wrong order.
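The sequence in the comment above can be modelled in a few lines. This is only a sketch, assuming each lane is described by an element offset from a common base and that an undef lane is `std::nullopt`; `Lane`, `sortedStoreOrder`, and `definedLanesInAddressOrder` are hypothetical helpers, not functions from the patch or from LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: each lane stores to Base[Offset], or is undef
// (std::nullopt).
using Lane = std::optional<int>;

// Returns Order such that Order[K] is the lane index of the K-th store in
// increasing address order. Undef lanes are skipped.
std::vector<unsigned> sortedStoreOrder(const std::vector<Lane> &Lanes) {
  std::vector<unsigned> Order;
  for (unsigned I = 0; I < Lanes.size(); ++I)
    if (Lanes[I])
      Order.push_back(I);
  std::stable_sort(Order.begin(), Order.end(), [&](unsigned A, unsigned B) {
    return *Lanes[A] < *Lanes[B];
  });
  return Order;
}

// True when the defined lanes already appear in increasing address order.
bool definedLanesInAddressOrder(const std::vector<Lane> &Lanes) {
  std::vector<unsigned> Order = sortedStoreOrder(Lanes);
  return std::is_sorted(Order.begin(), Order.end());
}
```

For the lanes `addr[2], addr[0], addr[1], undef`, the sorted order is lanes 1, 2, 0, so the bundle is not in address order even though it has a trailing undef lane.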
Value *PtrN;		Value *PtrN;
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
Ptr0 = PointerOps.front();		Ptr0 = PointerOps.front();
PtrN = PointerOps.back();		PtrN = PointerOps.back();
} else {		} else {
Ptr0 = PointerOps[CurrentOrder.front()];		Ptr0 = PointerOps[CurrentOrder.front()];
PtrN = PointerOps[CurrentOrder.back()];		PtrN = PointerOps[CurrentOrder.back()];
}		}
Optional<int> Dist =		Optional<int> Dist =
getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);		getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
// Check that the sorted pointer operands are consecutive.		// Check that the sorted pointer operands are consecutive.
if (static_cast<unsigned>(*Dist) == VL.size() - 1) {		int NormalizedSize = NumberOfInstructions - 1;
		if (Dist && *Dist >= NormalizedSize &&
		*Dist <= static_cast<int>(VL.size() - 1)) {
if (CurrentOrder.empty()) {		if (CurrentOrder.empty()) {
		TreeEntry *TE;
		if (NumberOfInstructions == VL.size() && IsOOIdentity) {
// Original stores are consecutive and does not require reordering.		// Original stores are consecutive and does not require reordering.
++NumOpsWantToKeepOriginalOrder;		TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S,		ReuseShuffleIndicies);
UserTreeIdx, ReuseShuffleIndicies);		} else {
TE->setOperandsInOrder();		// Need to extend.
		OrdersType NormalizedOrder(VL.size(), VL.size());
		for (int I = 0, E = OriginalOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(ScalarTy, Ptr0, ScalarTy,
		PointerOps[I], DL, SE)] =
		OriginalOrder[I];
		}
		TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies, NormalizedOrder);
		}
		TE->setOperandsInOrder(VL0);
buildTree_rec(Operands, Depth + 1, {TE, 0});		buildTree_rec(Operands, Depth + 1, {TE, 0});
		++NumOpsWantToKeepOriginalOrder;
LLVM_DEBUG(dbgs() << "SLP: added a vector of stores.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of stores.\n");
} else {		} else {
		OrdersType NormalizedOrder(VL.size(), VL.size());
		SmallVector<int, 4> Orders(CurrentOrder.size());
		inversePermutation(CurrentOrder, Orders);
		for (int I = 0, E = CurrentOrder.size(); I < E; ++I) {
		NormalizedOrder[*getPointersDiff(ScalarTy, Ptr0, ScalarTy,
		PointerOps[Orders[I]], DL, SE)] =
		OriginalOrder[Orders[I]];
		}
TreeEntry *TE =		TreeEntry *TE =
newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies, CurrentOrder);		ReuseShuffleIndicies, NormalizedOrder);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
buildTree_rec(Operands, Depth + 1, {TE, 0});		buildTree_rec(Operands, Depth + 1, {TE, 0});
LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled stores.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of jumbled stores.\n");
findRootOrder(CurrentOrder);		// No need to reorder if still need to shuffle reuses.
++NumOpsWantToKeepOrder[CurrentOrder];		if (ReuseShuffleIndicies.empty()) {
		findRootOrder(NormalizedOrder);
		++NumOpsWantToKeepOrder[NormalizedOrder];
		} else {
		++NumOpsWantToKeepOriginalOrder;
}		}
return;
}		}
		return;
}		}

BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");		LLVM_DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
return;		return;
}		}
case Instruction::Call: {		case Instruction::Call: {
// Check if the calls are all to the same vectorizable intrinsic or		// Check if the calls are all to the same vectorizable intrinsic or
// library function.		// library function.
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

VFShape Shape = VFShape::get(		VFShape Shape =
*CI, ElementCount::getFixed(static_cast<unsigned int>(VL.size())),		VFShape::get(*CI,
		ElementCount::getFixed(static_cast<unsigned int>(
		PowerOf2Ceil(NumberOfInstructions))),
false /*HasGlobalPred*/);		false /*HasGlobalPred*/);
Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);		Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);

if (!VecFunc && !isTriviallyVectorizable(ID)) {		if (!VecFunc && !isTriviallyVectorizable(ID)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");		LLVM_DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
return;		return;
}		}
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();
unsigned NumArgs = CI->getNumArgOperands();		unsigned NumArgs = CI->getNumArgOperands();
SmallVector<Value*, 4> ScalarArgs(NumArgs, nullptr);		SmallVector<Value*, 4> ScalarArgs(NumArgs, nullptr);
for (unsigned j = 0; j != NumArgs; ++j)		for (unsigned j = 0; j != NumArgs; ++j)
if (hasVectorInstrinsicScalarOpd(ID, j))		if (hasVectorInstrinsicScalarOpd(ID, j))
ScalarArgs[j] = CI->getArgOperand(j);		ScalarArgs[j] = CI->getArgOperand(j);
for (Value *V : VL) {		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
CallInst *CI2 = dyn_cast<CallInst>(V);		CallInst *CI2 = dyn_cast<CallInst>(V);
if (!CI2 || CI2->getCalledFunction() != F ||		if (!CI2 || CI2->getCalledFunction() != F ||
getVectorIntrinsicIDForCall(CI2, TLI) != ID ||		getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
(VecFunc &&		(VecFunc &&
VecFunc != VFDatabase(*CI2).getVectorizedFunction(Shape)) ||		VecFunc != VFDatabase(*CI2).getVectorizedFunction(Shape)) ||
!CI->hasIdenticalOperandBundleSchema(*CI2)) {		!CI->hasIdenticalOperandBundleSchema(*CI2)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" << *V		LLVM_DEBUG(dbgs()
<< "\n");		<< "SLP: mismatched calls:" << *CI << "!=" << *V << "\n");
return;		return;
}		}
// Some intrinsics have scalar arguments and should be same in order for		// Some intrinsics have scalar arguments and should be same in order for
// them to be vectorized.		// them to be vectorized.
for (unsigned j = 0; j != NumArgs; ++j) {		for (unsigned j = 0; j != NumArgs; ++j) {
if (hasVectorInstrinsicScalarOpd(ID, j)) {		if (hasVectorInstrinsicScalarOpd(ID, j)) {
Value *A1J = CI2->getArgOperand(j);		Value *A1J = CI2->getArgOperand(j);
if (ScalarArgs[j] != A1J) {		if (ScalarArgs[j] != A1J) {
[15 lines not shown; context: case Instruction::Call: {]
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:"		LLVM_DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:"
<< *CI << "!=" << *V << '\n');		<< *CI << "!=" << *V << '\n');
return;		return;
}		}
}		}
		SmallVector<Value *, 4> NormalizedCalls(VL.size(),
		UndefValue::get(CI->getType()));
		copy(VL, NormalizedCalls.begin());
		for (int I = NumberOfInstructions, E = PowerOf2Ceil(NumberOfInstructions);
		I < E; ++I)
		NormalizedCalls[I] = CI;

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL) {		for (Value *V : NormalizedCalls) {
		if (isa<PoisonValue>(V)) {
		Operands.push_back(PoisonValue::get(CI->getOperand(i)->getType()));
		continue;
		}
		if (isa<UndefValue>(V)) {
		Operands.push_back(UndefValue::get(CI->getOperand(i)->getType()));
		continue;
		}
auto *CI2 = cast<CallInst>(V);		auto *CI2 = cast<CallInst>(V);
Operands.push_back(CI2->getArgOperand(i));		Operands.push_back(CI2->getArgOperand(i));
}		}
buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
// If this is not an alternate sequence of opcode like add-sub		// If this is not an alternate sequence of opcode like add-sub
// then do not vectorize this instruction.		// then do not vectorize this instruction.
if (!S.isAltShuffle()) {		if (!S.isAltShuffle()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");		LLVM_DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
return;		return;
}		}
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");		LLVM_DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");

// Reorder operands if reordering would enable vectorization.		// Reorder operands if reordering would enable vectorization.
if (isa<BinaryOperator>(VL0)) {		if (isa<BinaryOperator>(VL0)) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right, *DL, *SE, *this);		reorderInputsAccordingToOpcode(VL0, VL, Left, Right, *DL, *SE, *this);
TE->setOperand(0, Left);		TE->setOperand(0, Left);
TE->setOperand(1, Right);		TE->setOperand(1, Right);
buildTree_rec(Left, Depth + 1, {TE, 0});		buildTree_rec(Left, Depth + 1, {TE, 0});
buildTree_rec(Right, Depth + 1, {TE, 1});		buildTree_rec(Right, Depth + 1, {TE, 1});
return;		return;
}		}

TE->setOperandsInOrder();		TE->setOperandsInOrder(VL0);
for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(
		isa<PoisonValue>(V)
		? PoisonValue::get(VL0->getOperand(i)->getType())
		: isa<UndefValue>(V)
		? UndefValue::get(VL0->getOperand(i)->getType())
		: cast<Instruction>(V)->getOperand(i));

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, i});
}		}
return;		return;
}		}
default:		default:
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
[56 lines not shown; context: if (E0->getOpcode() == Instruction::ExtractValue) {]
// Check if load can be rewritten as load of vector.		// Check if load can be rewritten as load of vector.
LoadInst *LI = dyn_cast<LoadInst>(Vec);		LoadInst *LI = dyn_cast<LoadInst>(Vec);
if (!LI || !LI->isSimple() || !LI->hasNUses(VL.size()))		if (!LI || !LI->isSimple() || !LI->hasNUses(VL.size()))
return false;		return false;
} else {		} else {
NElts = cast<FixedVectorType>(Vec->getType())->getNumElements();		NElts = cast<FixedVectorType>(Vec->getType())->getNumElements();
}		}

if (NElts != VL.size())		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
return false;		const unsigned NumOfInstructions =
		std::distance(InstructionsOnly.begin(), InstructionsOnly.end());

// Check that all of the indices extract from the correct offset.		// Check that all of the indices extract from the correct offset.
bool ShouldKeepOrder = true;		bool ShouldKeepOrder = true;
unsigned E = VL.size();		unsigned E = VL.size();
// Assign to all items the initial value E + 1 so we can check if the extract		// Assign to all items the initial value E so we can check if the extract
// instruction index was used already.		// instruction index was used already.
// Also, later we can check that all the indices are used and we have a		// Also, later we can check that all the indices are used and we have a
// consecutive access in the extract instructions, by checking that no		// consecutive access in the extract instructions, by checking that no
// element of CurrentOrder still has value E + 1.		// element of CurrentOrder still has value E.
CurrentOrder.assign(E, E + 1);		CurrentOrder.assign(E, E);
unsigned I = 0;		unsigned I = 0;
for (; I < E; ++I) {		auto II = InstructionsOnly.begin();
		vdmitrie: What is the reasoning for this min? Imagine VL[0] and VL[1] are extracts of two consecutive elements from the same vector of size 2, and VL[2] and VL[3] are extracts from another vector (which can even be of a different size). NElts will be assigned 2 based on VL[0], while the VL size is 4. The for loop at line 3300 will not visit the 3rd and 4th elements of VL, and the final answer turns out to be "true", which is obviously incorrect, as we must gather these extracts.
		ABataev (author): Good catch, thanks! It is required to handle the case where the 2 other elements are actually UndefValues. Just need to add a check for this here.
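The case discussed in this thread can be illustrated with a standalone check. This is a sketch under stated assumptions, not the fix that went into the patch: a lane is modelled as either an extract from a source vector identified by an integer id, or undef (`std::nullopt`); `Extract`, `Lane`, and `allExtractsFromOneSource` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of one lane: an extract of element Index from the
// source vector identified by SourceId, or undef (std::nullopt).
struct Extract {
  int SourceId;
  unsigned Index;
};
using Lane = std::optional<Extract>;

// Returns true only if every defined lane extracts a distinct, in-range
// element of the same source vector. Undef lanes are permitted anywhere.
bool allExtractsFromOneSource(const std::vector<Lane> &Lanes, int SourceId,
                              unsigned NumElts) {
  std::vector<bool> Used(NumElts, false);
  for (const Lane &L : Lanes) {
    if (!L)
      continue; // Undef lane: nothing to check.
    if (L->SourceId != SourceId || L->Index >= NumElts || Used[L->Index])
      return false;
    Used[L->Index] = true;
  }
  return true;
}
```

Four lanes that extract two elements each from two different 2-element vectors are rejected, while two extracts followed by two undef lanes are accepted. Every lane is visited, so lanes beyond the source vector's element count cannot be skipped silently.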
auto *Inst = cast<Instruction>(VL[I]);		for (; I < NumOfInstructions; ++I, ++II) {
		auto *Inst = cast<Instruction>(*II);
if (Inst->getOperand(0) != Vec)		if (Inst->getOperand(0) != Vec)
break;		break;
Optional<unsigned> Idx = getExtractIndex(Inst);		Optional<unsigned> Idx = getExtractIndex(Inst);
if (!Idx)		if (!Idx)
break;		break;
const unsigned ExtIdx = *Idx;		const unsigned ExtIdx = *Idx;
		if (ExtIdx >= E)
		break;
if (ExtIdx != I) {		if (ExtIdx != I) {
if (ExtIdx >= E || CurrentOrder[ExtIdx] != E + 1)		if (CurrentOrder[ExtIdx] != E)
break;		break;
ShouldKeepOrder = false;		ShouldKeepOrder = false;
CurrentOrder[ExtIdx] = I;		CurrentOrder[ExtIdx] = I;
} else {		} else {
if (CurrentOrder[I] != E + 1)		if (CurrentOrder[I] != E)
break;		break;
CurrentOrder[I] = I;		CurrentOrder[I] = I;
}		}
}		}
if (I < E) {		if (I < NumOfInstructions) {
		anton-afanasyev: Comment typo: `aggrgate`.
CurrentOrder.clear();		CurrentOrder.clear();
return false;		return false;
}		}

return ShouldKeepOrder;		return ShouldKeepOrder;
}		}

bool BoUpSLP::areAllUsersVectorized(Instruction *I,		bool BoUpSLP::areAllUsersVectorized(Instruction *I,
[52 lines not shown; context: computeExtractCost(ArrayRef<Value *> VL, FixedVectorType *VecTy,]
bool AllConsecutive = true;		bool AllConsecutive = true;
unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;		unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;
unsigned Idx = -1;		unsigned Idx = -1;
InstructionCost Cost = 0;		InstructionCost Cost = 0;

// Process extracts in blocks of EltsPerVector to check if the source vector		// Process extracts in blocks of EltsPerVector to check if the source vector
// operand can be re-used directly. If not, add the cost of creating a shuffle		// operand can be re-used directly. If not, add the cost of creating a shuffle
// to extract the values into a vector register.		// to extract the values into a vector register.
		unsigned CurrentIdx = INT_MAX, PrevIdx;
for (auto *V : VL) {		for (auto *V : VL) {
++Idx;		++Idx;
		if (!isa<UndefValue>(V)) {
		PrevIdx = CurrentIdx;
		CurrentIdx = *getExtractIndex(cast<Instruction>(V));
// Reached the start of a new vector registers.		// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {		if (Idx % EltsPerVector == 0) {
AllConsecutive = true;		AllConsecutive = true;
continue;		continue;
}		}

// Check all extracts for a vector register on the target directly		// Check all extracts for a vector register on the target directly
// extract values in order.		// extract values in order.
unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));
AllConsecutive &= PrevIdx + 1 == CurrentIdx &&		AllConsecutive &= PrevIdx + 1 == CurrentIdx &&
CurrentIdx % EltsPerVector == Idx % EltsPerVector;		CurrentIdx % EltsPerVector == Idx % EltsPerVector;
		}
if (AllConsecutive)		if (AllConsecutive)
continue;		continue;

// Skip all indices, except for the last index per vector block.		// Skip all indices, except for the last index per vector block.
if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())		if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
continue;		continue;

// If we have a series of extracts which are not consecutive and hence		// If we have a series of extracts which are not consecutive and hence
// cannot re-use the source vector register directly, compute the shuffle		// cannot re-use the source vector register directly, compute the shuffle
// cost to extract a vector with EltsPerVector elements.		// cost to extract a vector with EltsPerVector elements.
Cost += TTI.getShuffleCost(		Cost += TTI.getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc,		TargetTransformInfo::SK_PermuteSingleSrc,
FixedVectorType::get(VecTy->getElementType(), EltsPerVector));		FixedVectorType::get(VecTy->getElementType(), EltsPerVector));
}		}
return Cost;		return Cost;
}		}

		/// Returns the indices for the first and the last instructions based on
		/// ordering.
		static std::pair<unsigned, unsigned>
		findMinMaxPos(ArrayRef<unsigned> ReorderedIndicies) {
		unsigned E = ReorderedIndicies.size();
		unsigned Min = E;
		unsigned Max = E;
		for (unsigned I = 0; I < E && (Min == E || Max == E); ++I) {
		if (Min == E && ReorderedIndicies[I] < E)
		Min = I;
		if (Max == E && ReorderedIndicies[E - 1 - I] < E)
		Max = E - 1 - I;
		}
		return std::make_pair(Min, Max);
		}
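A standalone rendering of the helper above may make its contract easier to see. This sketch swaps `ArrayRef<unsigned>` for `std::vector<unsigned>` so that it compiles without LLVM headers; the convention that an entry greater than or equal to the array size marks an unused slot is read off the patch and is otherwise an assumption.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the positions of the first and the last slot that hold a valid
// index (an entry smaller than the array size). If no slot is valid, both
// positions equal the array size.
std::pair<unsigned, unsigned>
findMinMaxPos(const std::vector<unsigned> &ReorderedIndices) {
  unsigned E = static_cast<unsigned>(ReorderedIndices.size());
  unsigned Min = E;
  unsigned Max = E;
  // Scan from both ends at once and stop as soon as both are found.
  for (unsigned I = 0; I < E && (Min == E || Max == E); ++I) {
    if (Min == E && ReorderedIndices[I] < E)
      Min = I;
    if (Max == E && ReorderedIndices[E - 1 - I] < E)
      Max = E - 1 - I;
  }
  return std::make_pair(Min, Max);
}
```

For the input `{4, 2, 0, 4}` the valid slots are positions 1 and 2, so the function returns `(1, 2)`.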

		unsigned BoUpSLP::getEntryVF(const TreeEntry *E, SmallSet<unsigned, 4> &UserVFs,
		const TreeEntry *IE) {
		auto It = EntryVFs.find(E);
		if (It != EntryVFs.end())
		return It->second;
		auto &&GetVF = [](ArrayRef<Value *> Scalars,
		ArrayRef<unsigned> ReorderIndices,
		unsigned Opcode) -> unsigned {
		// For stores, the vectorization factor is the number of scalars, it is
		// aligned to the minimal/maximal size of the vector register.
		if (Opcode == Instruction::Store)
		return Scalars.size();
		unsigned NumValues =
		std::distance(Scalars.begin(), find_if(reverse(Scalars), [](Value *V) {
		return !isa<UndefValue>(V);
		}).base());
		if (!ReorderIndices.empty()) {
		unsigned MinPos, MaxPos;
		std::tie(MinPos, MaxPos) = findMinMaxPos(ReorderIndices);
		NumValues = std::max(NumValues, MaxPos + 1);
		}

		return PowerOf2Ceil(NumValues);
		};
		unsigned SelfVF = GetVF(E->Scalars, E->ReorderIndices, E->getOpcode());
		bool IsGather = E->State == TreeEntry::NeedToGather;
		EntryVFs.try_emplace(E, IsGather ? 0 : std::min<unsigned>(2, SelfVF));
		unsigned MinVF = E->Scalars.size();
		// Fill users vectorization factors to calculate shuffle cost correctly.
		for (const EdgeInfo &EI : E->UserTreeIndices) {
		if (!EI.UserTE || EI.UserTE == IE)
		continue;
		SmallSet<unsigned, 4> UserUserVFs;
		if (unsigned UserVF = getEntryVF(EI.UserTE, UserUserVFs, IE)) {
		UserVFs.insert(UserVF);
		MinVF = std::max(std::min(MinVF, UserVF), SelfVF);
		}
		}
		if (SelfVF <= 1 ||
		(!IsGather && E->getNumOperands() < 1 && !UserVFs.contains(SelfVF)))
		SelfVF = std::max<unsigned>(2, MinVF);
		// if (IsGather && SelfVF < MinVF)
		// SelfVF = MinVF;
		EntryVFs[E] = SelfVF;
		return SelfVF;
		}
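The core of the `GetVF` lambda above, for the non-store case without reorder indices, is "count lanes up to the last defined one, then round up to a power of two". A minimal sketch of that step follows, assuming undef lanes are modelled as `std::nullopt`; `powerOf2Ceil` is a local stand-in for `llvm::PowerOf2Ceil` that is only meant for inputs of at least 1, and `vfForScalars` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Local stand-in: smallest power of two that is >= N, for 1 <= N <= 2^31.
unsigned powerOf2Ceil(unsigned N) {
  assert(N >= 1 && N <= (1u << 31) && "sketch only handles this range");
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Computes a vectorization factor from the scalars of one tree entry:
// trailing undef lanes are dropped, then the count is rounded up to a
// power of two. Assumes at least one lane is defined.
unsigned vfForScalars(const std::vector<std::optional<int>> &Scalars) {
  unsigned NumValues = static_cast<unsigned>(Scalars.size());
  while (NumValues > 0 && !Scalars[NumValues - 1])
    --NumValues;
  return powerOf2Ceil(NumValues);
}
```

Three defined lanes followed by one undef lane give a factor of 4; five defined lanes followed by three undef lanes give 8.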

/// Shuffles \p Mask in accordance with the given \p SubMask.		/// Shuffles \p Mask in accordance with the given \p SubMask.
static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask) {		static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask) {
if (SubMask.empty())		if (SubMask.empty())
return;		return;
if (Mask.empty()) {		if (Mask.empty()) {
Mask.append(SubMask.begin(), SubMask.end());		Mask.append(SubMask.begin(), SubMask.end());
return;		return;
}		}
[9 lines not shown; context: static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask) {]
}		}
Mask.swap(NewMask);		Mask.swap(NewMask);
}		}

InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,		InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
ArrayRef<Value *> VectorizedVals) {		ArrayRef<Value *> VectorizedVals) {
ArrayRef<Value*> VL = E->Scalars;		ArrayRef<Value*> VL = E->Scalars;

		SmallSet<unsigned, 4> UserVFs;
		// Original vectorization factor.
		unsigned SelfVF = getEntryVF(E, UserVFs, E);
		RKSimon: Can we use llvm::size(InstructionsOnly)?
		ABataev (author): No, it does not work; `llvm::size` works only if the size can be calculated in `O(1)`. Here it cannot, since `InstructionsOnly` may have "holes".
		craig.topper: Would using std::distance directly be clearer? You'd have to explicitly write begin()/end(), though.
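The point made in this thread, that a filtered range has no constant-time size, can be shown with plain standard-library code. This is an illustrative sketch, assuming a lane is either an instruction id or a non-instruction placeholder (`std::nullopt`); `Lane` and `countInstructions` are hypothetical names and do not use `make_filter_range`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a lane is an instruction id or a non-instruction
// placeholder (std::nullopt).
using Lane = std::optional<int>;

// Counts lanes that hold a real instruction. The count is linear in the
// number of lanes because every lane has to be inspected; there is no way
// to know how many "holes" exist without walking the range.
std::size_t countInstructions(const std::vector<Lane> &Lanes) {
  return static_cast<std::size_t>(
      std::count_if(Lanes.begin(), Lanes.end(),
                    [](const Lane &L) { return L.has_value(); }));
}
```

For lanes `{inst, hole, inst, hole}` the count is 2, and obtaining it requires visiting all four lanes.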
		unsigned ShuffleVF = SelfVF;
		// Final vectorization factor after shuffling reuses.
		if (!E->ReuseShuffleIndices.empty()) {
		int Limit = VL.size();
		ShuffleVF = std::max<unsigned>(
		SelfVF, PowerOf2Ceil(std::distance(
		E->ReuseShuffleIndices.begin(),
		find_if(reverse(E->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}
		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
		const unsigned NumOfInstructions =
		std::distance(InstructionsOnly.begin(), InstructionsOnly.end());
		Value *V0 = nullptr;
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		FixedVectorType *VecTy;
		FixedVectorType *FinalVecTy;
		if (!empty(InstructionsOnly)) {
		V0 = *InstructionsOnly.begin();
		if (StoreInst *SI = dyn_cast<StoreInst>(V0))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))		else if (CmpInst *CI = dyn_cast<CmpInst>(V0))
ScalarTy = CI->getOperand(0)->getType();		ScalarTy = CI->getOperand(0)->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))		else if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))
ScalarTy = IE->getOperand(1)->getType();		ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
auto *FinalVecTy = VecTy;
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

// If we have computed a smaller type for the expression, update VecTy so		// If we have computed a smaller type for the expression, update VecTy so
// that the costs will be accurate.		// that the costs will be accurate.
if (MinBWs.count(VL[0]))		auto MinBWI = MinBWs.find(V0);
		if (MinBWI != MinBWs.end()) {
VecTy = FixedVectorType::get(		VecTy = FixedVectorType::get(
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		IntegerType::get(F->getContext(), MinBWI->second.first), SelfVF);
		FinalVecTy = FixedVectorType::get(
		IntegerType::get(F->getContext(), MinBWI->second.first), ShuffleVF);
		} else {
		VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		FinalVecTy = FixedVectorType::get(ScalarTy, ShuffleVF);
		}
		} else {
		VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		FinalVecTy = FixedVectorType::get(ScalarTy, ShuffleVF);
		}
		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();		unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();
bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
if (NeedToShuffleReuses)
FinalVecTy =
FixedVectorType::get(VecTy->getElementType(), ReuseShuffleNumbers);
// FIXME: it tries to fix a problem with MSVC buildbots.		// FIXME: it tries to fix a problem with MSVC buildbots.
TargetTransformInfo &TTIRef = *TTI;		TargetTransformInfo &TTIRef = *TTI;
auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,		auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, &InstructionsOnly,
VectorizedVals](InstructionCost &Cost,		VecTy, VectorizedVals](InstructionCost &Cost,
bool IsGather) {		bool IsGather) {
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
for (auto *V : VL) {		for (auto *V : InstructionsOnly) {
// If all users of instruction are going to be vectorized and this		// If all users of instruction are going to be vectorized and this
// instruction itself is not going to be vectorized, consider this		// instruction itself is not going to be vectorized, consider this
// instruction as dead and remove its cost from the final cost of the		// instruction as dead and remove its cost from the final cost of the
// vectorized tree.		// vectorized tree.
if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||		if (!areAllUsersVectorized(cast<Instruction>(V), VectorizedVals) ||
(IsGather && ScalarToTreeEntry.count(V)))		(IsGather && ScalarToTreeEntry.count(V)))
continue;		continue;
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
[50 lines hidden]	for (const auto &Data : ExtractVectorsTys) {
Cost += TTIRef.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,		Cost += TTIRef.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
VecTy, None, 0, EEVTy);		VecTy, None, 0, EEVTy);
}		}
}		}
};		};
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isa<InsertElementInst>(VL[0]))		if (isa_and_nonnull<InsertElementInst>(V0))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
		RKSimon [Not Done]: duplicate cast
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);		isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle.hasValue()) {		if (Shuffle.hasValue()) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		if (ShuffleVectorInst::isIdentityMask(Mask)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
[19 lines hidden]	if (Shuffle.hasValue()) {
return GatherCost;		return GatherCost;
}		}
if (isSplat(VL)) {		if (isSplat(VL)) {
// Found the broadcasting of the single scalar, calculate the cost as the		// Found the broadcasting of the single scalar, calculate the cost as the
// broadcast.		// broadcast.
return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);		return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
}		}
if (E->getOpcode() == Instruction::ExtractElement && allSameType(VL) &&		if (E->getOpcode() == Instruction::ExtractElement && allSameType(VL) &&
allSameBlock(VL) &&		allSameBlock(InstructionsOnly) &&
!isa<ScalableVectorType>(		!isa<ScalableVectorType>(
cast<ExtractElementInst>(E->getMainOp())->getVectorOperandType())) {		cast<ExtractElementInst>(E->getMainOp())->getVectorOperandType())) {
// Check that gather of extractelements can be represented as just a		// Check that gather of extractelements can be represented as just a
// shuffle of a single/two vectors the scalars are extracted from.		// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;		SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =		Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isShuffle(VL, Mask);		NumOfInstructions > 1
if (ShuffleKind.hasValue()) {		? isShuffle(llvm::to_vector<4>(InstructionsOnly), Mask)
		: None;
		if (NumOfInstructions == 1 || ShuffleKind.hasValue()) {
// Found the bunch of extractelement instructions that must be gathered		// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a		// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.		// single input vector or of 2 input vectors.
InstructionCost Cost =		InstructionCost Cost = 0;
computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);		if (NumOfInstructions > 1)
		Cost = computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);
AdjustExtractsCost(Cost, /*IsGather=*/true);		AdjustExtractsCost(Cost, /*IsGather=*/true);
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
return Cost;		return Cost;
}		}
}		}
InstructionCost ReuseShuffleCost = 0;		InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
ReuseShuffleCost = TTI->getShuffleCost(		ReuseShuffleCost = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);		TTI::SK_PermuteSingleSrc, FinalVecTy, E->ReuseShuffleIndices);
return ReuseShuffleCost + getGatherCost(VL);		return ReuseShuffleCost + getGatherCost(VL, SelfVF);
}		}
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
if (E->getOpcode() == Instruction::Store) {		if (E->getOpcode() == Instruction::Store) {
// For stores the order is actually a mask.		// For stores the order is actually a mask.
NewMask.resize(E->ReorderIndices.size());		NewMask.resize(E->ReorderIndices.size());
copy(E->ReorderIndices, NewMask.begin());		copy(E->ReorderIndices, NewMask.begin());
} else {		} else {
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
}		}
::addMask(Mask, NewMask);		::addMask(Mask, NewMask);
}		}
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))		if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))
CommonCost =		CommonCost =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL");		assert(E->getOpcode() && allSameType(VL) && allSameBlock(InstructionsOnly) &&
		"Invalid VL");
Instruction *VL0 = E->getMainOp();		Instruction *VL0 = E->getMainOp();
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI:		case Instruction::PHI:
return 0;		return 0;

case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
// The common cost of removal ExtractElement/ExtractValue instructions +		// The common cost of removal ExtractElement/ExtractValue instructions +
// the cost of shuffles, if required to resuffle the original vector.		// the cost of shuffles, if required to resuffle the original vector.
InstructionCost CommonCost = 0;		InstructionCost CommonCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
unsigned Idx = 0;		unsigned Idx = 0;
for (unsigned I : E->ReuseShuffleIndices) {		for (unsigned I : E->ReuseShuffleIndices) {
		if (I >= VL.size() || isa<UndefValue>(VL[I]))
		continue;
if (ShuffleOrOp == Instruction::ExtractElement) {		if (ShuffleOrOp == Instruction::ExtractElement) {
auto *EE = cast<ExtractElementInst>(VL[I]);		auto *EE = cast<ExtractElementInst>(VL[I]);
CommonCost -= TTI->getVectorInstrCost(Instruction::ExtractElement,		CommonCost -= TTI->getVectorInstrCost(Instruction::ExtractElement,
		RKSimon [Not Done]: duplicate cast
EE->getVectorOperandType(),		EE->getVectorOperandType(),
*getExtractIndex(EE));		*getExtractIndex(EE));
} else {		} else {
CommonCost -= TTI->getVectorInstrCost(Instruction::ExtractElement,		CommonCost -= TTI->getVectorInstrCost(Instruction::ExtractElement,
VecTy, Idx);		VecTy, Idx);
++Idx;		++Idx;
}		}
}		}
Idx = ReuseShuffleNumbers;		Idx = ReuseShuffleNumbers;
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (ShuffleOrOp == Instruction::ExtractElement) {		if (ShuffleOrOp == Instruction::ExtractElement) {
auto *EE = cast<ExtractElementInst>(V);		auto *EE = cast<ExtractElementInst>(V);
CommonCost += TTI->getVectorInstrCost(Instruction::ExtractElement,		CommonCost += TTI->getVectorInstrCost(Instruction::ExtractElement,
EE->getVectorOperandType(),		EE->getVectorOperandType(),
*getExtractIndex(EE));		*getExtractIndex(EE));
		RKSimon [Not Done]: duplicate cast
} else {		} else {
--Idx;		--Idx;
CommonCost += TTI->getVectorInstrCost(Instruction::ExtractElement,		CommonCost += TTI->getVectorInstrCost(Instruction::ExtractElement,
VecTy, Idx);		VecTy, Idx);
}		}
}		}
}		}
		#ifndef NDEBUG
		OrdersType CurrentOrder;
		bool Reuse = canReuseExtract(VL, VL0, CurrentOrder);
		assert(Reuse && E->ReorderIndices.empty() ||
		(!Reuse && CurrentOrder.size() == E->ReorderIndices.size() &&
		std::equal(CurrentOrder.begin(), CurrentOrder.end(),
		E->ReorderIndices.begin())) &&
		"The sequence of extract elements must be reused or shuffled "
		"with the same mask.");
		#endif
if (ShuffleOrOp == Instruction::ExtractValue) {		if (ShuffleOrOp == Instruction::ExtractValue) {
for (unsigned I = 0, E = VL.size(); I < E; ++I) {		for (unsigned I = 0, E = VL.size(); I < E; ++I) {
		if (isa<UndefValue>(VL[I]))
		continue;
auto *EI = cast<Instruction>(VL[I]);		auto *EI = cast<Instruction>(VL[I]);
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EI->hasOneUse()) {		if (EI->hasOneUse()) {
Instruction *Ext = EI->user_back();		Instruction *Ext = EI->user_back();
if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&		if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&
all_of(Ext->users(),		all_of(Ext->users(),
[](User *U) { return isa<GetElementPtrInst>(U); })) {		[](User *U) { return isa<GetElementPtrInst>(U); })) {
// Use getExtractWithExtendCost() to calculate the cost of		// Use getExtractWithExtendCost() to calculate the cost of
// extractelement/ext pair.		// extractelement/ext pair.
CommonCost -= TTI->getExtractWithExtendCost(		CommonCost -= TTI->getExtractWithExtendCost(
Ext->getOpcode(), Ext->getType(), VecTy, I);		Ext->getOpcode(), Ext->getType(), VecTy, I);
// Add back the cost of s|zext which is subtracted separately.		// Add back the cost of s|zext which is subtracted separately.
CommonCost += TTI->getCastInstrCost(		CommonCost += TTI->getCastInstrCost(
Ext->getOpcode(), Ext->getType(), EI->getType(),		Ext->getOpcode(), Ext->getType(), EI->getType(),
TTI::getCastContextHint(Ext), CostKind, Ext);		TTI::getCastContextHint(Ext), CostKind, Ext);
continue;		continue;
}		}
}		}
CommonCost -=		CommonCost -=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, I);
}		}
} else {		} else {
AdjustExtractsCost(CommonCost, /*IsGather=*/false);		AdjustExtractsCost(CommonCost, /*IsGather=*/false);
}		}
return CommonCost;		return CommonCost;
		RKSimon [Not Done]: duplicate cast
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
auto *SrcVecTy = cast<FixedVectorType>(VL0->getType());		auto *SrcVecTy = cast<FixedVectorType>(VL0->getType());

unsigned const NumElts = SrcVecTy->getNumElements();		unsigned const NumElts = SrcVecTy->getNumElements();
unsigned const NumScalars = VL.size();		unsigned const NumScalars = VL.size();
APInt DemandedElts = APInt::getNullValue(NumElts);		APInt DemandedElts = APInt::getNullValue(NumElts);
// TODO: Add support for Instruction::InsertValue.		// TODO: Add support for Instruction::InsertValue.
unsigned Offset = UINT_MAX;		unsigned Offset = UINT_MAX;
bool IsIdentity = true;		bool IsIdentity = true;
SmallVector<int> ShuffleMask(NumElts, UndefMaskElem);		SmallVector<int> ShuffleMask(NumElts, UndefMaskElem);
for (unsigned I = 0; I < NumScalars; ++I) {		for (unsigned I = 0; I < NumScalars; ++I) {
		if (isa<UndefValue>(VL[I]))
		continue;
Optional<int> InsertIdx = getInsertIndex(VL[I], 0);		Optional<int> InsertIdx = getInsertIndex(VL[I], 0);
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
unsigned Idx = *InsertIdx;		unsigned Idx = *InsertIdx;
DemandedElts.setBit(Idx);		DemandedElts.setBit(Idx);
if (Idx < Offset) {		if (Idx < Offset) {
Offset = Idx;		Offset = Idx;
IsIdentity &= I == 0;		IsIdentity &= I == 0;
[34 lines hidden]	#endif
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy,		TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy,
TTI::getCastContextHint(VL0), CostKind, VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}

// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
InstructionCost ScalarCost = VL.size() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;

auto *SrcVecTy = FixedVectorType::get(SrcTy, VL.size());		auto *SrcVecTy = FixedVectorType::get(SrcTy, SelfVF);
InstructionCost VecCost = 0;		InstructionCost VecCost = 0;
// Check if the values are candidates to demote.		// Check if the values are candidates to demote.
if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {		if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {
VecCost = CommonCost + TTI->getCastInstrCost(		VecCost = CommonCost + TTI->getCastInstrCost(
E->getOpcode(), VecTy, SrcVecTy,		E->getOpcode(), VecTy, SrcVecTy,
TTI::getCastContextHint(VL0), CostKind, VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
}		}
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Select: {		case Instruction::Select: {
// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy, Builder.getInt1Ty(),		TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy, Builder.getInt1Ty(),
CmpInst::BAD_ICMP_PREDICATE, CostKind, VL0);		CmpInst::BAD_ICMP_PREDICATE, CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
auto *MaskTy = FixedVectorType::get(Builder.getInt1Ty(), VL.size());		auto *MaskTy = FixedVectorType::get(Builder.getInt1Ty(), SelfVF);
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;

// Check if all entries in VL are either compares or selects with compares		// Check if all entries in VL are either compares or selects with compares
// as condition that have the same predicates.		// as condition that have the same predicates.
CmpInst::Predicate VecPred = CmpInst::BAD_ICMP_PREDICATE;		CmpInst::Predicate VecPred = CmpInst::BAD_ICMP_PREDICATE;
bool First = true;		bool First = true;
for (auto *V : VL) {		for (auto *V : VL) {
CmpInst::Predicate CurrentPred;		CmpInst::Predicate CurrentPred;
auto MatchCmp = m_Cmp(CurrentPred, m_Value(), m_Value());		auto MatchCmp = m_Cmp(CurrentPred, m_Value(), m_Value());
[59 lines hidden]	case Instruction::Xor: {
TargetTransformInfo::OperandValueProperties Op2VP =		TargetTransformInfo::OperandValueProperties Op2VP =
TargetTransformInfo::OP_PowerOf2;		TargetTransformInfo::OP_PowerOf2;

// If all operands are exactly the same ConstantInt then set the		// If all operands are exactly the same ConstantInt then set the
// operand kind to OK_UniformConstantValue.		// operand kind to OK_UniformConstantValue.
// If instead not all operands are constants, then set the operand kind		// If instead not all operands are constants, then set the operand kind
// to OK_AnyValue. If all operands are constants but not the same,		// to OK_AnyValue. If all operands are constants but not the same,
// then set the operand kind to OK_NonUniformConstantValue.		// then set the operand kind to OK_NonUniformConstantValue.
ConstantInt *CInt0 = nullptr;		Constant *C0 = nullptr;
for (unsigned i = 0, e = VL.size(); i < e; ++i) {		for (unsigned i = 0, e = VL.size(); i < e; ++i) {
		if (isa<UndefValue>(VL[i]))
		continue;
const Instruction *I = cast<Instruction>(VL[i]);		const Instruction *I = cast<Instruction>(VL[i]);
unsigned OpIdx = isa<BinaryOperator>(I) ? 1 : 0;		unsigned OpIdx = isa<BinaryOperator>(I) ? 1 : 0;
ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(OpIdx));		ConstantInt *CInt = dyn_cast<ConstantInt>(I->getOperand(OpIdx));
if (!CInt) {		Constant *UV = dyn_cast<UndefValue>(I->getOperand(OpIdx));
		if (!CInt && !UV) {
Op2VK = TargetTransformInfo::OK_AnyValue;		Op2VK = TargetTransformInfo::OK_AnyValue;
Op2VP = TargetTransformInfo::OP_None;		Op2VP = TargetTransformInfo::OP_None;
break;		break;
}		}
if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&		if (Op2VP == TargetTransformInfo::OP_PowerOf2 &&
!CInt->getValue().isPowerOf2())		(UV || !cast<ConstantInt>(CInt)->getValue().isPowerOf2()))
Op2VP = TargetTransformInfo::OP_None;		Op2VP = TargetTransformInfo::OP_None;
if (i == 0) {		if (i == 0) {
CInt0 = CInt;		C0 = CInt ? CInt : UV;
continue;		continue;
}		}
if (CInt0 != CInt)		if (C0 != (CInt ? CInt : UV))
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
}		}

SmallVector<const Value *, 4> Operands(VL0->operand_values());		SmallVector<const Value *, 4> Operands(VL0->operand_values());
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecCost =		InstructionCost VecCost =
TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;

InstructionCost ScalarEltCost = TTI->getArithmeticInstrCost(		InstructionCost ScalarEltCost = TTI->getArithmeticInstrCost(
Instruction::Add, ScalarTy, CostKind, Op1VK, Op2VK);		Instruction::Add, ScalarTy, CostKind, Op1VK, Op2VK);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecCost = TTI->getArithmeticInstrCost(		InstructionCost VecCost = TTI->getArithmeticInstrCost(
Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);		Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Cost of wide load - cost of scalar loads.		// Cost of wide load - cost of scalar loads.
Align Alignment = cast<LoadInst>(VL0)->getAlign();		Align Alignment = cast<LoadInst>(VL0)->getAlign();
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
Instruction::Load, ScalarTy, Alignment, 0, CostKind, VL0);		Instruction::Load, ScalarTy, Alignment, 0, CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarLdCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarLdCost = NumOfInstructions * ScalarEltCost;

InstructionCost VecLdCost;		InstructionCost VecLdCost;
if (E->State == TreeEntry::Vectorize) {		if (E->State == TreeEntry::Vectorize) {
VecLdCost = TTI->getMemoryOpCost(Instruction::Load, VecTy, Alignment, 0,		unsigned MinIdx;
CostKind, VL0);		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(VL.begin(), find_if(VL, Instruction::classof));
		spatel [Not Done]: Are we always creating a masked load for a vector with 2 elements? This logic needs a code comment to explain the cases.
		ABataev (author) [Done]: No, there is no need to do it for 2 elements; removed it.
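		To make the plain-load versus masked-load choice discussed above more concrete, here is a standalone sketch of the width computation as it reads in this hunk. The helpers `powerOf2Ceil`, `alignToMultiple`, `isPowerOf2`, and `plainLoadWidth` are hypothetical stand-ins written for illustration (the patch itself calls `PowerOf2Ceil`, `alignTo`, and `isPowerOf2_32`). The sketch assumes `InstrDist >= 1` and ignores the reordering case.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for llvm::PowerOf2Ceil, valid for X >= 1.
inline std::uint64_t powerOf2Ceil(std::uint64_t X) {
  std::uint64_t P = 1;
  while (P < X)
    P <<= 1;
  return P;
}

// Hypothetical stand-in for llvm::alignTo: rounds Value up to a multiple
// of Align (Align must be non-zero).
inline std::uint64_t alignToMultiple(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical stand-in for llvm::isPowerOf2_32.
inline bool isPowerOf2(std::uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Returns the element count of the plain load that would be costed, or 0
// when the hunk would fall back to the masked-load cost instead.
//   InstrDist: number of element slots between the first and last load
//   EltSize:   store size of one element in bytes
//   Align:     alignment of the first load in bytes
inline std::uint64_t plainLoadWidth(std::uint64_t InstrDist,
                                    std::uint64_t EltSize,
                                    std::uint64_t Align) {
  std::uint64_t Aligned =
      std::min(powerOf2Ceil(InstrDist),
               alignToMultiple(InstrDist * EltSize, Align) / EltSize);
  return isPowerOf2(Aligned) ? Aligned : 0;
}
```

		Under these assumptions, three 4-byte loads at 16-byte alignment widen to a 4-element plain load, while the same three loads at 4-byte alignment give a width of 3, which is not a power of two, so the masked-load cost is used.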
		MaxIdx =
		std::distance(VL.begin(),
		find_if(reverse(VL), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		Align CommonAlign;
		if (E->ReorderIndices.empty())
		CommonAlign = Alignment;
		else
		CommonAlign =
		cast<LoadInst>(VL[E->ReorderIndices[MinIdx]])->getAlign();
		unsigned InstrDist = MaxIdx - MinIdx + 1;
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
		// Check if we can use load instead of masked load, i.e. we can directly
		// load aligned data.
		unsigned AlignedInstrDist = std::min(
		PowerOf2Ceil(InstrDist), alignTo(InstrDist * Sz, CommonAlign) / Sz);
		if (isPowerOf2_32(AlignedInstrDist)) {
		CommonAlign =
		commonAlignment(CommonAlign, CommonAlign.value() -
		(AlignedInstrDist - InstrDist));
		auto *LoadVecTy = VecTy;
		if (AlignedInstrDist != SelfVF)
		LoadVecTy = FixedVectorType::get(ScalarTy, AlignedInstrDist);
		VecLdCost = TTI->getMemoryOpCost(Instruction::Load, LoadVecTy,
		CommonAlign, 0, CostKind, VL0);
		} else {
		VecLdCost = TTI->getMaskedMemoryOpCost(Instruction::Load, VecTy,
		Alignment, 0, CostKind);
		}
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");		assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");
Align CommonAlignment = Alignment;		Align CommonAlignment = Alignment;
for (Value *V : VL)		for (Value *V : InstructionsOnly)
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
		unsigned NormalizedSz = llvm::PowerOf2Ceil(NumOfInstructions);
		auto *VecLdTy = FixedVectorType::get(ScalarTy, NormalizedSz);
VecLdCost = TTI->getGatherScatterOpCost(		VecLdCost = TTI->getGatherScatterOpCost(
Instruction::Load, VecTy, cast<LoadInst>(VL0)->getPointerOperand(),		Instruction::Load, VecLdTy,
/*VariableMask=*/false, Alignment, CostKind, VL0);		cast<LoadInst>(VL0)->getPointerOperand(),
		/*VariableMask=*/false, CommonAlignment, CostKind, VL0);
}		}
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecLdCost, ScalarLdCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecLdCost, ScalarLdCost));
return CommonCost + VecLdCost - ScalarLdCost;		return CommonCost + VecLdCost - ScalarLdCost;
}		}
case Instruction::Store: {		case Instruction::Store: {
// We know that we can merge the stores. Calculate the cost.		// We know that we can merge the stores. Calculate the cost.
bool IsReorder = !E->ReorderIndices.empty();		bool IsReorder = !E->ReorderIndices.empty();
auto *SI =		auto *SI =
cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);		cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);
Align Alignment = SI->getAlign();		Align Alignment = SI->getAlign();
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
Instruction::Store, ScalarTy, Alignment, 0, CostKind, VL0);		Instruction::Store, ScalarTy, Alignment, 0, CostKind, VL0);
InstructionCost ScalarStCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarStCost = NumOfInstructions * ScalarEltCost;
InstructionCost VecStCost = TTI->getMemoryOpCost(		InstructionCost VecStCost;
Instruction::Store, VecTy, Alignment, 0, CostKind, VL0);		unsigned MinIdx;
		unsigned MaxIdx;
		if (!IsReorder) {
		MinIdx = std::distance(VL.begin(), find_if(VL, Instruction::classof));
		MaxIdx =
		std::distance(VL.begin(),
		find_if(reverse(VL), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		if (NumOfInstructions != SelfVF) {
		VecStCost = TTI->getMaskedMemoryOpCost(Instruction::Store, VecTy,
		Alignment, 0, CostKind);
		} else {
		VecStCost = TTI->getMemoryOpCost(Instruction::Store, VecTy, Alignment,
		0, CostKind, VL0);
		}
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecStCost, ScalarStCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecStCost, ScalarStCost));
return CommonCost + VecStCost - ScalarStCost;		return CommonCost + VecStCost - ScalarStCost;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// Calculate the cost of the scalar and vector calls.		// Calculate the cost of the scalar and vector calls.
IntrinsicCostAttributes CostAttrs(ID, *CI, 1);		IntrinsicCostAttributes CostAttrs(ID, *CI, 1);
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getIntrinsicInstrCost(CostAttrs, CostKind);		TTI->getIntrinsicInstrCost(CostAttrs, CostKind);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		CommonCost -= (ReuseShuffleNumbers - NumOfInstructions) * ScalarEltCost;
}		}
InstructionCost ScalarCallCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCallCost = NumOfInstructions * ScalarEltCost;

auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);		auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);
InstructionCost VecCallCost =		InstructionCost VecCallCost =
std::min(VecCallCosts.first, VecCallCosts.second);		std::min(VecCallCosts.first, VecCallCosts.second);

LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost		LLVM_DEBUG(dbgs() << "SLP: Call cost " << VecCallCost - ScalarCallCost
<< " (" << VecCallCost << "-" << ScalarCallCost << ")"		<< " (" << VecCallCost << "-" << ScalarCallCost << ")"
<< " for " << *CI << "\n");		<< " for " << *CI << "\n");

return CommonCost + VecCallCost - ScalarCallCost;		return CommonCost + VecCallCost - ScalarCallCost;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
assert(E->isAltShuffle() &&		assert(E->isAltShuffle() &&
((Instruction::isBinaryOp(E->getOpcode()) &&		((Instruction::isBinaryOp(E->getOpcode()) &&
Instruction::isBinaryOp(E->getAltOpcode())) ||		Instruction::isBinaryOp(E->getAltOpcode())) ||
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode()))) &&		Instruction::isCast(E->getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");
InstructionCost ScalarCost = 0;		InstructionCost ScalarCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
for (unsigned Idx : E->ReuseShuffleIndices) {		for (unsigned Idx : E->ReuseShuffleIndices) {
Instruction *I = cast<Instruction>(VL[Idx]);		if (Idx >= VL.size() || isa<UndefValue>(VL[Idx]))
		continue;
		auto *I = cast<Instruction>(VL[Idx]);
CommonCost -= TTI->getInstructionCost(I, CostKind);		CommonCost -= TTI->getInstructionCost(I, CostKind);
}		}
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
CommonCost += TTI->getInstructionCost(I, CostKind);		CommonCost += TTI->getInstructionCost(I, CostKind);
}		}
}		}
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
		RKSimon: InstructionsOnly ?
Instruction *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
assert(E->isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(I) && "Unexpected main/alternate opcode");
ScalarCost += TTI->getInstructionCost(I, CostKind);		ScalarCost += TTI->getInstructionCost(I, CostKind);
}		}
// VecCost is equal to sum of the cost of creating 2 vectors		// VecCost is equal to sum of the cost of creating 2 vectors
// and the cost of creating shuffle.		// and the cost of creating shuffle.
InstructionCost VecCost = 0;		InstructionCost VecCost = 0;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
VecCost = TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind);		VecCost = TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind);
VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,		VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,
CostKind);		CostKind);
} else {		} else {
Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();		Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();
Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();		Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();
auto *Src0Ty = FixedVectorType::get(Src0SclTy, VL.size());		auto *Src0Ty = FixedVectorType::get(Src0SclTy, SelfVF);
auto *Src1Ty = FixedVectorType::get(Src1SclTy, VL.size());		auto *Src1Ty = FixedVectorType::get(Src1SclTy, SelfVF);
VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,		VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,
TTI::CastContextHint::None, CostKind);		TTI::CastContextHint::None, CostKind);
VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,		VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,
TTI::CastContextHint::None, CostKind);		TTI::CastContextHint::None, CostKind);
}		}

SmallVector<int> Mask(E->Scalars.size());		SmallVector<int> Mask(E->Scalars.size());
for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {		for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {
		if (isa<UndefValue>(E->Scalars[I])) {
		Mask[I] = UndefMaskElem;
		continue;
		}
auto *OpInst = cast<Instruction>(E->Scalars[I]);		auto *OpInst = cast<Instruction>(E->Scalars[I]);
assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");
Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);		Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);
}		}
VecCost +=		VecCost +=
TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask, 0);		TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask, 0);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
[19 lines hidden]	bool BoUpSLP::isFullyVectorizableTinyTree() const {
// with the second gather nodes if they have less scalar operands rather than		// with the second gather nodes if they have less scalar operands rather than
// the initial tree element (may be profitable to shuffle the second gather)		// the initial tree element (may be profitable to shuffle the second gather)
// or they are extractelements, which form shuffle.		// or they are extractelements, which form shuffle.
SmallVector<int> Mask;		SmallVector<int> Mask;
if (VectorizableTree[0]->State == TreeEntry::Vectorize &&		if (VectorizableTree[0]->State == TreeEntry::Vectorize &&
(allConstant(VectorizableTree[1]->Scalars) ||		(allConstant(VectorizableTree[1]->Scalars) ||
isSplat(VectorizableTree[1]->Scalars) ||		isSplat(VectorizableTree[1]->Scalars) ||
(VectorizableTree[1]->State == TreeEntry::NeedToGather &&		(VectorizableTree[1]->State == TreeEntry::NeedToGather &&
VectorizableTree[1]->Scalars.size() <		PowerOf2Floor(
VectorizableTree[0]->Scalars.size()) ||		count_if(VectorizableTree[1]->Scalars, Instruction::classof)) <
		PowerOf2Floor(count_if(VectorizableTree[0]->Scalars,
		Instruction::classof))) ||
(VectorizableTree[1]->State == TreeEntry::NeedToGather &&		(VectorizableTree[1]->State == TreeEntry::NeedToGather &&
VectorizableTree[1]->getOpcode() == Instruction::ExtractElement &&		VectorizableTree[1]->getOpcode() == Instruction::ExtractElement &&
isShuffle(VectorizableTree[1]->Scalars, Mask))))		isShuffle(VectorizableTree[1]->Scalars, Mask))))
return true;		return true;

// Gathering cost would be too much for tiny trees.		// Gathering cost would be too much for tiny trees.
if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||		if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||
VectorizableTree[1]->State == TreeEntry::NeedToGather)		VectorizableTree[1]->State == TreeEntry::NeedToGather)
[185 lines hidden]	InstructionCost BoUpSLP::getSpillCost() const {
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {		InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "		LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
<< VectorizableTree.size() << ".\n");		<< VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0]->Scalars.size();

for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {		for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
TreeEntry &TE = *VectorizableTree[I].get();		TreeEntry &TE = *VectorizableTree[I].get();

		// Exclude cost of gather loads nodes which are not used.
		if (GatheredLoadsEntriesFirst >= 0 &&
		I >= static_cast<unsigned>(GatheredLoadsEntriesFirst) &&
		TE.State == TreeEntry::NeedToGather) {
		assert(all_of(TE.Scalars,
		[this](Value *V) {
		return (isa<LoadInst>(V) && MustGather.contains(V)) ||
		isa<Constant>(V) ||
		V->getType()->isPtrOrPtrVectorTy();
		}) &&
		"Expected loads, pointers or constants only.");
		continue;
		}

InstructionCost C = getEntryCost(&TE, VectorizedVals);		InstructionCost C = getEntryCost(&TE, VectorizedVals);
Cost += C;		Cost += C;
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for bundle that starts with " << *TE.Scalars[0]		<< " for bundle that starts with " << *TE.Scalars[0]
<< ".\n"		<< ".\n"
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
}		}

[64 lines hidden]	if (EU.User && isa<InsertElementInst>(EU.User)) {
int Idx = *InsertIdx;		int Idx = *InsertIdx;
ShuffleMask[VecId][Idx] = EU.Lane;		ShuffleMask[VecId][Idx] = EU.Lane;
IsIdentity.set(IsIdentity.test(VecId) &		IsIdentity.set(IsIdentity.test(VecId) &
(EU.Lane == Idx || EU.Lane == UndefMaskElem));		(EU.Lane == Idx || EU.Lane == UndefMaskElem));
DemandedElts[VecId].setBit(Idx);		DemandedElts[VecId].setBit(Idx);
}		}
}		}

		// BundleWidth varies in the tree, need to get the VF for each tree node.
		const TreeEntry *TE = getTreeEntry(EU.Scalar);
		SmallSet<unsigned int, 4> UserVFs;
		unsigned BundleWidth = getEntryVF(TE, UserVFs, TE);
		if (!TE->ReuseShuffleIndices.empty()) {
		int Limit = TE->ReuseShuffleIndices.size();
		BundleWidth = std::max<unsigned>(
		BundleWidth,
		PowerOf2Ceil(std::distance(
		TE->ReuseShuffleIndices.begin(),
		find_if(reverse(TE->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}

// If we plan to rewrite the tree in a smaller type, we will need to sign		// If we plan to rewrite the tree in a smaller type, we will need to sign
// extend the extracted value back to the original type. Here, we account		// extend the extracted value back to the original type. Here, we account
// for the extract and the added cost of the sign extend if needed.		// for the extract and the added cost of the sign extend if needed.
auto *VecTy = FixedVectorType::get(EU.Scalar->getType(), BundleWidth);		auto *VecTy = FixedVectorType::get(EU.Scalar->getType(), BundleWidth);
auto *ScalarRoot = VectorizableTree[0]->Scalars[0];		auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);		auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
auto Extend =		auto Extend =
[55 lines hidden]	#ifndef NDEBUG
if (ViewSLPTree)		if (ViewSLPTree)
ViewGraph(this, "SLP" + F->getName(), false, Str);		ViewGraph(this, "SLP" + F->getName(), false, Str);
#endif		#endif

return Cost;		return Cost;
}		}
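
The BundleWidth adjustment in getTreeCost (and the matching ShuffleVF computation in vectorizeTree) scans the reuse mask from the back for the last index below a limit, rounds the number of lanes up to that position to a power of two, and takes the maximum with the entry's own VF. The following is a minimal standalone sketch of that arithmetic, not the patch's code: `powerOf2Ceil` and `widenedVF` are hypothetical stand-ins for `llvm::PowerOf2Ceil` and the inlined expression, and the mask is a plain `std::vector<int>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for llvm::PowerOf2Ceil: smallest power of two >= A,
// with 0 mapped to 0.
inline uint64_t powerOf2Ceil(uint64_t A) {
  if (A == 0)
    return 0;
  uint64_t P = 1;
  while (P < A)
    P <<= 1;
  return P;
}

// Hypothetical helper mirroring the expression in the diff: find the last
// mask element that is < Limit, count the lanes up to and including it,
// round that count up to a power of two and never go below SelfVF.
inline unsigned widenedVF(unsigned SelfVF, const std::vector<int> &ReuseIdx,
                          int Limit) {
  assert(Limit >= 0 && "sketch assumes a non-negative limit");
  auto RIt = std::find_if(ReuseIdx.rbegin(), ReuseIdx.rend(),
                          [Limit](int I) { return I < Limit; });
  // base() points one past the element found by the reverse search, so the
  // distance from begin() is the number of lanes up to and including it.
  auto UsedLanes = std::distance(ReuseIdx.begin(), RIt.base());
  return std::max<unsigned>(
      SelfVF,
      static_cast<unsigned>(powerOf2Ceil(static_cast<uint64_t>(UsedLanes))));
}
```

With a mask `{0, 1, 2, 0, 1, 7, 7, 7}` and a limit of 3, the last in-range element sits at position 4, so five lanes are in use and the width is rounded up to 8.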

Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
		RKSimon: IgnoredIndices might be cheaper as a SparseBitVector ?
SmallVectorImpl<const TreeEntry *> &Entries) {		SmallVectorImpl<const TreeEntry *> &Entries) {
// TODO: currently checking only for Scalars in the tree entry, need to count		// TODO: currently checking only for Scalars in the tree entry, need to count
// reused elements too for better cost estimation.		// reused elements too for better cost estimation.
Mask.assign(TE->Scalars.size(), UndefMaskElem);		Mask.assign(TE->Scalars.size(), UndefMaskElem);
Entries.clear();		Entries.clear();
// Build a lists of values to tree entries.		// Build a lists of values to tree entries.
DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;		DenseMap<Value *, SmallPtrSet<const TreeEntry *, 4>> ValueToTEs;
for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {		for (const std::unique_ptr<TreeEntry> &EntryPtr : VectorizableTree) {
[126 lines hidden]	BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
}		}
return None;		return None;
}		}

InstructionCost		InstructionCost
BoUpSLP::getGatherCost(FixedVectorType *Ty,		BoUpSLP::getGatherCost(FixedVectorType *Ty,
const DenseSet<unsigned> &ShuffledIndices) const {		const DenseSet<unsigned> &ShuffledIndices) const {
unsigned NumElts = Ty->getNumElements();		unsigned NumElts = Ty->getNumElements();
APInt DemandedElts = APInt::getNullValue(NumElts);		APInt DemandedElts = APInt::getNullValue(NumElts);
for (unsigned I = 0; I < NumElts; ++I)		for (unsigned I = 0; I < NumElts; ++I)
		RKSimon: trivial style refactor - pull out of patch?
if (!ShuffledIndices.count(I))		if (!ShuffledIndices.count(I))
DemandedElts.setBit(I);		DemandedElts.setBit(I);
InstructionCost Cost =		InstructionCost Cost =
TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,		TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,
/*Extract*/ false);		/*Extract*/ false);
if (!ShuffledIndices.empty())		if (!ShuffledIndices.empty())
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty);		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty);
return Cost;		return Cost;
}		}
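
As a rough illustration of the getGatherCost overload above: lanes listed in ShuffledIndices are left out of the demanded-elements mask, every remaining lane pays an insert, and a single-source permute is added once if any lane was shuffled. The sketch below is standalone and uses assumed names and numbers throughout (`ToyCosts`, `gatherCostSketch`, flat per-lane costs); the real costs come from TargetTransformInfo and are target-specific.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical flat costs; the real values are queried from TTI.
struct ToyCosts {
  int InsertPerLane;
  int SingleSrcPermute;
};

// Hypothetical stand-in for the cost computation: build a bitmask of lanes
// that must be inserted, charge one insert per demanded lane, and add one
// permute if any lane is produced by shuffling instead.
inline int gatherCostSketch(unsigned NumElts,
                            const std::set<unsigned> &ShuffledIndices,
                            ToyCosts C) {
  assert(NumElts <= 64 && "toy bitmask holds at most 64 lanes");
  uint64_t DemandedElts = 0;
  for (unsigned I = 0; I < NumElts; ++I)
    if (!ShuffledIndices.count(I))
      DemandedElts |= uint64_t(1) << I;

  int Cost = 0;
  for (unsigned I = 0; I < NumElts; ++I)
    if (DemandedElts & (uint64_t(1) << I))
      Cost += C.InsertPerLane;
  if (!ShuffledIndices.empty())
    Cost += C.SingleSrcPermute;
  return Cost;
}
```

For four lanes with lane 0 marked as shuffled, an insert cost of 1 and a permute cost of 2, the sketch charges three inserts plus one permute.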

InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL) const {		InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
		unsigned VF) const {
// Find the type of the operands in VL.		// Find the type of the operands in VL.
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
// Find the cost of inserting/extracting values from the vector.		// Find the cost of inserting/extracting values from the vector.
// Check if the same elements are inserted several times and count them as		// Check if the same elements are inserted several times and count them as
// shuffle candidates.		// shuffle candidates.
DenseSet<unsigned> ShuffledElements;		DenseSet<unsigned> ShuffledElements;
DenseSet<Value *> UniqueElements;		DenseSet<Value *> UniqueElements;
// Iterate in reverse order to consider insert elements with the high cost.		// Iterate in reverse order to consider insert elements with the high cost.
for (unsigned I = VL.size(); I > 0; --I) {		for (int I = VF; I > 0; --I) {
unsigned Idx = I - 1;		unsigned Idx = I - 1;
if (isConstant(VL[Idx]))		if (isConstant(VL[Idx]))
continue;		continue;
if (!UniqueElements.insert(VL[Idx]).second)		if (!UniqueElements.insert(VL[Idx]).second)
ShuffledElements.insert(Idx);		ShuffledElements.insert(Idx);
}		}
return getGatherCost(VecTy, ShuffledElements);		return getGatherCost(VecTy, ShuffledElements);
}		}
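
The reverse scan in getGatherCost(VL, VF) can be sketched in isolation: walking from the last lane to the first, constants are skipped, the first time a value is seen it is recorded as unique, and any earlier lane holding the same value becomes a shuffle candidate. This is a minimal sketch under stated assumptions, not the patch's code: `ToyValue` and `findShuffledLanes` are hypothetical, values are identified by an integer id instead of a `Value *`, and the sketch asserts that VF does not exceed the list size.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an llvm::Value: an id plus a constant flag.
struct ToyValue {
  int Id;
  bool IsConstant;
};

// Returns the lanes whose value also appears in a later lane; those lanes
// would be filled by a shuffle rather than by a separate insert.
inline std::set<unsigned> findShuffledLanes(const std::vector<ToyValue> &VL,
                                            unsigned VF) {
  assert(VF <= VL.size() && "sketch assumes all VF lanes are present in VL");
  std::set<unsigned> Shuffled;
  std::set<int> Unique;
  // Iterate in reverse so the highest lane holding a value stays "unique".
  for (unsigned I = VF; I > 0; --I) {
    unsigned Idx = I - 1;
    if (VL[Idx].IsConstant)
      continue;
    if (!Unique.insert(VL[Idx].Id).second)
      Shuffled.insert(Idx);
  }
  return Shuffled;
}
```

For the lanes `a, b, a, <constant>`, only lane 0 is reported, because lane 2 is visited first and claims `a` as unique.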

// Perform operand reordering on the instructions in VL and return the reordered		// Perform operand reordering on the instructions in VL and return the reordered
// operands in Left and Right.		// operands in Left and Right.
void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void BoUpSLP::reorderInputsAccordingToOpcode(
SmallVectorImpl<Value *> &Left,		Instruction &VL0, ArrayRef<Value *> VL, SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right,		SmallVectorImpl<Value *> &Right, const DataLayout &DL, ScalarEvolution &SE,
const DataLayout &DL,
ScalarEvolution &SE,
const BoUpSLP &R) {		const BoUpSLP &R) {
if (VL.empty())		if (VL.empty())
return;		return;
VLOperands Ops(VL, DL, SE, R);		VLOperands Ops(VL0, VL, DL, SE, R);
// Reorder the operands in place.		// Reorder the operands in place.
Ops.reorder();		Ops.reorder();
Left = Ops.getVL(0);		Left = Ops.getVL(0);
Right = Ops.getVL(1);		Right = Ops.getVL(1);
}		}

void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {		void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {
		auto InstructionsOnly = make_filter_range(E->Scalars, Instruction::classof);
		if (llvm::empty(InstructionsOnly))
		return;
// Get the basic block this bundle is in. All instructions in the bundle		// Get the basic block this bundle is in. All instructions in the bundle
// should be in this block.		// should be in this block.
auto *Front = E->getMainOp();		auto *Front = E->getMainOp();
auto *BB = Front->getParent();		auto *BB = Front->getParent();
assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {		assert(llvm::all_of(InstructionsOnly, [=](Value *V) -> bool {
auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
return !E->isOpcodeOrAlt(I) || I->getParent() == BB;		return !E->isOpcodeOrAlt(I) || I->getParent() == BB;
}));		}));

// The last instruction in the bundle in program order.		// The last instruction in the bundle in program order.
Instruction *LastInst = nullptr;		Instruction *LastInst = nullptr;

// Find the last instruction. The common case should be that BB has been		// Find the last instruction. The common case should be that BB has been
// scheduled, and the last instruction is VL.back(). So we start with		// scheduled, and the last instruction is VL.back(). So we start with
// VL.back() and iterate over schedule data until we reach the end of the		// VL.back() and iterate over schedule data until we reach the end of the
// bundle. The end of the bundle is marked by null ScheduleData.		// bundle. The end of the bundle is marked by null ScheduleData.
if (BlocksSchedules.count(BB)) {		if (BlocksSchedules.count(BB)) {
auto *Bundle =		auto *Bundle = BlocksSchedules[BB]->getScheduleData(
BlocksSchedules[BB]->getScheduleData(E->isOneOf(E->Scalars.back()));		E->isOneOf(*llvm::reverse(InstructionsOnly).begin()));
if (Bundle && Bundle->isPartOfBundle())		if (Bundle && Bundle->isPartOfBundle())
for (; Bundle; Bundle = Bundle->NextInBundle)		for (; Bundle; Bundle = Bundle->NextInBundle)
if (Bundle->OpValue == Bundle->Inst)		if (Bundle->OpValue == Bundle->Inst)
LastInst = Bundle->Inst;		LastInst = Bundle->Inst;
}		}

// LastInst can still be null at this point if there's either not an entry		// LastInst can still be null at this point if there's either not an entry
// for BB in BlocksSchedules or there's no ScheduleData available for		// for BB in BlocksSchedules or there's no ScheduleData available for
[9 lines hidden]	void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {
// will visit all the remaining instructions in the block.		// will visit all the remaining instructions in the block.
//		//
// One of the reasons we exit early from buildTree_rec is to place an upper		// One of the reasons we exit early from buildTree_rec is to place an upper
// bound on compile-time. Thus, taking an additional compile-time hit here is		// bound on compile-time. Thus, taking an additional compile-time hit here is
// not ideal. However, this should be exceedingly rare since it requires that		// not ideal. However, this should be exceedingly rare since it requires that
// we both exit early from buildTree_rec and that the bundle be out-of-order		// we both exit early from buildTree_rec and that the bundle be out-of-order
// (causing us to iterate all the way to the end of the block).		// (causing us to iterate all the way to the end of the block).
if (!LastInst) {		if (!LastInst) {
SmallPtrSet<Value *, 16> Bundle(E->Scalars.begin(), E->Scalars.end());		SmallPtrSet<Value *, 16> Bundle(InstructionsOnly.begin(),
		InstructionsOnly.end());
for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {		for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {
if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))		if (Bundle.erase(&I) && E->isOpcodeOrAlt(&I))
LastInst = &I;		LastInst = &I;
if (Bundle.empty())		if (Bundle.empty())
break;		break;
}		}
}		}
assert(LastInst && "Failed to find last instruction in bundle");		assert(LastInst && "Failed to find last instruction in bundle");

// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
Builder.SetInsertPoint(BB, ++LastInst->getIterator());		Builder.SetInsertPoint(BB, ++LastInst->getIterator());
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}

Value *BoUpSLP::gather(ArrayRef<Value *> VL) {		Value *BoUpSLP::gather(ArrayRef<Value *> VL) {
		spatel: I did some clean-ups while trying to understand the behavior of this code, so this patch will need a (hopefully reduced diff) update: rG7451bf0b0b6d rG062276c69109 This one may also require rebase: rGa44238cb443f
// List of instructions/lanes from current block and/or the blocks which are		// List of instructions/lanes from current block and/or the blocks which are
// part of the current loop. These instructions will be inserted at the end to		// part of the current loop. These instructions will be inserted at the end to
// make it possible to optimize loops and hoist invariant instructions out of		// make it possible to optimize loops and hoist invariant instructions out of
// the loops body with better chances for success.		// the loops body with better chances for success.
SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;		SmallVector<std::pair<Value *, unsigned>, 4> PostponedInsts;
SmallSet<int, 4> PostponedIndices;		SmallSet<int, 4> PostponedIndices;
Loop *L = LI->getLoopFor(Builder.GetInsertBlock());		Loop *L = LI->getLoopFor(Builder.GetInsertBlock());
auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {		auto &&CheckPredecessor = [](BasicBlock *InstBB, BasicBlock *InsertBB) {
[96 lines hidden]	public:

~ShuffleInstructionBuilder() {		~ShuffleInstructionBuilder() {
assert((IsFinalized || Mask.empty()) &&		assert((IsFinalized || Mask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};
} // namespace		} // namespace

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {		Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, unsigned VF) {
unsigned VF = VL.size();
InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (S.getOpcode()) {		if (S.getOpcode()) {
if (TreeEntry *E = getTreeEntry(S.OpValue))		if (TreeEntry *E = getTreeEntry(S.OpValue))
if (E->isSame(VL)) {		if (VL.size() == E->Scalars.size() && E->isSame(VL)) {
Value *V = vectorizeTree(E);		Value *V = vectorizeTree(E);
if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {		if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {
if (!E->ReuseShuffleIndices.empty()) {		if (!E->ReuseShuffleIndices.empty()) {
// Reshuffle to get only unique values.		// Reshuffle to get only unique values.
// If some of the scalars are duplicated in the vectorization tree		// If some of the scalars are duplicated in the vectorization tree
// entry, we do not vectorize them but instead generate a mask for		// entry, we do not vectorize them but instead generate a mask for
// the reuses. But if there are several users of the same entry,		// the reuses. But if there are several users of the same entry,
// they may have different vectorization factors. This is especially		// they may have different vectorization factors. This is especially
[95 lines hidden]	if (!ReuseShuffleIndicies.empty()) {
}		}
}		}
return Vec;		return Vec;
}		}

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

		Instruction *VL0 = E->getMainOp();
if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

		SmallSet<unsigned, 4> UserVFs;
		unsigned SelfVF = getEntryVF(E, UserVFs, E);
		unsigned ShuffleVF = SelfVF;
		if (!E->ReuseShuffleIndices.empty()) {
		int Limit = E->Scalars.size();
		ShuffleVF = std::max<unsigned>(
		SelfVF, PowerOf2Ceil(std::distance(
		E->ReuseShuffleIndices.begin(),
		find_if(reverse(E->ReuseShuffleIndices), [Limit](int I) {
		return I < Limit;
		}).base())));
		}
		Type *ScalarTy = VL0->getType();
		if (auto *Store = dyn_cast<StoreInst>(VL0))
		ScalarTy = Store->getValueOperand()->getType();
		else if (auto *IE = dyn_cast<InsertElementInst>(VL0))
		ScalarTy = IE->getOperand(1)->getType();
		auto *VecTy = FixedVectorType::get(ScalarTy, SelfVF);
		if (isa<PoisonValue>(VL0))
		return PoisonValue::get(VecTy);
		if (isa<UndefValue>(VL0))
		return UndefValue::get(VecTy);

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
unsigned VF = E->Scalars.size();		ShuffleInstructionBuilder ShuffleBuilder(Builder, ShuffleVF);
if (NeedToShuffleReuses)
VF = E->ReuseShuffleIndices.size();
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF);
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
Value *Vec;		Value *Vec;
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);		isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle.hasValue()) {		if (Shuffle.hasValue()) {
assert((Entries.size() == 1 || Entries.size() == 2) &&		assert((Entries.size() == 1 || Entries.size() == 2) &&
"Expected shuffle of 1 or 2 entries.");		"Expected shuffle of 1 or 2 entries.");
Vec = Builder.CreateShuffleVector(Entries.front()->VectorizedValue,		Vec = Builder.CreateShuffleVector(Entries.front()->VectorizedValue,
Entries.back()->VectorizedValue, Mask);		Entries.back()->VectorizedValue, Mask);
} else {		} else {
Vec = gather(E->Scalars);		Vec = gather(makeArrayRef(E->Scalars).slice(0, SelfVF));
}		}
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
Vec = ShuffleBuilder.finalize(Vec);		Vec = ShuffleBuilder.finalize(Vec);
if (auto *I = dyn_cast<Instruction>(Vec)) {		if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherSeq.insert(I);		GatherSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
}		}
E->VectorizedValue = Vec;		E->VectorizedValue = Vec;
return Vec;		return Vec;
}		}

assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
		auto InstructionsOnly = make_filter_range(E->Scalars, Instruction::classof);
unsigned ShuffleOrOp =		unsigned ShuffleOrOp =
E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();		E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
Instruction *VL0 = E->getMainOp();
Type *ScalarTy = VL0->getType();
if (auto *Store = dyn_cast<StoreInst>(VL0))
ScalarTy = Store->getValueOperand()->getType();
else if (auto *IE = dyn_cast<InsertElementInst>(VL0))
ScalarTy = IE->getOperand(1)->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, E->Scalars.size());
switch (ShuffleOrOp) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
auto *PH = cast<PHINode>(VL0);		auto *PH = cast<PHINode>(VL0);
Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());		Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());		PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());
Value *V = NewPhi;		Value *V = NewPhi;
if (NeedToShuffleReuses)		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = Builder.CreateShuffleVector(V, E->ReuseShuffleIndices, "shuffle");		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;

// PHINodes may have multiple entries from the same block. We want to		// PHINodes may have multiple entries from the same block. We want to
// visit every block once.		// visit every block once.
SmallPtrSet<BasicBlock*, 4> VisitedBBs;		SmallPtrSet<BasicBlock*, 4> VisitedBBs;

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
BasicBlock *IBB = PH->getIncomingBlock(i);		BasicBlock *IBB = PH->getIncomingBlock(i);

if (!VisitedBBs.insert(IBB).second) {		if (!VisitedBBs.insert(IBB).second) {
NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);		NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);
continue;		continue;
}		}

Builder.SetInsertPoint(IBB->getTerminator());		Builder.SetInsertPoint(IBB->getTerminator());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
Value *Vec = vectorizeTree(E->getOperand(i));		Value *Vec = vectorizeTree(E->getOperand(i), SelfVF);
NewPhi->addIncoming(Vec, IBB);		NewPhi->addIncoming(Vec, IBB);
}		}

assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&		assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&
"Invalid number of incoming values");		"Invalid number of incoming values");
return V;		return V;
}		}

case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
Value *V = E->getSingleOperand(0);		Value *V = E->getSingleOperand(0);
Builder.SetInsertPoint(VL0);		Builder.SetInsertPoint(VL0);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
case Instruction::ExtractValue: {		case Instruction::ExtractValue: {
auto *LI = cast<LoadInst>(E->getSingleOperand(0));		auto *LI = cast<LoadInst>(VL0->getOperand(0));
Builder.SetInsertPoint(LI);		Builder.SetInsertPoint(LI);
auto *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());		auto *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());
Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);		Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);
LoadInst *V = Builder.CreateAlignedLoad(VecTy, Ptr, LI->getAlign());		LoadInst *V = Builder.CreateAlignedLoad(VecTy, Ptr, LI->getAlign());
Value *NewV = propagateMetadata(V, E->Scalars);		Value *NewV = propagateMetadata(V, to_vector<4>(InstructionsOnly));
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
NewV = ShuffleBuilder.finalize(NewV);		NewV = ShuffleBuilder.finalize(NewV);
E->VectorizedValue = NewV;		E->VectorizedValue = NewV;
return NewV;		return NewV;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
Builder.SetInsertPoint(VL0);		Builder.SetInsertPoint(VL0);
Value *V = vectorizeTree(E->getOperand(1));		Value *V = vectorizeTree(E->getOperand(1), SelfVF);

const unsigned NumElts =		const unsigned NumElts =
cast<FixedVectorType>(VL0->getType())->getNumElements();		cast<FixedVectorType>(VL0->getType())->getNumElements();
const unsigned NumScalars = E->Scalars.size();		const unsigned NumScalars = E->Scalars.size();

// Create InsertVector shuffle if necessary		// Create InsertVector shuffle if necessary
Instruction *FirstInsert = nullptr;		Instruction *FirstInsert = nullptr;
bool IsIdentity = true;		bool IsIdentity = true;
unsigned Offset = UINT_MAX;		unsigned Offset = UINT_MAX;
for (unsigned I = 0; I < NumScalars; ++I) {		for (unsigned I = 0; I < NumScalars; ++I) {
Value *Scalar = E->Scalars[I];		Value *Scalar = E->Scalars[I];
		if (isa<UndefValue>(Scalar))
		continue;
if (!FirstInsert &&		if (!FirstInsert &&
!is_contained(E->Scalars, cast<Instruction>(Scalar)->getOperand(0)))		(isa<UndefValue>(cast<Instruction>(Scalar)->getOperand(0)) ||
		!is_contained(E->Scalars,
		cast<Instruction>(Scalar)->getOperand(0))))
FirstInsert = cast<Instruction>(Scalar);		FirstInsert = cast<Instruction>(Scalar);
Optional<int> InsertIdx = getInsertIndex(Scalar, 0);		Optional<int> InsertIdx = getInsertIndex(Scalar, 0);
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
unsigned Idx = *InsertIdx;		unsigned Idx = *InsertIdx;
if (Idx < Offset) {		if (Idx < Offset) {
Offset = Idx;		Offset = Idx;
IsIdentity &= I == 0;		IsIdentity &= I == 0;
} else {		} else {
assert(Idx >= Offset && "Failed to find vector index offset");		assert(Idx >= Offset && "Failed to find vector index offset");
IsIdentity &= Idx - Offset == I;		IsIdentity &= Idx - Offset == I;
}		}
}		}
assert(Offset < NumElts && "Failed to find vector index offset");		assert(Offset < NumElts && "Failed to find vector index offset");

// Create shuffle to resize vector		// Create shuffle to resize vector
		unsigned VNumElts = cast<FixedVectorType>(V->getType())->getNumElements();
SmallVector<int> Mask(NumElts, UndefMaskElem);		SmallVector<int> Mask(NumElts, UndefMaskElem);
if (!IsIdentity) {		if (!IsIdentity) {
for (unsigned I = 0; I < NumScalars; ++I) {		for (unsigned I = 0; I < NumScalars; ++I) {
Value *Scalar = E->Scalars[I];		Value *Scalar = E->Scalars[I];
		if (isa<UndefValue>(Scalar))
		continue;
Optional<int> InsertIdx = getInsertIndex(Scalar, 0);		Optional<int> InsertIdx = getInsertIndex(Scalar, 0);
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
Mask[*InsertIdx - Offset] = I;		Mask[*InsertIdx - Offset] = I;
}		}
} else {		} else {
std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);		std::iota(Mask.begin(), std::next(Mask.begin(), VNumElts), 0);
}		}
if (!IsIdentity || NumElts != NumScalars)		if (!IsIdentity || NumElts != VNumElts)
V = Builder.CreateShuffleVector(V, Mask);		V = Builder.CreateShuffleVector(V, Mask);

if (NumElts != NumScalars) {		if (NumElts != VNumElts) {
SmallVector<int> InsertMask(NumElts);		SmallVector<int> InsertMask(NumElts);
std::iota(InsertMask.begin(), InsertMask.end(), 0);		std::iota(InsertMask.begin(), InsertMask.end(), 0);
for (unsigned I = 0; I < NumElts; I++) {		for (unsigned I = 0; I < NumElts; I++) {
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
InsertMask[Offset + I] = NumElts + I;		InsertMask[Offset + I] = NumElts + I;
}		}

V = Builder.CreateShuffleVector(		V = Builder.CreateShuffleVector(
FirstInsert->getOperand(0), V, InsertMask,		FirstInsert->getOperand(0), V, InsertMask,
cast<Instruction>(E->Scalars.back())->getName());		cast<Instruction>(*reverse(InstructionsOnly).begin())->getName());
}		}

++NumVectorInstructions;		++NumVectorInstructions;
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
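The two masks built in the InsertElement case can be hard to follow from the diff alone. The following is a minimal standalone sketch of the base (left-hand) logic, assuming plain `std::vector<int>` in place of `SmallVector<int>` and `-1` in place of `UndefMaskElem`; the names `computeInsertMasks`, `InsertMasks`, and `kUndefLane` are hypothetical and do not appear in the patch. The new-side handling of `UndefValue` scalars and `VNumElts` is not modeled.

```cpp
#include <cassert>
#include <climits>
#include <numeric>
#include <vector>

// Stand-in for LLVM's UndefMaskElem (an assumption of this sketch).
constexpr int kUndefLane = -1;

// Hypothetical result type; not part of the patch.
struct InsertMasks {
  unsigned Offset = UINT_MAX;  // smallest defined insert index
  bool IsIdentity = true;      // scalars already sit in lane order
  std::vector<int> Mask;       // resizes/permutes the vectorized operand
  std::vector<int> InsertMask; // blends it into the original vector
};

// InsertIdx[I] is the destination lane of scalar I, or kUndefLane.
InsertMasks computeInsertMasks(const std::vector<int> &InsertIdx,
                               unsigned NumElts) {
  InsertMasks R;
  const unsigned NumScalars = static_cast<unsigned>(InsertIdx.size());
  assert(NumScalars <= NumElts && "more scalars than destination lanes");

  // First pass: find the lowest destination lane and check lane order.
  for (unsigned I = 0; I < NumScalars; ++I) {
    if (InsertIdx[I] == kUndefLane)
      continue;
    unsigned Idx = static_cast<unsigned>(InsertIdx[I]);
    if (Idx < R.Offset) {
      R.Offset = Idx;
      R.IsIdentity &= (I == 0);
    } else {
      R.IsIdentity &= (Idx - R.Offset == I);
    }
  }
  assert(R.Offset < NumElts && "Failed to find vector index offset");

  // Second pass: mask that widens the operand to NumElts lanes.
  R.Mask.assign(NumElts, kUndefLane);
  if (!R.IsIdentity) {
    for (unsigned I = 0; I < NumScalars; ++I)
      if (InsertIdx[I] != kUndefLane)
        R.Mask[InsertIdx[I] - R.Offset] = static_cast<int>(I);
  } else {
    std::iota(R.Mask.begin(), R.Mask.begin() + NumScalars, 0);
  }

  // Blend mask: lanes >= NumElts select from the widened operand.
  R.InsertMask.resize(NumElts);
  std::iota(R.InsertMask.begin(), R.InsertMask.end(), 0);
  for (unsigned I = 0; I < NumElts; ++I) {
    if (R.Mask[I] == kUndefLane)
      continue;
    assert(R.Offset + I < NumElts && "insert lane out of range");
    R.InsertMask[R.Offset + I] = static_cast<int>(NumElts + I);
  }
  return R;
}
```

For example, inserting two scalars into lanes 2 and 3 of a 4-lane vector gives `Offset == 2`, an identity `Mask`, and an `InsertMask` that keeps lanes 0 and 1 of the original vector while taking lanes 2 and 3 from the new one.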
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *InVec = vectorizeTree(E->getOperand(0));		Value *InVec = vectorizeTree(E->getOperand(0), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

auto *CI = cast<CastInst>(VL0);		auto *CI = cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *L = vectorizeTree(E->getOperand(0));		Value *L = vectorizeTree(E->getOperand(0), SelfVF);
Value *R = vectorizeTree(E->getOperand(1));		Value *R = vectorizeTree(E->getOperand(1), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V = Builder.CreateCmp(P0, L, R);		Value *V = Builder.CreateCmp(P0, L, R);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Cond = vectorizeTree(E->getOperand(0));		Value *Cond = vectorizeTree(E->getOperand(0), SelfVF);
Value *True = vectorizeTree(E->getOperand(1));		Value *True = vectorizeTree(E->getOperand(1), SelfVF);
Value *False = vectorizeTree(E->getOperand(2));		Value *False = vectorizeTree(E->getOperand(2), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateSelect(Cond, True, False);		Value *V = Builder.CreateSelect(Cond, True, False);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op = vectorizeTree(E->getOperand(0));		Value *Op = vectorizeTree(E->getOperand(0), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateUnOp(		Value *V = Builder.CreateUnOp(
static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);		static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);
[24 unchanged lines hidden]	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *LHS = vectorizeTree(E->getOperand(0));		Value *LHS = vectorizeTree(E->getOperand(0), SelfVF);
Value *RHS = vectorizeTree(E->getOperand(1));		Value *RHS = vectorizeTree(E->getOperand(1), SelfVF);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateBinOp(		Value *V = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,
RHS);		RHS);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));

ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Loads are inserted at the head of the tree because we don't want to		// Loads are inserted at the head of the tree because we don't want to
// sink them all the way down past store instructions.		// sink them all the way down past store instructions.
bool IsReorder = E->updateStateIfReorder();		bool IsReorder = E->updateStateIfReorder();
if (IsReorder)		if (IsReorder)
VL0 = E->getMainOp();		VL0 = E->getMainOp();
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

LoadInst *LI = cast<LoadInst>(VL0);		LoadInst *LI = cast<LoadInst>(VL0);
Instruction *NewLI;
unsigned AS = LI->getPointerAddressSpace();		unsigned AS = LI->getPointerAddressSpace();
Value *PO = LI->getPointerOperand();		Value *PO = LI->getPointerOperand();
		unsigned MinIdx;
		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(E->Scalars.begin(),
		find_if(E->Scalars, Instruction::classof));
		MaxIdx =
		std::distance(
		E->Scalars.begin(),
		find_if(reverse(E->Scalars), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		unsigned NumOfInstructions = MaxIdx - MinIdx + 1;
		Value *VecPtr;
		Instruction *VecLI;
		Value *V;
		Align CommonAlignment = LI->getAlign();
if (E->State == TreeEntry::Vectorize) {		if (E->State == TreeEntry::Vectorize) {
		unsigned Sz = DL->getTypeStoreSize(ScalarTy);
Value *VecPtr = Builder.CreateBitCast(PO, VecTy->getPointerTo(AS));		unsigned AlignedNumOfInstructions =
		std::min(PowerOf2Ceil(NumOfInstructions),
		alignTo(NumOfInstructions * Sz, CommonAlignment) / Sz);
		if (isPowerOf2_32(AlignedNumOfInstructions)) {
		CommonAlignment =
		[Inline comment, Not Done] spatel: Please add code comment/example to explain what the difference is between these 2 clauses.
		[Inline comment, Done] ABataev (author): Fixed it, thanks.
		commonAlignment(CommonAlignment, CommonAlignment.value() -
		(AlignedNumOfInstructions -
		NumOfInstructions));
		auto *LoadVecTy =
		FixedVectorType::get(ScalarTy, AlignedNumOfInstructions);
		VecPtr = Builder.CreateBitCast(PO, LoadVecTy->getPointerTo(AS));
		VecLI = Builder.CreateAlignedLoad(LoadVecTy, VecPtr, CommonAlignment);
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
		} else {
		VecPtr = Builder.CreateBitCast(PO, VecTy->getPointerTo(AS));
		SmallVector<Constant *, 4> Mask;
		Mask.reserve(SelfVF);
		Mask.append(NumOfInstructions, Builder.getInt1(/*V=*/true));
		Mask.append(SelfVF - NumOfInstructions, Builder.getInt1(/*V=*/false));
		VecLI = Builder.CreateMaskedLoad(VecTy, VecPtr, CommonAlignment,
		ConstantVector::get(Mask));
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
		}
// The pointer operand uses an in-tree scalar so we add the new BitCast		// The pointer operand uses an in-tree scalar so we add the new BitCast
// to ExternalUses list to make sure that an extract will be generated		// to ExternalUses list to make sure that an extract will be generated
// in the future.		// in the future.
if (getTreeEntry(PO))		if (getTreeEntry(PO))
ExternalUses.emplace_back(PO, cast<User>(VecPtr), 0);		ExternalUses.emplace_back(PO, cast<User>(VecPtr), 0);

NewLI = Builder.CreateAlignedLoad(VecTy, VecPtr, LI->getAlign());
} else {		} else {
		[Inline comment, Not Done] anton-afanasyev: `emplace_back()`
assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");		assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");
Value *VecPtr = vectorizeTree(E->getOperand(0));		for (Value *V : InstructionsOnly)
// Use the minimum alignment of the gathered loads.
Align CommonAlignment = LI->getAlign();
for (Value *V : E->Scalars)
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);		unsigned NormalizedSz = llvm::PowerOf2Ceil(NumOfInstructions);
		Value *VecPtr = vectorizeTree(E->getOperand(0), SelfVF);
		if (NormalizedSz != SelfVF) {
		[Inline comment, Not Done] spatel: Is Passthrough a full vector of undef elements? If so, it should be created/named that way (or directly in the call to CreateMaskedLoad()) rather than in the loop.
		[Inline comment, Done] ABataev (author): Fixed
		// Reduce the original vector to optimize masked gather.
		SmallVector<int, 4> RedMask(NormalizedSz, 0);
		std::iota(RedMask.begin(), RedMask.end(), 0);
		VecPtr = Builder.CreateShuffleVector(VecPtr, RedMask);
		}
		SmallVector<Constant *, 4> Mask;
		Mask.reserve(SelfVF);
		[Inline comment, Done] RKSimon: Isn't UndefValue a type of Constant? Maybe add a comment explaining what you're doing here, as it's not clear, at least to me.
		Mask.append(NumOfInstructions, Builder.getInt1(/*V=*/true));
		Mask.append(NormalizedSz - NumOfInstructions,
		Builder.getInt1(/*V=*/false));
		VecLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment,
		ConstantVector::get(Mask));
		V = propagateMetadata(VecLI, llvm::to_vector<4>(InstructionsOnly));
}		}
Value *V = propagateMetadata(NewLI, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
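The reviewer asks what distinguishes the two clauses of the consecutive-load path. As far as the diff shows, the choice depends on whether the instruction count, rounded up as far as the known alignment allows, reaches a power of two. Below is a minimal sketch of that decision under stated assumptions: `planLoad`, `LoadPlan`, and the local helpers are hypothetical names, the helpers are simplified stand-ins for LLVM's `PowerOf2Ceil`, `isPowerOf2_32`, and `alignTo`, and the adjustment of `CommonAlignment` is not modeled. Whether reading the padding lanes is actually safe is the patch's claim, not something this sketch verifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the LLVM MathExtras helpers (assumptions).
std::uint64_t powerOf2Ceil(std::uint64_t X) {
  std::uint64_t P = 1;
  while (P < X)
    P <<= 1;
  return P;
}
bool isPowerOf2(std::uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }
std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical description of the load that would be emitted.
struct LoadPlan {
  bool UseMaskedLoad;     // false: one plain aligned vector load
  unsigned NumLanes;      // lanes in the loaded vector type
  std::vector<bool> Mask; // per-lane mask, only for masked loads
};

LoadPlan planLoad(unsigned NumInsts, unsigned ElemBytes, unsigned AlignBytes,
                  unsigned SelfVF) {
  assert(NumInsts >= 1 && NumInsts <= SelfVF && "unexpected lane count");
  // Round the lane count up, but no further than the alignment covers.
  std::uint64_t Aligned =
      std::min(powerOf2Ceil(NumInsts),
               alignTo(std::uint64_t{NumInsts} * ElemBytes, AlignBytes) /
                   ElemBytes);
  if (isPowerOf2(Aligned))
    return {false, static_cast<unsigned>(Aligned), {}};
  // Otherwise enable only the leading NumInsts lanes of a SelfVF-wide load.
  std::vector<bool> Mask(SelfVF, false);
  std::fill_n(Mask.begin(), NumInsts, true);
  return {true, SelfVF, Mask};
}
```

With three `i32` loads and 16-byte alignment the count rounds up to 4 and a plain 4-lane load is planned; with only 4-byte alignment it stays at 3, so the sketch falls back to a masked load with mask `{1, 1, 1, 0}`.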
case Instruction::Store: {		case Instruction::Store: {
bool IsReorder = !E->ReorderIndices.empty();		bool IsReorder = E->updateStateIfReorder();
auto *SI = cast<StoreInst>(		if (IsReorder)
IsReorder ? E->Scalars[E->ReorderIndices.front()] : VL0);		VL0 = E->getMainOp();
		auto *SI = cast<StoreInst>(VL0);
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *VecValue = vectorizeTree(E->getOperand(0));		Value *VecValue =
		vectorizeTree(E->getOperand(0),
		PowerOf2Ceil(std::distance(InstructionsOnly.begin(),
		InstructionsOnly.end())));
ShuffleBuilder.addMask(E->ReorderIndices);		ShuffleBuilder.addMask(E->ReorderIndices);
VecValue = ShuffleBuilder.finalize(VecValue);		VecValue = ShuffleBuilder.finalize(VecValue);

Value *ScalarPtr = SI->getPointerOperand();		Value *ScalarPtr = SI->getPointerOperand();
Value *VecPtr = Builder.CreateBitCast(
ScalarPtr, VecValue->getType()->getPointerTo(AS));		Align Alignment = SI->getAlign();
StoreInst *ST = Builder.CreateAlignedStore(VecValue, VecPtr,		unsigned MinIdx;
SI->getAlign());		unsigned MaxIdx;
		if (E->ReorderIndices.empty()) {
		MinIdx = std::distance(E->Scalars.begin(),
		find_if(E->Scalars, Instruction::classof));
		MaxIdx =
		std::distance(
		E->Scalars.begin(),
		find_if(reverse(E->Scalars), Instruction::classof).base()) -
		1;
		} else {
		std::tie(MinIdx, MaxIdx) = findMinMaxPos(E->ReorderIndices);
		}
		Value *VecPtr;
		Instruction *VecSI;
		if (std::distance(InstructionsOnly.begin(), InstructionsOnly.end()) ==
		SelfVF) {
		VecPtr = Builder.CreateBitCast(
		ScalarPtr,
		FixedVectorType::get(ScalarTy, SelfVF)->getPointerTo(AS));
		[Inline comment, Not Done] spatel: Similar to above (so can we add a helper function to avoid duplicating the code?): Please add code comment/example to explain what the difference is between these 2 clauses.
		[Inline comment, Done] ABataev (author): Fixed, thanks!
		VecSI = Builder.CreateAlignedStore(VecValue, VecPtr, Alignment);
		} else {
		VecPtr = Builder.CreateBitCast(ScalarPtr,
		VecValue->getType()->getPointerTo(AS));
		SmallVector<Constant *, 4> Mask(SelfVF, Builder.getInt1(/*V=*/false));
		for (unsigned I = 0; I < SelfVF; ++I) {
		if (E->ReorderIndices[I] != SelfVF)
		Mask[I] = Builder.getInt1(/*V=*/true);
		}
		VecSI = Builder.CreateMaskedStore(VecValue, VecPtr, Alignment,
		ConstantVector::get(Mask));
		}

// The pointer operand uses an in-tree scalar, so add the new BitCast to		// The pointer operand uses an in-tree scalar, so add the new BitCast to
// ExternalUses to make sure that an extract will be generated in the		// ExternalUses to make sure that an extract will be generated in the
// future.		// future.
if (getTreeEntry(ScalarPtr))		if (getTreeEntry(ScalarPtr))
ExternalUses.push_back(ExternalUser(ScalarPtr, cast<User>(VecPtr), 0));		ExternalUses.emplace_back(ScalarPtr, cast<User>(VecPtr), 0);

Value *V = propagateMetadata(ST, E->Scalars);		Value *V = propagateMetadata(VecSI, llvm::to_vector<4>(InstructionsOnly));

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
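In the Store case, the masked path enables a lane only when its reorder index differs from `SelfVF`. Reading `SelfVF` as a sentinel for "no store in this lane" is an inference from the diff, not something the patch states. A minimal sketch under that assumption, with the hypothetical name `buildStoreMask` and `std::vector` in place of the LLVM containers:

```cpp
#include <cassert>
#include <vector>

// Returns the per-lane enable mask for a masked store. An index equal
// to SelfVF is treated as a placeholder lane (assumed sentinel).
std::vector<bool> buildStoreMask(const std::vector<unsigned> &ReorderIndices,
                                 unsigned SelfVF) {
  assert(ReorderIndices.size() >= SelfVF && "one index per lane expected");
  std::vector<bool> Mask(SelfVF, false);
  for (unsigned I = 0; I < SelfVF; ++I)
    if (ReorderIndices[I] != SelfVF)
      Mask[I] = true;
  return Mask;
}
```

When every lane holds a real store, the diff instead takes the first clause and emits a plain aligned store, so this mask is needed only for the non-power-of-2 case.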
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op0 = vectorizeTree(E->getOperand(0));		Value *Op0 = vectorizeTree(E->getOperand(0), SelfVF);

std::vector<Value *> OpVecs;		std::vector<Value *> OpVecs;
for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;		for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;
++j) {		++j) {
ValueList &VL = E->getOperand(j);		ValueList &VL = E->getOperand(j);
// Need to cast all elements to the same type before vectorization to		// Need to cast all elements to the same type before vectorization to
// avoid crash.		// avoid crash.
Type *VL0Ty = VL0->getOperand(j)->getType();		Type *VL0Ty = VL0->getOperand(j)->getType();
Type *Ty = llvm::all_of(		Type *Ty = llvm::all_of(
		[Inline comment, Not Done] dtemirbulatov: Hmm, it might be unsafe to try to obtain type here since any element of VL could be Undef?
		[Inline comment, Not Done] dtemirbulatov:
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@e = dso_local local_unnamed_addr global i32 0, align 4
@f = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @g() local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* @e, align 4
  %tobool.not19 = icmp eq i32 %0, 0
  br i1 %tobool.not19, label %while.end, label %while.body

while.body: ; preds = %entry, %while.body.backedge
  %c.022 = phi i32* [ %c.022.be, %while.body.backedge ], [ undef, %entry ]
  %b.021 = phi i32* [ %b.021.be, %while.body.backedge ], [ undef, %entry ]
  %a.020 = phi i32* [ %a.020.be, %while.body.backedge ], [ undef, %entry ]
  %incdec.ptr = getelementptr inbounds i32, i32* %c.022, i64 1
  %1 = ptrtoint i32* %c.022 to i64
  %2 = trunc i64 %1 to i32
  %incdec.ptr1 = getelementptr inbounds i32, i32* %a.020, i64 1
  %incdec.ptr2 = getelementptr inbounds i32, i32* %b.021, i64 1
  switch i32 %2, label %while.body.backedge [
    i32 2, label %sw.bb
    i32 4, label %sw.bb6
  ]

sw.bb: ; preds = %while.body
  %incdec.ptr3 = getelementptr inbounds i32, i32* %b.021, i64 2
  %3 = ptrtoint i32* %incdec.ptr2 to i64
  %4 = trunc i64 %3 to i32
  %incdec.ptr4 = getelementptr inbounds i32, i32* %a.020, i64 2
  store i32 %4, i32* %incdec.ptr1, align 4
  %incdec.ptr5 = getelementptr inbounds i32, i32* %c.022, i64 2
  br label %while.body.backedge

sw.bb6: ; preds = %while.body
  %incdec.ptr7 = getelementptr inbounds i32, i32* %a.020, i64 2
  %incdec.ptr8 = getelementptr inbounds i32, i32* %c.022, i64 2
  %5 = ptrtoint i32* %incdec.ptr to i64
  %6 = trunc i64 %5 to i32
  %incdec.ptr9 = getelementptr inbounds i32, i32* %b.021, i64 2
  store i32 %6, i32* %incdec.ptr2, align 4
  br label %while.body.backedge

while.body.backedge: ; preds = %sw.bb6, %while.body, %sw.bb
  %c.022.be = phi i32* [ %incdec.ptr, %while.body ], [ %incdec.ptr8, %sw.bb6 ], [ %incdec.ptr5, %sw.bb ]
  %b.021.be = phi i32* [ %incdec.ptr2, %while.body ], [ %incdec.ptr9, %sw.bb6 ], [ %incdec.ptr3, %sw.bb ]
  %a.020.be = phi i32* [ %incdec.ptr1, %while.body ], [ %incdec.ptr7, %sw.bb6 ], [ %incdec.ptr4, %sw.bb ]
  br label %while.body

while.end: ; preds = %entry
  ret i32 undef
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+avx,+avx2,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
		[Inline comment, Done] ABataev (author): UndefValue also has an associated type, so it should be fine. Your reproducer crashes for different reasons.
VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })		VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })
? VL0Ty		? VL0Ty
: DL->getIndexType(cast<GetElementPtrInst>(VL0)		: DL->getIndexType(cast<GetElementPtrInst>(VL0)
->getPointerOperandType()		->getPointerOperandType()
->getScalarType());		->getScalarType());
for (Value *&V : VL) {		for (Value *&V : VL) {
		if (isa<UndefValue>(V))
		continue;
auto *CI = cast<ConstantInt>(V);		auto *CI = cast<ConstantInt>(V);
V = ConstantExpr::getIntegerCast(CI, Ty,		V = ConstantExpr::getIntegerCast(CI, Ty,
CI->getValue().isSignBitSet());		CI->getValue().isSignBitSet());
}		}
Value *OpVec = vectorizeTree(VL);		Value *OpVec = vectorizeTree(VL, SelfVF);
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Value *V = Builder.CreateGEP(		Value *V = Builder.CreateGEP(
cast<GetElementPtrInst>(VL0)->getSourceElementType(), Op0, OpVecs);		cast<GetElementPtrInst>(VL0)->getSourceElementType(), Op0, OpVecs);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));

ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
[9 unchanged lines hidden]	case Instruction::Call: {
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);		auto VecCallCosts = getVectorCallCosts(CI, VecTy, TTI, TLI);
bool UseIntrinsic = ID != Intrinsic::not_intrinsic &&		bool UseIntrinsic = ID != Intrinsic::not_intrinsic &&
VecCallCosts.first <= VecCallCosts.second;		VecCallCosts.first <= VecCallCosts.second;

Value *ScalarArg = nullptr;		Value *ScalarArg = nullptr;
std::vector<Value *> OpVecs;		std::vector<Value *> OpVecs;
SmallVector<Type *, 2> TysForDecl =		SmallVector<Type *, 2> TysForDecl = {
{FixedVectorType::get(CI->getType(), E->Scalars.size())};		FixedVectorType::get(CI->getType(), SelfVF)};
for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {		for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {
ValueList OpVL;		ValueList OpVL;
// Some intrinsics have scalar arguments. This argument should not be		// Some intrinsics have scalar arguments. This argument should not be
// vectorized.		// vectorized.
if (UseIntrinsic && hasVectorInstrinsicScalarOpd(IID, j)) {		if (UseIntrinsic && hasVectorInstrinsicScalarOpd(IID, j)) {
CallInst *CEI = cast<CallInst>(VL0);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
if (hasVectorInstrinsicOverloadedScalarOpd(IID, j))		if (hasVectorInstrinsicOverloadedScalarOpd(IID, j))
TysForDecl.push_back(ScalarArg->getType());		TysForDecl.push_back(ScalarArg->getType());
continue;		continue;
}		}

Value *OpVec = vectorizeTree(E->getOperand(j));		Value *OpVec = vectorizeTree(E->getOperand(j), SelfVF);
LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");		LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Function *CF;		Function *CF;
if (!UseIntrinsic) {		if (!UseIntrinsic) {
VFShape Shape =		VFShape Shape = VFShape::get(*CI, ElementCount::getFixed(SelfVF),
VFShape::get(*CI, ElementCount::getFixed(static_cast<unsigned>(
VecTy->getNumElements())),
false /*HasGlobalPred*/);		false /*HasGlobalPred*/);
CF = VFDatabase(*CI).getVectorizedFunction(Shape);		CF = VFDatabase(*CI).getVectorizedFunction(Shape);
} else {		} else {
CF = Intrinsic::getDeclaration(F->getParent(), ID, TysForDecl);		CF = Intrinsic::getDeclaration(F->getParent(), ID, TysForDecl);
}		}

SmallVector<OperandBundleDef, 1> OpBundles;		SmallVector<OperandBundleDef, 1> OpBundles;
CI->getOperandBundlesAsDefs(OpBundles);		CI->getOperandBundlesAsDefs(OpBundles);
Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);		Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);

// The scalar argument uses an in-tree scalar so we add the new vectorized		// The scalar argument uses an in-tree scalar so we add the new vectorized
// call to ExternalUses list to make sure that an extract will be		// call to ExternalUses list to make sure that an extract will be
// generated in the future.		// generated in the future.
if (ScalarArg && getTreeEntry(ScalarArg))		if (ScalarArg && getTreeEntry(ScalarArg))
ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));		ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));

propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, to_vector<4>(InstructionsOnly), VL0);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
assert(E->isAltShuffle() &&		assert(E->isAltShuffle() &&
((Instruction::isBinaryOp(E->getOpcode()) &&		((Instruction::isBinaryOp(E->getOpcode()) &&
Instruction::isBinaryOp(E->getAltOpcode())) ||		Instruction::isBinaryOp(E->getAltOpcode())) ||
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode()))) &&		Instruction::isCast(E->getAltOpcode()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");

Value *LHS = nullptr, *RHS = nullptr;		Value *LHS = nullptr, *RHS = nullptr;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), SelfVF);
RHS = vectorizeTree(E->getOperand(1));		RHS = vectorizeTree(E->getOperand(1), SelfVF);
} else {		} else {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeTree(E->getOperand(0), SelfVF);
}		}

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V0, *V1;		Value *V0, *V1;
if (Instruction::isBinaryOp(E->getOpcode())) {		if (Instruction::isBinaryOp(E->getOpcode())) {
V0 = Builder.CreateBinOp(		V0 = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS, RHS);
V1 = Builder.CreateBinOp(		V1 = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getAltOpcode()), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->getAltOpcode()), LHS, RHS);
} else {		} else {
V0 = Builder.CreateCast(		V0 = Builder.CreateCast(
static_cast<Instruction::CastOps>(E->getOpcode()), LHS, VecTy);		static_cast<Instruction::CastOps>(E->getOpcode()), LHS, VecTy);
V1 = Builder.CreateCast(		V1 = Builder.CreateCast(
static_cast<Instruction::CastOps>(E->getAltOpcode()), LHS, VecTy);		static_cast<Instruction::CastOps>(E->getAltOpcode()), LHS, VecTy);
}		}

// Create shuffle to take alternate operations from the vector.		// Create shuffle to take alternate operations from the vector.
// Also, gather up main and alt scalar ops to propagate IR flags to		// Also, gather up main and alt scalar ops to propagate IR flags to
// each vector operation.		// each vector operation.
ValueList OpScalars, AltScalars;		ValueList OpScalars, AltScalars;
unsigned Sz = E->Scalars.size();		SmallVector<int> Mask(SelfVF);
SmallVector<int> Mask(Sz);		for (unsigned I = 0; I < SelfVF; ++I) {
for (unsigned I = 0; I < Sz; ++I) {		if (isa<UndefValue>(E->Scalars[I])) {
		Mask[I] = I;
		OpScalars.push_back(E->Scalars[I]);
		continue;
		}
auto *OpInst = cast<Instruction>(E->Scalars[I]);		auto *OpInst = cast<Instruction>(E->Scalars[I]);
assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");
if (OpInst->getOpcode() == E->getAltOpcode()) {		if (OpInst->getOpcode() == E->getAltOpcode()) {
Mask[I] = Sz + I;		Mask[I] = SelfVF + I;
AltScalars.push_back(E->Scalars[I]);		AltScalars.push_back(E->Scalars[I]);
} else {		} else {
Mask[I] = I;		Mask[I] = I;
OpScalars.push_back(E->Scalars[I]);		OpScalars.push_back(E->Scalars[I]);
}		}
}		}

propagateIRFlags(V0, OpScalars);		propagateIRFlags(V0, OpScalars);
propagateIRFlags(V1, AltScalars);		propagateIRFlags(V1, AltScalars);

Value *V = Builder.CreateShuffleVector(V0, V1, Mask);		Value *V = Builder.CreateShuffleVector(V0, V1, Mask);
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
V = propagateMetadata(I, E->Scalars);		V = propagateMetadata(I, llvm::to_vector<4>(InstructionsOnly));
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;

return V;		return V;
}		}
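The alternate-opcode case builds both vector operations and then picks each lane from one of them with a two-source shuffle. A minimal sketch of the mask construction on the new (right-hand) side, where undef placeholder lanes are routed to the main operation; `LaneKind` and `buildAltShuffleMask` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <vector>

// Classification of each scalar lane (hypothetical; the patch inspects
// the instruction opcode and isa<UndefValue> instead).
enum class LaneKind { Main, Alt, Undef };

// Lane I of the result comes from V0 (index I) for main-opcode and
// undef lanes, and from V1 (index VF + I) for alternate-opcode lanes.
std::vector<int> buildAltShuffleMask(const std::vector<LaneKind> &Lanes) {
  const int VF = static_cast<int>(Lanes.size());
  std::vector<int> Mask(Lanes.size());
  for (int I = 0; I < VF; ++I)
    Mask[I] = (Lanes[I] == LaneKind::Alt) ? VF + I : I;
  return Mask;
}
```

For a 4-lane bundle of kinds main, alt, undef, alt, the mask is `{0, 5, 2, 7}`: indices below 4 select from the main result and indices 4 and above select from the alternate result.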
[51 unchanged lines hidden]	for (const auto &ExternalUse : ExternalUses) {
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
continue;		continue;
TreeEntry *E = getTreeEntry(Scalar);		TreeEntry *E = getTreeEntry(Scalar);
assert(E && "Invalid scalar");		assert(E && "Invalid scalar");
assert(E->State != TreeEntry::NeedToGather &&		assert(E->State != TreeEntry::NeedToGather &&
"Extracting from a gather list");		"Extracting from a gather list");

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
		if (!Vec && E->getOpcode() == Instruction::Load &&
		E->UserTreeIndices.empty() && E != VectorizableTree[0].get())
		Vec = vectorizeTree(E);
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
auto ExtractAndExtendIfNeeded = [&](Value *Vec) {		auto ExtractAndExtendIfNeeded = [&](Value *Vec) {
if (Scalar->getType() != Vec->getType()) {		if (Scalar->getType() != Vec->getType()) {
Value *Ex;		Value *Ex;
// "Reuse" the existing extract to improve final codegen.		// "Reuse" the existing extract to improve final codegen.
if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {		if (auto *ES = dyn_cast<ExtractElementInst>(Scalar)) {
[84 lines hidden]	for (auto &TEPtr : VectorizableTree) {
if (Entry->State == TreeEntry::NeedToGather)		if (Entry->State == TreeEntry::NeedToGather)
continue;		continue;

assert(Entry->VectorizedValue && "Can't find vectorizable value");		assert(Entry->VectorizedValue && "Can't find vectorizable value");

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];
		if (isa<UndefValue>(Scalar))
		continue;

#ifndef NDEBUG		#ifndef NDEBUG
Type *Ty = Scalar->getType();		Type *Ty = Scalar->getType();
if (!Ty->isVoidTy()) {		if (!Ty->isVoidTy()) {
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");

// It is legal to delete users in the ignorelist.		// It is legal to delete users in the ignorelist.
[144 lines hidden]	auto &&TryScheduleBundle = [this, OldScheduleEnd, SLP](bool ReSchedule,
while (((!Bundle && ReSchedule) || (Bundle && !Bundle->isReady())) &&		while (((!Bundle && ReSchedule) || (Bundle && !Bundle->isReady())) &&
!ReadyInsts.empty()) {		!ReadyInsts.empty()) {
ScheduleData *Picked = ReadyInsts.pop_back_val();		ScheduleData *Picked = ReadyInsts.pop_back_val();
if (Picked->isSchedulingEntity() && Picked->isReady())		if (Picked->isSchedulingEntity() && Picked->isReady())
schedule(Picked, ReadyInsts);		schedule(Picked, ReadyInsts);
}		}
};		};

		auto InstructionsOnly = make_filter_range(VL, Instruction::classof);
// Make sure that the scheduling region contains all		// Make sure that the scheduling region contains all
// instructions of the bundle.		// instructions of the bundle.
for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
if (!extendSchedulingRegion(V, S)) {		if (!extendSchedulingRegion(V, S)) {
// If the scheduling region got new instructions at the lower end (or it		// If the scheduling region got new instructions at the lower end (or it
// is a new region for the first bundle). This makes it necessary to		// is a new region for the first bundle). This makes it necessary to
// recalculate all dependencies.		// recalculate all dependencies.
// Otherwise the compiler may crash trying to incorrectly calculate		// Otherwise the compiler may crash trying to incorrectly calculate
// dependencies and emit instruction in the wrong order at the actual		// dependencies and emit instruction in the wrong order at the actual
// scheduling.		// scheduling.
TryScheduleBundle(/*ReSchedule=*/false, nullptr);		TryScheduleBundle(/*ReSchedule=*/false, nullptr);
return None;		return None;
}		}
}		}

for (Value *V : VL) {		for (Value *V : InstructionsOnly) {
ScheduleData *BundleMember = getScheduleData(V);		ScheduleData *BundleMember = getScheduleData(V);
assert(BundleMember &&		assert(BundleMember &&
"no ScheduleData for bundle member (maybe not in same basic block)");		"no ScheduleData for bundle member (maybe not in same basic block)");
if (BundleMember->IsScheduled) {		if (BundleMember->IsScheduled) {
// A bundle member was scheduled as single instruction before and now		// A bundle member was scheduled as single instruction before and now
// needs to be scheduled as part of the bundle. We just get rid of the		// needs to be scheduled as part of the bundle. We just get rid of the
// existing schedule.		// existing schedule.
LLVM_DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember		LLVM_DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember
[27 lines hidden]	void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,
Value *OpValue) {		Value *OpValue) {
if (isa<PHINode>(OpValue) || isa<InsertElementInst>(OpValue))		if (isa<PHINode>(OpValue) || isa<InsertElementInst>(OpValue))
return;		return;

ScheduleData *Bundle = getScheduleData(OpValue);		ScheduleData *Bundle = getScheduleData(OpValue);
LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");		LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");
assert(!Bundle->IsScheduled &&		assert(!Bundle->IsScheduled &&
"Can't cancel bundle which is already scheduled");		"Can't cancel bundle which is already scheduled");
assert(Bundle->isSchedulingEntity() && Bundle->isPartOfBundle() &&		assert(Bundle->isSchedulingEntity() &&
		(Bundle->isPartOfBundle() ||
		llvm::count_if(VL, Instruction::classof) == 1) &&
"tried to unbundle something which is not a bundle");		"tried to unbundle something which is not a bundle");

// Un-bundle: make single instructions out of the bundle.		// Un-bundle: make single instructions out of the bundle.
ScheduleData *BundleMember = Bundle;		ScheduleData *BundleMember = Bundle;
while (BundleMember) {		while (BundleMember) {
assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");		assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");
BundleMember->FirstInBundle = BundleMember;		BundleMember->FirstInBundle = BundleMember;
ScheduleData *Next = BundleMember->NextInBundle;		ScheduleData *Next = BundleMember->NextInBundle;
[288 lines hidden]	void BoUpSLP::scheduleBlock(BlockScheduling *BS) {
// Ensure that all dependency data is updated and fill the ready-list with		// Ensure that all dependency data is updated and fill the ready-list with
// initial instructions.		// initial instructions.
int Idx = 0;		int Idx = 0;
int NumToSchedule = 0;		int NumToSchedule = 0;
for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;		for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;
I = I->getNextNode()) {		I = I->getNextNode()) {
BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {		BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {
assert((isa<InsertElementInst>(SD->Inst) ||		assert((isa<InsertElementInst>(SD->Inst) ||
SD->isPartOfBundle() == (getTreeEntry(SD->Inst) != nullptr)) &&		SD->isPartOfBundle() ==
		(getTreeEntry(SD->Inst) != nullptr &&
		llvm::count_if(getTreeEntry(SD->Inst)->Scalars,
		Instruction::classof) > 1)) &&
"scheduler and vectorizer bundle mismatch");		"scheduler and vectorizer bundle mismatch");
SD->FirstInBundle->SchedulingPriority = Idx++;		SD->FirstInBundle->SchedulingPriority = Idx++;
if (SD->isSchedulingEntity()) {		if (SD->isSchedulingEntity()) {
BS->calculateDependencies(SD, false, this);		BS->calculateDependencies(SD, false, this);
NumToSchedule++;		NumToSchedule++;
}		}
});		});
}		}
[467 lines hidden]	bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
return Changed;		return Changed;
}		}

/// Order may have elements assigned special value (size) which is out of		/// Order may have elements assigned special value (size) which is out of
/// bounds. Such indices only appear on places which correspond to undef values		/// bounds. Such indices only appear on places which correspond to undef values
/// (see canReuseExtract for details) and used in order to avoid undef values		/// (see canReuseExtract for details) and used in order to avoid undef values
/// have effect on operands ordering.		/// have effect on operands ordering.
/// The first loop below simply finds all unused indices and then the next loop		/// The first loop below simply finds all unused indices and then the next loop
/// nest assigns these indices for undef values positions.		/// nest assigns these indices for undef values positions.
		anton-afanasyev (inline comment, not done): typo: "indeces"
/// As an example below Order has two undef positions and they have assigned		/// As an example below Order has two undef positions and they have assigned
/// values 3 and 7 respectively:		/// values 3 and 7 respectively:
/// before: 6 9 5 4 9 2 1 0		/// before: 6 9 5 4 9 2 1 0
/// after: 6 3 5 4 7 2 1 0		/// after: 6 3 5 4 7 2 1 0
/// \returns Fixed ordering.		/// \returns Fixed ordering.
static BoUpSLP::OrdersType fixupOrderingIndices(ArrayRef<unsigned> Order) {		static BoUpSLP::OrdersType fixupOrderingIndices(ArrayRef<unsigned> Order) {
BoUpSLP::OrdersType NewOrder(Order.begin(), Order.end());		BoUpSLP::OrdersType NewOrder(Order.begin(), Order.end());
const unsigned Sz = NewOrder.size();		const unsigned Sz = NewOrder.size();
SmallBitVector UsedIndices(Sz);		SmallBitVector NonUsedIndices(Sz, /*t=*/true);
SmallVector<int> MaskedIndices;		SmallVector<int> MaskedIndices;
for (int I = 0, E = NewOrder.size(); I < E; ++I) {		for (int I = 0, E = NewOrder.size(); I < E; ++I) {
if (NewOrder[I] < Sz)		if (NewOrder[I] < Sz)
UsedIndices.set(NewOrder[I]);		NonUsedIndices.reset(NewOrder[I]);
else		else
MaskedIndices.push_back(I);		MaskedIndices.push_back(I);
}		}
if (MaskedIndices.empty())		if (MaskedIndices.empty())
return NewOrder;		return NewOrder;
SmallVector<int> AvailableIndices(MaskedIndices.size());		SmallVector<int> AvailableIndices(MaskedIndices.size());
unsigned Cnt = 0;		unsigned Cnt = 0;
int Idx = UsedIndices.find_first();		int Idx = NonUsedIndices.find_first();
do {		do {
AvailableIndices[Cnt] = Idx;		AvailableIndices[Cnt] = Idx;
Idx = UsedIndices.find_next(Idx);		Idx = NonUsedIndices.find_next(Idx);
++Cnt;		++Cnt;
} while (Idx > 0);		} while (Idx > 0);
assert(Cnt == MaskedIndices.size() && "Non-synced masked/available indices.");		assert(Cnt == MaskedIndices.size() && "Non-synced masked/available indices.");
for (int I = 0, E = MaskedIndices.size(); I < E; ++I)		for (int I = 0, E = MaskedIndices.size(); I < E; ++I)
NewOrder[MaskedIndices[I]] = AvailableIndices[I];		NewOrder[MaskedIndices[I]] = AvailableIndices[I];
return NewOrder;		return NewOrder;
}		}
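
The doc comment on fixupOrderingIndices describes a two-step fixup: collect the in-range indices that are still unused, then hand them out to the out-of-bounds (undef) positions. The following is only a minimal standalone sketch of that idea, not the patch's code: it assumes plain std::vector in place of SmallVector/SmallBitVector, and the name fixupOrdering is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone stand-in for fixupOrderingIndices (assumption:
// std::vector replaces LLVM's SmallVector and SmallBitVector).
// Every element >= Order.size() marks an undef position; each such position
// receives the smallest index that no other position uses yet.
std::vector<unsigned> fixupOrdering(const std::vector<unsigned> &Order) {
  const std::size_t Sz = Order.size();
  std::vector<unsigned> NewOrder(Order);
  std::vector<bool> Used(Sz, false);
  std::vector<std::size_t> MaskedPositions;

  // First pass: record the used indices and remember the masked positions.
  for (std::size_t I = 0; I < Sz; ++I) {
    if (NewOrder[I] < Sz)
      Used[NewOrder[I]] = true;
    else
      MaskedPositions.push_back(I);
  }

  // Second pass: assign the unused indices in ascending order.
  std::size_t Next = 0;
  for (std::size_t Pos : MaskedPositions) {
    while (Next < Sz && Used[Next])
      ++Next;
    if (Next == Sz)
      break; // Malformed input: duplicate in-range indices left no free slot.
    NewOrder[Pos] = static_cast<unsigned>(Next++);
  }
  return NewOrder;
}
```

With the order from the doc comment, {6, 9, 5, 4, 9, 2, 1, 0}, the unused indices are 3 and 7, so the sketch returns {6, 3, 5, 4, 7, 2, 1, 0}. An order without out-of-bounds entries is returned unchanged.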

bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,		bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
unsigned Idx) {		unsigned Idx) {
LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()		LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()
<< "\n");		<< "\n");
const unsigned Sz = R.getVectorElementSize(Chain[0]);		const unsigned Sz = R.getVectorElementSize(Chain[0]);
const unsigned MinVF = R.getMinVecRegSize() / Sz;
unsigned VF = Chain.size();		unsigned VF = Chain.size();

if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF)		if (!isPowerOf2_32(Sz) || VF < 2)
		vdmitrie (inline comment, not done): PowerOf2Ceil(VF) < MinVF
return false;		return false;

		const unsigned MinVF = R.getMinVecRegSize() / Sz;
		SmallVector<Value *, 8> FixedChain;
		unsigned NewSize = PowerOf2Ceil(std::max(VF, MinVF));
		if (NewSize != VF) {
		FixedChain.reserve(NewSize);
		FixedChain.append(Chain.begin(), Chain.end());
		FixedChain.append(NewSize - Chain.size(),
		UndefValue::get(Chain[0]->getType()));
		Chain = FixedChain;
		VF = NewSize;
		}
LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx
<< "\n");		<< "\n");

R.buildTree(Chain);		R.buildTree(Chain);
Optional<ArrayRef<unsigned>> Order = R.bestOrder();		Optional<ArrayRef<unsigned>> Order = R.bestOrder();
// TODO: Handle orders of size less than number of elements in the vector.		// TODO: Handle orders of size less than number of elements in the vector.
if (Order && Order->size() == Chain.size()) {		if (Order && Order->size() == Chain.size()) {
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
SmallVector<Value *, 4> ReorderedOps(Chain.size());		SmallVector<Value *, 4> ReorderedOps(Chain.size());
		RKSimon (inline comment, done): Would SmallBitVector be cheaper for UsedIndices?
transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),		transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),
[Chain](const unsigned Idx) { return Chain[Idx]; });		[Chain](const unsigned Idx) { return Chain[Idx]; });
R.buildTree(ReorderedOps);		R.buildTree(ReorderedOps);
}		}
if (R.isTreeTinyAndNotFullyVectorizable())		if (R.isTreeTinyAndNotFullyVectorizable())
return false;		return false;
if (R.isLoadCombineCandidate())		if (R.isLoadCombineCandidate())
return false;		return false;
[26 lines hidden]	bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
// We may run into multiple chains that merge into a single chain. We mark the		// We may run into multiple chains that merge into a single chain. We mark the
// stores that we vectorized so that we don't visit the same store twice.		// stores that we vectorized so that we don't visit the same store twice.
BoUpSLP::ValueSet VectorizedStores;		BoUpSLP::ValueSet VectorizedStores;
bool Changed = false;		bool Changed = false;

int E = Stores.size();		int E = Stores.size();
SmallBitVector Tails(E, false);		SmallBitVector Tails(E, false);
int MaxIter = MaxStoreLookup.getValue();		int MaxIter = MaxStoreLookup.getValue();
		unsigned MaxVecRegSize = R.getMaxVecRegSize();
		unsigned EltSize = R.getVectorElementSize(Stores.front());

		unsigned MaxElts = PowerOf2Floor(MaxVecRegSize / EltSize);
SmallVector<std::pair<int, int>, 16> ConsecutiveChain(		SmallVector<std::pair<int, int>, 16> ConsecutiveChain(
E, std::make_pair(E, INT_MAX));		E, std::make_pair(E, INT_MAX));
SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false));		SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false));
int IterCnt;		int IterCnt;
auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,		auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,
&CheckedPairs,		&CheckedPairs,
&ConsecutiveChain](int K, int Idx) {		&ConsecutiveChain](int K, int Idx) {
if (IterCnt >= MaxIter)		if (IterCnt >= MaxIter)
[38 lines hidden]	for (int Offset = 1, F = MaxLookDepth; Offset < F; ++Offset)
if ((Idx >= Offset && FindConsecutiveAccess(Idx - Offset, Idx)) ||		if ((Idx >= Offset && FindConsecutiveAccess(Idx - Offset, Idx)) ||
(Idx + Offset < E && FindConsecutiveAccess(Idx + Offset, Idx)))		(Idx + Offset < E && FindConsecutiveAccess(Idx + Offset, Idx)))
break;		break;
}		}

// Tracks if we tried to vectorize stores starting from the given tail		// Tracks if we tried to vectorize stores starting from the given tail
// already.		// already.
SmallBitVector TriedTails(E, false);		SmallBitVector TriedTails(E, false);
		// Check if we allow masked stores.
		unsigned MinVF =
		std::max<unsigned>(2U, PowerOf2Ceil(R.getMinVecRegSize() / EltSize));
		unsigned MaxVF =
		std::min(R.getMaximumVF(EltSize, Instruction::Store), MaxElts);
		SmallBitVector MaskedStoresSupported(std::max<int>(MaxVF, MinVF) + 1, false);
		for (unsigned I = MinVF; I <= MaxVF; I *= 2) {
		if (TTI->isLegalMaskedStore(
		FixedVectorType::get(Stores.front()->getValueOperand()->getType(),
		I),
		cast<StoreInst>(Stores.front())->getAlign()))
		MaskedStoresSupported.set(I);
		}

// For stores that start but don't end a link in the chain:		// For stores that start but don't end a link in the chain:
for (int Cnt = E; Cnt > 0; --Cnt) {		for (int Cnt = E; Cnt > 0; --Cnt) {
int I = Cnt - 1;		int I = Cnt - 1;
if (ConsecutiveChain[I].first == E || Tails.test(I))		if (ConsecutiveChain[I].first == E || Tails.test(I))
continue;		continue;
// We found a store instr that starts a chain. Now follow the chain and try		// We found a store instr that starts a chain. Now follow the chain and try
// to vectorize it.		// to vectorize it.
BoUpSLP::ValueList Operands;		BoUpSLP::ValueList Operands;
// Collect the chain into a list.		// Collect the chain into a list.
while (I != E && !VectorizedStores.count(Stores[I])) {		while (I != E && !VectorizedStores.count(Stores[I])) {
Operands.push_back(Stores[I]);		Operands.push_back(Stores[I]);
Tails.set(I);		Tails.set(I);
if (ConsecutiveChain[I].second != 1) {		int VF = std::min(
		MaxVF, std::max<unsigned>(MinVF, PowerOf2Ceil(Operands.size())));
		if (((!MaskedStoresSupported.test(VF) ||
		Operands.size() < MinNonPow2StoresSize.getValue()) &&
		ConsecutiveChain[I].second != 1) ||
		ConsecutiveChain[I].second >= static_cast<int>(MaxVF)) {
// Mark the new end in the chain and go back, if required. It might be		// Mark the new end in the chain and go back, if required. It might be
// required if the original stores come in reversed order, for example.		// required if the original stores come in reversed order, for example.
if (ConsecutiveChain[I].first != E &&		if (ConsecutiveChain[I].first != E &&
Tails.test(ConsecutiveChain[I].first) && !TriedTails.test(I) &&		Tails.test(ConsecutiveChain[I].first) && !TriedTails.test(I) &&
!VectorizedStores.count(Stores[ConsecutiveChain[I].first])) {		!VectorizedStores.count(Stores[ConsecutiveChain[I].first])) {
TriedTails.set(I);		TriedTails.set(I);
Tails.reset(ConsecutiveChain[I].first);		Tails.reset(ConsecutiveChain[I].first);
if (Cnt < ConsecutiveChain[I].first + 2)		if (Cnt < ConsecutiveChain[I].first + 2)
Cnt = ConsecutiveChain[I].first + 2;		Cnt = ConsecutiveChain[I].first + 2;
}		}
break;		break;
}		}
// Move to the next value in the chain.		// Move to the next value in the chain.
I = ConsecutiveChain[I].first;		I = ConsecutiveChain[I].first;
}		}
assert(!Operands.empty() && "Expected non-empty list of stores.");		assert(!Operands.empty() && "Expected non-empty list of stores.");

unsigned MaxVecRegSize = R.getMaxVecRegSize();
unsigned EltSize = R.getVectorElementSize(Operands[0]);
unsigned MaxElts = llvm::PowerOf2Floor(MaxVecRegSize / EltSize);

unsigned MinVF = std::max(2U, R.getMinVecRegSize() / EltSize);
unsigned MaxVF = std::min(R.getMaximumVF(EltSize, Instruction::Store),
MaxElts);

// FIXME: Is division-by-2 the correct step? Should we assert that the		// FIXME: Is division-by-2 the correct step? Should we assert that the
// register size is a power-of-2?		// register size is a power-of-2?
unsigned StartIdx = 0;		unsigned StartIdx = 0;
for (unsigned Size = MaxVF; Size >= MinVF; Size /= 2) {		unsigned E = Operands.size();
for (unsigned Cnt = StartIdx, E = Operands.size(); Cnt + Size <= E;) {		unsigned StartSize =
ArrayRef<Value *> Slice = makeArrayRef(Operands).slice(Cnt, Size);		std::min(MaxVF, std::max<unsigned>(MinVF, PowerOf2Ceil(E)));
		for (unsigned Size = StartSize; Size >= 2; Size /= 2) {
		bool IsLegalMaskedStores =
		MaskedStoresSupported.test(std::max(MinVF, Size));
		if (!IsLegalMaskedStores && Size < MinVF)
		continue;
		for (unsigned Cnt = StartIdx; Cnt + 1 + Size / 2 <= E;) {
		unsigned NumStores = std::min(Size, E - Cnt);
		// Try vectorization only if it is legal.
		if ((IsLegalMaskedStores &&
		NumStores >= MinNonPow2ValuesSize.getValue()) ||
		(NumStores >= MinVF && isPowerOf2_32(NumStores))) {
		ArrayRef<Value *> Slice =
		makeArrayRef(Operands).slice(Cnt, NumStores);
if (!VectorizedStores.count(Slice.front()) &&		if (!VectorizedStores.count(Slice.front()) &&
!VectorizedStores.count(Slice.back()) &&		!VectorizedStores.count(Slice.back()) &&
vectorizeStoreChain(Slice, R, Cnt)) {		vectorizeStoreChain(Slice, R, Cnt)) {
// Mark the vectorized stores so that we don't vectorize them again.		// Mark the vectorized stores so that we don't vectorize them again.
VectorizedStores.insert(Slice.begin(), Slice.end());		VectorizedStores.insert(Slice.begin(), Slice.end());
Changed = true;		Changed = true;
// If we vectorized initial block, no need to try to vectorize it		// If we vectorized initial block, no need to try to vectorize it
// again.		// again.
if (Cnt == StartIdx)		if (Cnt == StartIdx)
StartIdx += Size;		StartIdx += Size;
Cnt += Size;		Cnt += Size;
continue;		continue;
}		}
		}
++Cnt;		++Cnt;
}		}
// Check if the whole array was vectorized already - exit.		// Check if the whole array was vectorized already - exit.
if (StartIdx >= Operands.size())		if (StartIdx >= E)
break;		break;
}		}
}		}

return Changed;		return Changed;
}		}
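
The new revision of vectorizeStoreChain above widens a chain to PowerOf2Ceil(std::max(VF, MinVF)) elements and fills the extra lanes with undef values. A minimal standalone sketch of that padding step follows, under stated assumptions: powerOf2Ceil and padToPowerOf2 are hypothetical helpers written for this illustration, std::vector replaces SmallVector, and an arbitrary Filler value stands in for UndefValue.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: smallest power of two that is >= N.
// Unlike llvm::PowerOf2Ceil, this sketch returns 1 for N == 0.
std::size_t powerOf2Ceil(std::size_t N) {
  std::size_t P = 1;
  while (P < N)
    P *= 2;
  return P;
}

// Hypothetical helper: pads Chain with copies of Filler until its length is
// powerOf2Ceil(max(Chain.size(), MinVF)). Filler plays the role of the undef
// lanes appended in the diff.
template <typename T>
std::vector<T> padToPowerOf2(std::vector<T> Chain, std::size_t MinVF,
                             const T &Filler) {
  const std::size_t NewSize = powerOf2Ceil(std::max(Chain.size(), MinVF));
  Chain.resize(NewSize, Filler); // NewSize >= Chain.size(), so this only grows.
  return Chain;
}
```

For example, a chain of 3 elements with MinVF = 2 is padded to 4 lanes, and a chain of 5 elements is padded to 8. A chain whose length is already a power of two and at least MinVF is left unchanged.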

void SLPVectorizerPass::collectSeedInstructions(BasicBlock *BB) {		void SLPVectorizerPass::collectSeedInstructions(BasicBlock *BB) {
[50 lines hidden]	bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
// we permit an alternate opcode via InstructionsState.		// we permit an alternate opcode via InstructionsState.
InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);
if (!S.getOpcode())		if (!S.getOpcode())
return false;		return false;

Instruction *I0 = cast<Instruction>(S.OpValue);		Instruction *I0 = cast<Instruction>(S.OpValue);
// Make sure invalid types (including vector type) are rejected before		// Make sure invalid types (including vector type) are rejected before
// determining vectorization factor for scalar instructions.		// determining vectorization factor for scalar instructions.
for (Value *V : VL) {		for (Value *V : VL) {
		vdmitrie (inline comment, not done): at 6187 already checked for VL size.
Type *Ty = V->getType();		Type *Ty = V->getType();
if (!isa<InsertElementInst>(V) && !isValidElementType(Ty)) {		if (!isa<InsertElementInst>(V) && !isValidElementType(Ty)) {
// NOTE: the following will give user internal llvm type name, which may		// NOTE: the following will give user internal llvm type name, which may
// not be useful.		// not be useful.
R.getORE()->emit([&]() {		R.getORE()->emit([&]() {
std::string type_str;		std::string type_str;
llvm::raw_string_ostream rso(type_str);		llvm::raw_string_ostream rso(type_str);
Ty->print(rso);		Ty->print(rso);
return OptimizationRemarkMissed(SV_NAME, "UnsupportedType", I0)		return OptimizationRemarkMissed(SV_NAME, "UnsupportedType", I0)
<< "Cannot SLP vectorize list: type "		<< "Cannot SLP vectorize list: type "
<< rso.str() + " is unsupported by vectorizer";		<< rso.str() + " is unsupported by vectorizer";
});		});
return false;		return false;
}		}
}		}

		unsigned NumElts = VL.size();
unsigned Sz = R.getVectorElementSize(I0);		unsigned Sz = R.getVectorElementSize(I0);
unsigned MinVF = std::max(2U, R.getMinVecRegSize() / Sz);		unsigned MinVF = std::max(2U, R.getMinVecRegSize() / Sz);
unsigned MaxVF = std::max<unsigned>(PowerOf2Floor(VL.size()), MinVF);		unsigned MaxVF = std::max<unsigned>(NumElts >= MinNonPow2ValuesSize.getValue()
		? PowerOf2Ceil(NumElts)
		: PowerOf2Floor(NumElts),
		MinVF);
MaxVF = std::min(R.getMaximumVF(Sz, S.getOpcode()), MaxVF);		MaxVF = std::min(R.getMaximumVF(Sz, S.getOpcode()), MaxVF);
if (MaxVF < 2) {		if (MaxVF < 2) {
R.getORE()->emit([&]() {		R.getORE()->emit([&]() {
return OptimizationRemarkMissed(SV_NAME, "SmallVF", I0)		return OptimizationRemarkMissed(SV_NAME, "SmallVF", I0)
<< "Cannot SLP vectorize list: vectorization factor "		<< "Cannot SLP vectorize list: vectorization factor "
<< "less than 2 is not supported";		<< "less than 2 is not supported";
});		});
return false;		return false;
}		}

bool Changed = false;		bool Changed = false;
bool CandidateFound = false;		bool CandidateFound = false;
InstructionCost MinCost = SLPCostThreshold.getValue();		InstructionCost MinCost = SLPCostThreshold.getValue();
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))		if (auto *IE = dyn_cast<InsertElementInst>(VL[0]))
ScalarTy = IE->getOperand(1)->getType();		ScalarTy = IE->getOperand(1)->getType();

unsigned NextInst = 0, MaxInst = VL.size();		SmallVector<Value *> NormalizedVL;
		if (!isa<InsertElementInst>(VL.front()) && MaxVF > VL.size()) {
		NormalizedVL.append(VL.begin(), VL.end());
		NormalizedVL.append(MaxVF - VL.size(), UndefValue::get(I0->getType()));
		VL = NormalizedVL;
		}

		unsigned NextInst = 0, MaxInst = NumElts;
		bool Width3Tried = MaxVF < 4;
for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {		for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {
// No actual vectorization should happen, if number of parts is the same as		// No actual vectorization should happen, if number of parts is the same as
// provided vectorization factor (i.e. the scalar type is used for vector		// provided vectorization factor (i.e. the scalar type is used for vector
// code during codegen).		// code during codegen).
auto *VecTy = FixedVectorType::get(ScalarTy, VF);		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
if (TTI->getNumberOfParts(VecTy) == VF)		if (TTI->getNumberOfParts(VecTy) == VF)
continue;		continue;
		int Width = VF;
		// Try the vectorization factor 4 once again if tried VF 4 already, but try
		// to vectorize bundles of 3 elements. Try VF 2 after bundles size 3.
		if (VF == 2 && !Width3Tried) {
		VF = 4;
		Width = 3;
		Width3Tried = true;
		}
for (unsigned I = NextInst; I < MaxInst; ++I) {		for (unsigned I = NextInst; I < MaxInst; ++I) {
unsigned OpsWidth = 0;		unsigned OpsWidth = 0;

if (I + VF > MaxInst)		if (I + Width > MaxInst)
OpsWidth = MaxInst - I;		OpsWidth = MaxInst - I;
else		else
OpsWidth = VF;		OpsWidth = Width;

if (!isPowerOf2_32(OpsWidth))		if ((Width == 3 && OpsWidth != 3) || (VF > MinVF && OpsWidth <= VF / 2) ||
continue;		(VF == MinVF && OpsWidth < 2))

if ((VF > MinVF && OpsWidth <= VF / 2) || (VF == MinVF && OpsWidth < 2))
break;		break;

ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);		ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);
// Check that a previous iteration of this loop did not delete the Value.		// Check that a previous iteration of this loop did not delete the Value.
if (llvm::any_of(Ops, [&R](Value *V) {		if (llvm::any_of(Ops, [&R](Value *V) {
auto *I = dyn_cast<Instruction>(V);		auto *I = dyn_cast<Instruction>(V);
return I && R.isDeleted(I);		return I && R.isDeleted(I);
}))		}))
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "
<< "\n");		<< "\n");
		SmallVector<Value *, 8> FixedChain;
		if (OpsWidth != VF) {
		unsigned NewSize = VF;
		FixedChain.reserve(NewSize);
		FixedChain.append(Ops.begin(), Ops.end());
		FixedChain.append(NewSize - Ops.size(),
		UndefValue::get(Ops[0]->getType()));
		Ops = FixedChain;
		}
		assert(Ops.size() == VF &&
		"Operations must have same size as vectorization factor.");

R.buildTree(Ops);		R.buildTree(Ops);
if (AllowReorder) {		if (AllowReorder) {
Optional<ArrayRef<unsigned>> Order = R.bestOrder();		Optional<ArrayRef<unsigned>> Order = R.bestOrder();
if (Order) {		if (Order) {
		vdmitrie (inline comment, not done): 1) The if/else bodies are exactly the same. 2) With OpsWidth != VF there is still a possibility to bypass it, depending on the UserCost and AllowReorder values. It should be either an assertion to ensure it never happens, or a "break".
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
SmallVector<Value *, 4> ReorderedOps(Ops.size());		SmallVector<Value *, 4> ReorderedOps(Ops.size());
transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),		transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),
[Ops](const unsigned Idx) { return Ops[Idx]; });		[Ops](const unsigned Idx) { return Ops[Idx]; });
R.buildTree(ReorderedOps);		R.buildTree(ReorderedOps);
}		}
}		}
if (R.isTreeTinyAndNotFullyVectorizable())		if (R.isTreeTinyAndNotFullyVectorizable())
[556 lines hidden]	while (!Stack.empty()) {
}		}
// I is an extra argument for TreeN (its parent operation).		// I is an extra argument for TreeN (its parent operation).
markExtraArg(Stack.back(), EdgeInst);		markExtraArg(Stack.back(), EdgeInst);
}		}
return true;		return true;
}		}

/// Attempt to vectorize the tree found by matchAssociativeReduction.		/// Attempt to vectorize the tree found by matchAssociativeReduction.
bool tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {		bool tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI, const DataLayout &DL) {
// If there are a sufficient number of reduction values, reduce		// If there are a sufficient number of reduction values, extend
// to a nearby power-of-2. We can safely generate oversized		// to a nearby power-of-2. We can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.		// vectors and rely on the backend to split them to legal sizes.
unsigned NumReducedVals = ReducedVals.size();		unsigned NumReducedVals = ReducedVals.size();
if (NumReducedVals < 4)		if (NumReducedVals < 3)
return false;		return false;

// Intersect the fast-math-flags from all reduction operations.		// Intersect the fast-math-flags from all reduction operations.
FastMathFlags RdxFMF;		FastMathFlags RdxFMF;
RdxFMF.set();		RdxFMF.set();
for (ReductionOpsType &RdxOp : ReductionOps) {		for (ReductionOpsType &RdxOp : ReductionOps) {
for (Value *RdxVal : RdxOp) {		for (Value *RdxVal : RdxOp) {
if (auto *FPMO = dyn_cast<FPMathOperator>(RdxVal))		if (auto *FPMO = dyn_cast<FPMathOperator>(RdxVal))
[21 lines hidden]	auto getCmpForMinMaxReduction = [](Instruction *RdxRootInst) {
assert(isa<Instruction>(ScalarCond) &&		assert(isa<Instruction>(ScalarCond) &&
"Expected min/max reduction to have compare condition");		"Expected min/max reduction to have compare condition");
return cast<Instruction>(ScalarCond);		return cast<Instruction>(ScalarCond);
};		};

// The reduction root is used as the insertion point for new instructions,		// The reduction root is used as the insertion point for new instructions,
// so set it as externally used to prevent it from being deleted.		// so set it as externally used to prevent it from being deleted.
ExternallyUsedValues[ReductionRoot];		ExternallyUsedValues[ReductionRoot];
SmallVector<Value *, 16> IgnoreList;		SmallVector<Value *, 16> PostoponedIndicies;
for (ReductionOpsType &RdxOp : ReductionOps)		for (ReductionOpsType &RdxOp : ReductionOps)
IgnoreList.append(RdxOp.begin(), RdxOp.end());		PostoponedIndicies.append(RdxOp.begin(), RdxOp.end());

unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);		unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);
if (NumReducedVals > ReduxWidth) {		if (NumReducedVals > ReduxWidth) {
// In the loop below, we are building a tree based on a window of		// In the loop below, we are building a tree based on a window of
// 'ReduxWidth' values.		// 'ReduxWidth' values.
// If the operands of those values have common traits (compare predicate,		// If the operands of those values have common traits (compare predicate,
// constant operand, etc), then we want to group those together to		// constant operand, etc), then we want to group those together to
// minimize the cost of the reduction.		// minimize the cost of the reduction.
[16 lines hidden]	if (NumReducedVals > ReduxWidth) {
return PredCountMap[PredA] > PredCountMap[PredB];		return PredCountMap[PredA] > PredCountMap[PredB];
}		}
return false;		return false;
});		});
}		}

Value *VectorizedTree = nullptr;		Value *VectorizedTree = nullptr;
unsigned i = 0;		unsigned i = 0;
while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {		ReduxWidth = PowerOf2Ceil(NumReducedVals);
ArrayRef<Value *> VL(&ReducedVals[i], ReduxWidth);		// Try once the non-power-2 vectorization and only if it is unsuccessful,
V.buildTree(VL, ExternallyUsedValues, IgnoreList);		// try to split it for less power-2 chunks.
		while (
		(ReduxWidth > NumReducedVals || i < NumReducedVals - ReduxWidth + 1) &&
		ReduxWidth > 2) {
		ArrayRef<Value *> VL;
		SmallVector<Value *, 4> NormalizedVL;
		// Still need to normalize to power-of-2 size.
		if (ReduxWidth > NumReducedVals) {
		NormalizedVL.append(&ReducedVals[i],
		&ReducedVals[i] + ReducedVals.size() - i);
		NormalizedVL.append(ReduxWidth - NormalizedVL.size(),
		UndefValue::get(ReducedVals[i]->getType()));
		VL = NormalizedVL;
		} else {
		VL = makeArrayRef(&ReducedVals[i], ReduxWidth);
		}
		V.buildTree(VL, ExternallyUsedValues, PostoponedIndicies);
Optional<ArrayRef<unsigned>> Order = V.bestOrder();		Optional<ArrayRef<unsigned>> Order = V.bestOrder();
if (Order) {		if (Order) {
assert(Order->size() == VL.size() &&		assert(Order->size() == VL.size() &&
"Order size must be the same as number of vectorized "		"Order size must be the same as number of vectorized "
		[Inline comment, RKSimon, Done] SmallBitVector?
"instructions.");		"instructions.");
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
SmallVector<Value *, 4> ReorderedOps(VL.size());		SmallVector<Value *, 4> ReorderedOps(VL.size());
transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),		transform(fixupOrderingIndices(*Order), ReorderedOps.begin(),
[VL](const unsigned Idx) { return VL[Idx]; });		[VL](const unsigned Idx) { return VL[Idx]; });
V.buildTree(ReorderedOps, ExternallyUsedValues, IgnoreList);		V.buildTree(ReorderedOps, ExternallyUsedValues, PostoponedIndicies);
		}
		if (V.isTreeTinyAndNotFullyVectorizable() ||
		V.isLoadCombineReductionCandidate(RdxKind)) {
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
}		}
if (V.isTreeTinyAndNotFullyVectorizable())
break;
if (V.isLoadCombineReductionCandidate(RdxKind))
break;		break;
		}

// For a poison-safe boolean logic reduction, do not replace select		// For a poison-safe boolean logic reduction, do not replace select
// instructions with logic ops. All reduced values will be frozen (see		// instructions with logic ops. All reduced values will be frozen (see
// below) to prevent leaking poison.		// below) to prevent leaking poison.
if (isa<SelectInst>(ReductionRoot) &&		if (isa<SelectInst>(ReductionRoot) &&
isBoolLogicOp(cast<Instruction>(ReductionRoot)) &&		isBoolLogicOp(cast<Instruction>(ReductionRoot)) &&
NumReducedVals != ReduxWidth)		NumReducedVals != ReduxWidth)
break;		break;

V.computeMinimumValueSizes();		V.computeMinimumValueSizes();

// Estimate cost.		// Estimate cost.
InstructionCost TreeCost =		InstructionCost TreeCost =
V.getTreeCost(makeArrayRef(&ReducedVals[i], ReduxWidth));		V.getTreeCost(makeArrayRef(&ReducedVals[i], ReduxWidth));
InstructionCost ReductionCost =		InstructionCost ReductionCost = getReductionCost(
getReductionCost(TTI, ReducedVals[i], ReduxWidth);		TTI, ReducedVals[i],
		ReduxWidth > NumReducedVals ? NumReducedVals : VL.size(), ReduxWidth);
InstructionCost Cost = TreeCost + ReductionCost;		InstructionCost Cost = TreeCost + ReductionCost;
if (!Cost.isValid()) {		if (!Cost.isValid()) {
LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");		LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
		}
return false;		return false;
}		}
if (Cost >= -SLPCostThreshold) {		if (Cost >= -SLPCostThreshold) {
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",		return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
<< "Vectorizing horizontal reduction is possible"		<< "Vectorizing horizontal reduction is possible"
<< "but not beneficial with cost " << ore::NV("Cost", Cost)		<< "but not beneficial with cost " << ore::NV("Cost", Cost)
<< " and threshold "		<< " and threshold "
<< ore::NV("Threshold", -SLPCostThreshold);		<< ore::NV("Threshold", -SLPCostThreshold);
});		});
		// Try with smaller reductions.
		if (ReduxWidth > NumReducedVals) {
		ReduxWidth /= 2;
		continue;
		}
break;		break;
}		}

LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"		LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"
<< Cost << ". (HorRdx)\n");		<< Cost << ". (HorRdx)\n");
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",		return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
[14 unchanged lines hidden]
else		else
Builder.SetInsertPoint(RdxRootInst);		Builder.SetInsertPoint(RdxRootInst);

// To prevent poison from leaking across what used to be sequential, safe,		// To prevent poison from leaking across what used to be sequential, safe,
// scalar boolean logic operations, the reduction operand must be frozen.		// scalar boolean logic operations, the reduction operand must be frozen.
if (isa<SelectInst>(RdxRootInst) && isBoolLogicOp(RdxRootInst))		if (isa<SelectInst>(RdxRootInst) && isBoolLogicOp(RdxRootInst))
VectorizedRoot = Builder.CreateFreeze(VectorizedRoot);		VectorizedRoot = Builder.CreateFreeze(VectorizedRoot);

		// Check if we reduced non-power-2 number of elements and need to extend
		// the scalars with the elements that does not affect the result (0 for
		// add, or, xor, 1 for mul, ~0 for and, min for max and max for min).
		if (ReduxWidth > NumReducedVals) {
		Value *ShuffleOp = nullptr;
		Type *ScalarTy = ReducedVals[i]->getType();
		switch (RdxKind) {
		case RecurKind::Add:
		case RecurKind::Or:
		case RecurKind::FAdd:
		case RecurKind::Xor:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		Constant::getNullValue(ScalarTy));
		break;
		case RecurKind::And:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		Constant::getAllOnesValue(ScalarTy));
		break;
		case RecurKind::Mul:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, 1));
		break;
		case RecurKind::FMul:
		ShuffleOp =
		ConstantVector::getSplat(ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy, 1.0));
		break;
		case RecurKind::UMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getMinValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::SMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getSignedMinValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::UMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getMaxValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::SMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantInt::get(ScalarTy, APInt::getSignedMaxValue(
		DL.getTypeSizeInBits(ScalarTy))));
		break;
		case RecurKind::FMax:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy,
		APFloat::getLargest(ScalarTy->getFltSemantics(),
		/*Negative=*/true)));
		break;
		case RecurKind::FMin:
		ShuffleOp = ConstantVector::getSplat(
		ElementCount::getFixed(ReduxWidth),
		ConstantFP::get(ScalarTy,
		APFloat::getLargest(ScalarTy->getFltSemantics(),
		/*Negative=*/false)));
		break;
		default:
		llvm_unreachable(
		"Expected arithmetic or min/max reduction operation");
		}
		SmallVector<int, 4> Mask(ReduxWidth);
		std::iota(Mask.begin(), Mask.begin() + NumReducedVals, 0);
		std::iota(Mask.begin() + NumReducedVals, Mask.end(), ReduxWidth);
		VectorizedRoot = Builder.CreateShuffleVector(
		VectorizedRoot, ShuffleOp, Mask, "reduction.normalization");
		}

Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);

if (!VectorizedTree) {		if (!VectorizedTree) {
// Initialize the final value in the reduction.		// Initialize the final value in the reduction.
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
} else {		} else {
// Update the final value in the reduction.		// Update the final value in the reduction.
Builder.SetCurrentDebugLocation(Loc);		Builder.SetCurrentDebugLocation(Loc);
VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,		VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
ReducedSubTree, "op.rdx", ReductionOps);		ReducedSubTree, "op.rdx", ReductionOps);
}		}
i += ReduxWidth;		i += ReduxWidth;
		if (ReduxWidth > NumReducedVals)
		ReduxWidth /= 2;
		else
ReduxWidth = PowerOf2Floor(NumReducedVals - i);		ReduxWidth = PowerOf2Floor(NumReducedVals - i);
}		}

if (VectorizedTree) {		if (VectorizedTree) {
// Finish the reduction.		// Finish the reduction.
for (; i < NumReducedVals; ++i) {		for (; i < NumReducedVals; ++i) {
auto *I = cast<Instruction>(ReducedVals[i]);		auto *I = cast<Instruction>(ReducedVals[i]);
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree =		VectorizedTree =
createOp(Builder, RdxKind, VectorizedTree, I, "", ReductionOps);		createOp(Builder, RdxKind, VectorizedTree, I, "", ReductionOps);
}		}
for (auto &Pair : ExternallyUsedValues) {		for (auto &Pair : ExternallyUsedValues) {
// Add each externally used value to the final reduction.		// Add each externally used value to the final reduction.
for (auto *I : Pair.second) {		for (auto *I : Pair.second) {
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,		VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
Pair.first, "op.extra", I);		Pair.first, "op.extra", I);
}		}
}		}

ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);

// Mark all scalar reduction ops for deletion, they are replaced by the		// Mark all scalar reduction ops for deletion, they are replaced by the
// vector reductions.		// vector reductions.
V.eraseInstructions(IgnoreList);		V.eraseInstructions(PostoponedIndicies);
}		}
return VectorizedTree != nullptr;		return VectorizedTree != nullptr;
}		}

		/// Extracts extra argument values to the vector to try to use them as
		/// the vectorization roots.
		SmallVector<Value *, 4> getCopyOfExtraArgValues() const {
		SmallVector<Value *, 4> Args(ExtraArgs.size());
		transform(
		[Inline comment, spatel, Not Done] Is it necessary to copy these? If so, it would be better to name this function something like "getCopyOfExtraArgValues" to make that explicit. If not, we can just make this a standard 'get' method: const MapVector<Instruction *, Value *> &getExtraArgs() const { return ExtraArgs; } and then access the 'second' data in the user code?
		[Inline comment, ABataev (author), Done] We don't need to expose the `first` element of the `MapVector` here; that is not good from a general design point of view. I'll rename the member function.
		ExtraArgs, Args.begin(),
		[](const std::pair<Instruction *, Value *> &P) { return P.second; });
		return Args;
		}

unsigned numReductionValues() const { return ReducedVals.size(); }		unsigned numReductionValues() const { return ReducedVals.size(); }

private:		private:
/// Calculate the cost of a reduction.		/// Calculate the cost of a reduction.
InstructionCost getReductionCost(TargetTransformInfo *TTI,		InstructionCost getReductionCost(TargetTransformInfo *TTI,
Value *FirstReducedVal,		Value *FirstReducedVal,
unsigned ReduxWidth) {		unsigned NumOfScalars, unsigned ReduxWidth) {
Type *ScalarTy = FirstReducedVal->getType();		Type *ScalarTy = FirstReducedVal->getType();
FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);		FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);
InstructionCost VectorCost, ScalarCost;		InstructionCost VectorCost, ScalarCost;
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::Add:		case RecurKind::Add:
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::Or:		case RecurKind::Or:
case RecurKind::And:		case RecurKind::And:
[30 unchanged lines hidden]
CmpInst::makeCmpResultType(ScalarTy));		CmpInst::makeCmpResultType(ScalarTy));
break;		break;
}		}
default:		default:
llvm_unreachable("Expected arithmetic or min/max reduction operation");		llvm_unreachable("Expected arithmetic or min/max reduction operation");
}		}

// Scalar cost is repeated for N-1 elements.		// Scalar cost is repeated for N-1 elements.
ScalarCost *= (ReduxWidth - 1);		ScalarCost *= (NumOfScalars - 1);
		// Need to reshuffle elements to replace undefs with the real constant
		// values.
		if (NumOfScalars != ReduxWidth)
		VectorCost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, VectorTy);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << VectorCost - ScalarCost		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << VectorCost - ScalarCost
<< " for reduction that starts with " << *FirstReducedVal		<< " for reduction that starts with " << *FirstReducedVal
<< " (It is a splitting reduction)\n");		<< " (It is a splitting reduction)\n");
return VectorCost - ScalarCost;		return VectorCost - ScalarCost;
}		}

/// Emit a horizontal reduction of the vectorized value.		/// Emit a horizontal reduction of the vectorized value.
Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,		Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,
[184 unchanged lines hidden]
/// attempted.		/// attempted.
/// \returns true if a horizontal reduction was matched and reduced or operands		/// \returns true if a horizontal reduction was matched and reduced or operands
/// of one of the binary instruction were vectorized.		/// of one of the binary instruction were vectorized.
/// \returns false if a horizontal reduction was not matched (or not possible)		/// \returns false if a horizontal reduction was not matched (or not possible)
/// or no vectorization of any binary operation feeding \a Root instruction was		/// or no vectorization of any binary operation feeding \a Root instruction was
/// performed.		/// performed.
static bool tryToVectorizeHorReductionOrInstOperands(		static bool tryToVectorizeHorReductionOrInstOperands(
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,		PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI,		TargetTransformInfo *TTI, const DataLayout &DL,
const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {		const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {
if (!ShouldVectorizeHor)		if (!ShouldVectorizeHor)
return false;		return false;

if (!Root)		if (!Root)
return false;		return false;

if (Root->getParent() != BB || isa<PHINode>(Root))		if (Root->getParent() != BB || isa<PHINode>(Root))
return false;		return false;
// Start analysis starting from Root instruction. If horizontal reduction is		// Start analysis starting from Root instruction. If horizontal reduction is
// found, try to vectorize it. If it is not a horizontal reduction or		// found, try to vectorize it. If it is not a horizontal reduction or
// vectorization is not possible or not effective, and currently analyzed		// vectorization is not possible or not effective, and currently analyzed
// instruction is a binary operation, try to vectorize the operands, using		// instruction is a binary operation, try to vectorize the operands, using
// pre-order DFS traversal order. If the operands were not vectorized, repeat		// pre-order DFS traversal order. If the operands were not vectorized, repeat
// the same procedure considering each operand as a possible root of the		// the same procedure considering each operand as a possible root of the
// horizontal reduction.		// horizontal reduction.
// Interrupt the process if the Root instruction itself was vectorized or all		// Interrupt the process if the Root instruction itself was vectorized or all
// sub-trees not higher that RecursionMaxDepth were analyzed/vectorized.		// sub-trees not higher that RecursionMaxDepth were analyzed/vectorized.
// Skip the analysis of CmpInsts.Compiler implements postanalysis of the		// Skip the analysis of CmpInsts. Compiler implements postanalysis of the
// CmpInsts so we can skip extra attempts in		// CmpInsts so we can skip extra attempts in
// tryToVectorizeHorReductionOrInstOperands and save compile time.		// tryToVectorizeHorReductionOrInstOperands and save compile time.
SmallVector<std::pair<Instruction *, unsigned>, 8> Stack(1, {Root, 0});		SmallVector<std::pair<Instruction *, unsigned>, 8> Stack(1, {Root, 0});
SmallPtrSet<Value *, 8> VisitedInstrs;		SmallPtrSet<Value *, 8> VisitedInstrs;
bool Res = false;		bool Res = false;
while (!Stack.empty()) {		while (!Stack.empty()) {
Instruction *Inst;		Instruction *Inst;
unsigned Level;		unsigned Level;
std::tie(Inst, Level) = Stack.pop_back_val();		std::tie(Inst, Level) = Stack.pop_back_val();
// Do not try to analyze instruction that has already been vectorized.		// Do not try to analyze instruction that has already been vectorized.
// This may happen when we vectorize instruction operands on a previous		// This may happen when we vectorize instruction operands on a previous
// iteration while stack was populated before that happened.		// iteration while stack was populated before that happened.
if (R.isDeleted(Inst))		if (R.isDeleted(Inst))
continue;		continue;
Value *B0, *B1;		Value *B0, *B1;
bool IsBinop = matchRdxBop(Inst, B0, B1);		bool IsBinop = matchRdxBop(Inst, B0, B1);
bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));		bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));
if (IsBinop || IsSelect) {		if (IsBinop || IsSelect) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, Inst)) {		if (HorRdx.matchAssociativeReduction(P, Inst)) {
if (HorRdx.tryToReduce(R, TTI)) {		if (HorRdx.tryToReduce(R, TTI, DL)) {
Res = true;		Res = true;
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
// matchAssociativeReduction function unless this is the root node.		// matchAssociativeReduction function unless this is the root node.
P = nullptr;		P = nullptr;
		// Try to vectorize ExtraArgs.
		// Continue analysis for the instruction from the same basic block
		// only to save compile time.
		if (++Level < RecursionMaxDepth)
		for (auto *Op : HorRdx.getCopyOfExtraArgValues())
		if (VisitedInstrs.insert(Op).second)
		if (auto *I = dyn_cast<Instruction>(Op))
		if (!isa<PHINode>(I) && !isa<CmpInst>(I) && !R.isDeleted(I) &&
		I->getParent() == BB)
		Stack.emplace_back(I, Level);
continue;		continue;
}		}
}		}
if (P && IsBinop) {		if (P && IsBinop) {
Inst = dyn_cast<Instruction>(B0);		Inst = dyn_cast<Instruction>(B0);
if (Inst == P)		if (Inst == P)
Inst = dyn_cast<Instruction>(B1);		Inst = dyn_cast<Instruction>(B1);
if (!Inst) {		if (!Inst) {
[37 unchanged lines hidden]
return false;		return false;

if (!isa<BinaryOperator>(I))		if (!isa<BinaryOperator>(I))
P = nullptr;		P = nullptr;
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {		auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);		return tryToVectorize(I, R);
};		};
return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,		return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI, *DL,
ExtraVectorization);		ExtraVectorization);
}		}

bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,		bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,
BasicBlock *BB, BoUpSLP &R) {		BasicBlock *BB, BoUpSLP &R) {
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
if (!R.canMapToVector(IVI->getType(), DL))		if (!R.canMapToVector(IVI->getType(), DL))
return false;		return false;
[213 unchanged lines hidden]
// So allow tryToVectorizeList to reorder them if it is beneficial. This		// So allow tryToVectorizeList to reorder them if it is beneficial. This
// is done when there are exactly two elements since tryToVectorizeList		// is done when there are exactly two elements since tryToVectorizeList
// asserts that there are only two values when AllowReorder is true.		// asserts that there are only two values when AllowReorder is true.
if (NumElts > 1 && tryToVectorizeList(makeArrayRef(IncIt, NumElts), R,		if (NumElts > 1 && tryToVectorizeList(makeArrayRef(IncIt, NumElts), R,
/*AllowReorder=*/true)) {		/*AllowReorder=*/true)) {
// Success start over because instructions might have been changed.		// Success start over because instructions might have been changed.
HaveVectorizedPhiNodes = true;		HaveVectorizedPhiNodes = true;
Changed = true;		Changed = true;
} else if (NumElts < 4 &&		} else if ((NumElts == 1 || NumElts < MinNonPow2ValuesSize.getValue()) &&
(Candidates.empty() ||		(Candidates.empty() ||
Candidates.front()->getType() == (*IncIt)->getType())) {		Candidates.front()->getType() == (*IncIt)->getType())) {
Candidates.append(IncIt, std::next(IncIt, NumElts));		Candidates.append(IncIt, std::next(IncIt, NumElts));
}		}
// Final attempt to vectorize phis with the same types.		// Final attempt to vectorize phis with the same types.
if (SameTypeIt == E || (*SameTypeIt)->getType() != (*IncIt)->getType()) {		if (SameTypeIt == E || (*SameTypeIt)->getType() != (*IncIt)->getType()) {
if (Candidates.size() > 1 &&		if (Candidates.size() > 1 &&
tryToVectorizeList(Candidates, R, /*AllowReorder=*/true)) {		tryToVectorizeList(Candidates, R, /*AllowReorder=*/true)) {
[319 unchanged lines hidden]
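
The padding step in the reduction code above (widen to the next power of two, then fill the extra lanes with a value that cannot change the result) can be illustrated outside LLVM. The following is a minimal sketch in plain C++, not the patch's implementation: `Kind`, `neutralElement`, `combine`, `powerOf2Ceil` and `reducePadded` are hypothetical names introduced only for this illustration, it covers just the integer kinds, and it models the vector as a `std::vector` instead of emitting a `shufflevector` against a constant splat.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the integer subset of the reduction kinds.
enum class Kind { Add, Mul, And, Or, Xor, SMax, SMin };

// Neutral element: combining it with any x under the operation yields x.
// (0 for add/or/xor, 1 for mul, all-ones for and, min for max, max for min.)
int32_t neutralElement(Kind K) {
  switch (K) {
  case Kind::Add:
  case Kind::Or:
  case Kind::Xor:
    return 0;
  case Kind::Mul:
    return 1;
  case Kind::And:
    return -1; // all bits set
  case Kind::SMax:
    return std::numeric_limits<int32_t>::min();
  case Kind::SMin:
    return std::numeric_limits<int32_t>::max();
  }
  return 0;
}

// Apply one scalar reduction step. Add/Mul go through unsigned arithmetic
// so that overflow wraps instead of being undefined behaviour.
int32_t combine(Kind K, int32_t A, int32_t B) {
  switch (K) {
  case Kind::Add:
    return static_cast<int32_t>(static_cast<uint32_t>(A) +
                                static_cast<uint32_t>(B));
  case Kind::Mul:
    return static_cast<int32_t>(static_cast<uint32_t>(A) *
                                static_cast<uint32_t>(B));
  case Kind::And:
    return A & B;
  case Kind::Or:
    return A | B;
  case Kind::Xor:
    return A ^ B;
  case Kind::SMax:
    return A > B ? A : B;
  case Kind::SMin:
    return A < B ? A : B;
  }
  return A;
}

// Smallest power of two that is >= N (returns 1 for N == 0).
std::size_t powerOf2Ceil(std::size_t N) {
  std::size_t W = 1;
  while (W < N)
    W <<= 1;
  return W;
}

// Pad the values to a power-of-two width with the neutral element, then
// reduce by repeatedly folding the upper half onto the lower half.
int32_t reducePadded(Kind K, std::vector<int32_t> Vals) {
  Vals.resize(powerOf2Ceil(Vals.size()), neutralElement(K));
  for (std::size_t Half = Vals.size() / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      Vals[I] = combine(K, Vals[I], Vals[I + Half]);
  return Vals[0];
}
```

With three inputs the width becomes four and the fourth lane holds the neutral element, so `reducePadded(Kind::Add, {1, 2, 3})` gives the same value as the scalar chain `1 + 2 + 3`. The floating-point kinds are left out here on purpose, since choosing a padding value for them involves questions (signed zeros, NaNs) that this sketch does not try to answer.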

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT			; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT
	; RUN: opt < %s -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER			; RUN: opt < %s -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER
	; RUN: opt < %s -slp-schedule-budget=0 -slp-threshold=-30 -slp-vectorizer -S | FileCheck %s --check-prefix=MAX-COST			; RUN: opt < %s -slp-schedule-budget=0 -slp-threshold=-32 -slp-vectorizer -S | FileCheck %s --check-prefix=MAX-COST

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	@a = common global [80 x i8] zeroinitializer, align 16			@a = common global [80 x i8] zeroinitializer, align 16

	define void @PR28330(i32 %n) {			define void @PR28330(i32 %n) {
	; DEFAULT-LABEL: @PR28330(			; DEFAULT-LABEL: @PR28330(
	[166 unchanged lines hidden]
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0			; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1			; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0			; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8			; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0			; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 3			; MAX-COST-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2
	; MAX-COST-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP1]], i32 1
	; MAX-COST-NEXT: [[TMP6:%.*]] = extractelement <4 x i1> [[TMP1]], i32 0
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP3]])			; MAX-COST-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP2]])
	; MAX-COST-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], [[P27]]			; MAX-COST-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], [[P27]]
	; MAX-COST-NEXT: [[TMP9:%.*]] = add i32 [[TMP8]], [[P29]]			; MAX-COST-NEXT: [[TMP5:%.*]] = add i32 [[TMP4]], [[P29]]
	; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP9]], -5			; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP5]], -5
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]			; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]			; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	[37 unchanged lines hidden]

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-threshold=-6 -S -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -slp-vectorizer -slp-threshold=-5 -S -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s


	; FIXME: The threshold is changed to keep this test case a bit smaller.			; FIXME: The threshold is changed to keep this test case a bit smaller.
	; The AArch64 cost model should not give such high costs to select statements.			; The AArch64 cost model should not give such high costs to select statements.

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux"			target triple = "aarch64--linux"
	[410 unchanged lines hidden]

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"		target triple = "aarch64--linux-gnu"

define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {		define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {
; CHECK-LABEL: @build_vec_v2i64(		; CHECK-LABEL: @build_vec_v2i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP6]], [[TMP3]]
; CHECK-NEXT: ret <2 x i64> [[TMP9]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP7]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
		; CHECK-NEXT: ret <2 x i64> [[TMP8]]
;		;
%v0.0 = extractelement <2 x i64> %v0, i32 0		%v0.0 = extractelement <2 x i64> %v0, i32 0
%v0.1 = extractelement <2 x i64> %v0, i32 1		%v0.1 = extractelement <2 x i64> %v0, i32 1
%v1.0 = extractelement <2 x i64> %v1, i32 0		%v1.0 = extractelement <2 x i64> %v1, i32 0
%v1.1 = extractelement <2 x i64> %v1, i32 1		%v1.1 = extractelement <2 x i64> %v1, i32 1
%tmp0.0 = add i64 %v0.0, %v1.0		%tmp0.0 = add i64 %v0.0, %v1.0
%tmp0.1 = add i64 %v0.1, %v1.1		%tmp0.1 = add i64 %v0.1, %v1.1
%tmp1.0 = sub i64 %v0.0, %v1.0		%tmp1.0 = sub i64 %v0.0, %v1.0
[42 unchanged lines hidden]
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
store i64 %tmp2.0, i64* %c.0, align 8		store i64 %tmp2.0, i64* %c.0, align 8
store i64 %tmp2.1, i64* %c.1, align 8		store i64 %tmp2.1, i64* %c.1, align 8
ret void		ret void
}		}

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: ret <4 x i32> [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: ret <4 x i32> [[TMP10]]
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
%v0.2 = extractelement <4 x i32> %v0, i32 2		%v0.2 = extractelement <4 x i32> %v0, i32 2
%v0.3 = extractelement <4 x i32> %v0, i32 3		%v0.3 = extractelement <4 x i32> %v0, i32 3
%v1.0 = extractelement <4 x i32> %v1, i32 0		%v1.0 = extractelement <4 x i32> %v1, i32 0
%v1.1 = extractelement <4 x i32> %v1, i32 1		%v1.1 = extractelement <4 x i32> %v1, i32 1
%v1.2 = extractelement <4 x i32> %v1, i32 2		%v1.2 = extractelement <4 x i32> %v1, i32 2
; ... (14 lines not shown) ...		; ... (14 lines not shown) ...
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP6]], [[TMP3]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
; CHECK-NEXT: ret <4 x i32> [[SHUFFLE]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[SHUFFLE2]], <4 x i32> poison, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
		; CHECK-NEXT: ret <4 x i32> [[TMP8]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
%tmp1.1 = sub i32 %v0.1, %v1.1		%tmp1.1 = sub i32 %v0.1, %v1.1
%tmp2.0 = add i32 %tmp0.0, %tmp0.1		%tmp2.0 = add i32 %tmp0.0, %tmp0.1
%tmp2.1 = add i32 %tmp1.0, %tmp1.1		%tmp2.1 = add i32 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <4 x i32> poison, i32 %tmp2.0, i32 0		%tmp3.0 = insertelement <4 x i32> poison, i32 %tmp2.0, i32 0
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_1(		; CHECK-LABEL: @build_vec_v4i32_reuse_1(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 0		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V0]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 1
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0_1]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 6, i32 5, i32 undef>
; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP8:%.*]] = sub <4 x i32> [[TMP5]], [[TMP7]]
; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]		; CHECK-NEXT: ret <4 x i32> [[TMP8]]
; CHECK-NEXT: [[TMP2_11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2_32:%.*]] = shufflevector <4 x i32> [[TMP2_11]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP2_32]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
%tmp0.3 = xor i32 %v0.1, %v1.1		%tmp0.3 = xor i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %tmp0.0, %tmp0.1		%tmp1.0 = sub i32 %tmp0.0, %tmp0.1
%tmp1.1 = sub i32 %tmp0.0, %tmp0.1		%tmp1.1 = sub i32 %tmp0.0, %tmp0.1
%tmp1.2 = sub i32 %tmp0.2, %tmp0.3		%tmp1.2 = sub i32 %tmp0.2, %tmp0.3
%tmp1.3 = sub i32 %tmp0.3, %tmp0.2		%tmp1.3 = sub i32 %tmp0.3, %tmp0.2
%tmp2.0 = insertelement <4 x i32> poison, i32 %tmp1.0, i32 0		%tmp2.0 = insertelement <4 x i32> poison, i32 %tmp1.0, i32 0
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> undef, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[V1]], <2 x i32> undef, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[SHUFFLE2]], [[SHUFFLE3]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[SHUFFLE2]], [[SHUFFLE3]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP3]], [[TMP8]]
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]		; CHECK-NEXT: ret <4 x i32> [[TMP9]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2_0]], i32 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
; ... (10 lines not shown) ...		; ... (10 lines not shown) ...
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]		; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]
; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]		; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])		; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])
; CHECK-NEXT: ret i32 [[TMP15]]		; CHECK-NEXT: ret i32 [[TMP15]]
; ... (46 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -instcombine -S | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"		target triple = "aarch64--linux-gnu"

define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {		define <2 x i64> @build_vec_v2i64(<2 x i64> %v0, <2 x i64> %v1) {
; CHECK-LABEL: @build_vec_v2i64(		; CHECK-LABEL: @build_vec_v2i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[V0:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i64> [[V1:%.*]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i64> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i64> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i64> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP6]], <2 x i64> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP6]], [[TMP3]]
; CHECK-NEXT: ret <2 x i64> [[TMP9]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i64> [[TMP7]], <2 x i64> poison, <2 x i32> <i32 1, i32 0>
		; CHECK-NEXT: ret <2 x i64> [[TMP8]]
;		;
%v0.0 = extractelement <2 x i64> %v0, i32 0		%v0.0 = extractelement <2 x i64> %v0, i32 0
%v0.1 = extractelement <2 x i64> %v0, i32 1		%v0.1 = extractelement <2 x i64> %v0, i32 1
%v1.0 = extractelement <2 x i64> %v1, i32 0		%v1.0 = extractelement <2 x i64> %v1, i32 0
%v1.1 = extractelement <2 x i64> %v1, i32 1		%v1.1 = extractelement <2 x i64> %v1, i32 1
%tmp0.0 = add i64 %v0.0, %v1.0		%tmp0.0 = add i64 %v0.0, %v1.0
%tmp0.1 = add i64 %v0.1, %v1.1		%tmp0.1 = add i64 %v0.1, %v1.1
%tmp1.0 = sub i64 %v0.0, %v1.0		%tmp1.0 = sub i64 %v0.0, %v1.0
; ... (42 lines not shown) ...		; ... (42 lines not shown) ...
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
store i64 %tmp2.0, i64* %c.0, align 8		store i64 %tmp2.0, i64* %c.0, align 8
store i64 %tmp2.1, i64* %c.1, align 8		store i64 %tmp2.1, i64* %c.1, align 8
ret void		ret void
}		}

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: ret <4 x i32> [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: ret <4 x i32> [[TMP10]]
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
%v0.2 = extractelement <4 x i32> %v0, i32 2		%v0.2 = extractelement <4 x i32> %v0, i32 2
%v0.3 = extractelement <4 x i32> %v0, i32 3		%v0.3 = extractelement <4 x i32> %v0, i32 3
%v1.0 = extractelement <4 x i32> %v1, i32 0		%v1.0 = extractelement <4 x i32> %v1, i32 0
%v1.1 = extractelement <4 x i32> %v1, i32 1		%v1.1 = extractelement <4 x i32> %v1, i32 1
%v1.2 = extractelement <4 x i32> %v1, i32 2		%v1.2 = extractelement <4 x i32> %v1, i32 2
; ... (14 lines not shown) ...		; ... (14 lines not shown) ...
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_0(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_0(		; CHECK-LABEL: @build_vec_v4i32_reuse_0(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP6]], [[TMP3]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
; CHECK-NEXT: ret <4 x i32> [[SHUFFLE]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[SHUFFLE2]], <4 x i32> poison, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
		; CHECK-NEXT: ret <4 x i32> [[TMP8]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %v0.0, %v1.0		%tmp1.0 = sub i32 %v0.0, %v1.0
%tmp1.1 = sub i32 %v0.1, %v1.1		%tmp1.1 = sub i32 %v0.1, %v1.1
%tmp2.0 = add i32 %tmp0.0, %tmp0.1		%tmp2.0 = add i32 %tmp0.0, %tmp0.1
%tmp2.1 = add i32 %tmp1.0, %tmp1.1		%tmp2.1 = add i32 %tmp1.0, %tmp1.1
%tmp3.0 = insertelement <4 x i32> undef, i32 %tmp2.0, i32 0		%tmp3.0 = insertelement <4 x i32> undef, i32 %tmp2.0, i32 0
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_1(		; CHECK-LABEL: @build_vec_v4i32_reuse_1(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 0		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V0]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i32 1
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0_1]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 6, i32 5, i32 undef>
; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP8:%.*]] = sub <4 x i32> [[TMP5]], [[TMP7]]
; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]		; CHECK-NEXT: ret <4 x i32> [[TMP8]]
; CHECK-NEXT: [[TMP2_11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2_32:%.*]] = shufflevector <4 x i32> [[TMP2_11]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP2_32]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
%tmp0.3 = xor i32 %v0.1, %v1.1		%tmp0.3 = xor i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %tmp0.0, %tmp0.1		%tmp1.0 = sub i32 %tmp0.0, %tmp0.1
%tmp1.1 = sub i32 %tmp0.0, %tmp0.1		%tmp1.1 = sub i32 %tmp0.0, %tmp0.1
%tmp1.2 = sub i32 %tmp0.2, %tmp0.3		%tmp1.2 = sub i32 %tmp0.2, %tmp0.3
%tmp1.3 = sub i32 %tmp0.3, %tmp0.2		%tmp1.3 = sub i32 %tmp0.3, %tmp0.2
%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0		%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> undef, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[V1]], <2 x i32> undef, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[SHUFFLE2]], [[SHUFFLE3]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[SHUFFLE2]], [[SHUFFLE3]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP3]], [[TMP8]]
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]		; CHECK-NEXT: ret <4 x i32> [[TMP9]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP2_0]], i32 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
; [10 lines not shown]		; [10 lines not shown]
%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[V1:%.*]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V1]], <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP9]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP11:%.*]] = and <4 x i32> [[TMP10]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP12:%.*]] = mul nuw <4 x i32> [[TMP11]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]		; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP12]], [[TMP9]]
; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]		; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i32> [[TMP13]], [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])		; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP14]])
; CHECK-NEXT: ret i32 [[TMP15]]		; CHECK-NEXT: ret i32 [[TMP15]]
; [46 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; [214 lines not shown]

	; More complex case where the extracted lanes are directly from a vector			; More complex case where the extracted lanes are directly from a vector
	; register on AArch64 and should be considered free, because we can			; register on AArch64 and should be considered free, because we can
	; directly use the source vector register.			; directly use the source vector register.
	define void @noop_extracts_existing_vector_4_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extracts_existing_vector_4_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(			; CHECK-LABEL: @noop_extracts_existing_vector_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = fmul <4 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[TMP0]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_0]], i32 2			; CHECK-NEXT: [[A_INS_32:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP1]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_1]], i32 3			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: call void @use(double [[TMP2]])
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_0]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP5]], <4 x double> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x double> [[TMP3]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x double> [[TMP6]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: [[A_INS_31:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP7]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 4, i32 5, i32 6, i32 7, i32 8>			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[TMP5]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: store <9 x double> [[A_INS_32]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[A_INS_31]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	; [18 lines not shown]
	}			}

	; Extracted lanes are not used in the right order, so we cannot reuse the			; Extracted lanes are not used in the right order, so we cannot reuse the
	; source vector registers directly.			; source vector registers directly.
	define void @extracts_jumbled_4_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extracts_jumbled_4_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extracts_jumbled_4_lanes(			; CHECK-LABEL: @extracts_jumbled_4_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <4 x i32> <i32 2, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = fmul <4 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[TMP0]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[A_LANE_0:%.*]] = fmul double [[V1_LANE_0]], [[V2_LANE_2]]			; CHECK-NEXT: [[A_INS_32:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP1]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[A_LANE_1:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[A_LANE_2:%.*]] = fmul double [[V1_LANE_1]], [[V2_LANE_2]]			; CHECK-NEXT: call void @use(double [[TMP2]])
	; CHECK-NEXT: [[A_LANE_3:%.*]] = fmul double [[V1_LANE_3]], [[V2_LANE_0]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[A_INS_0:%.*]] = insertelement <9 x double> undef, double [[A_LANE_0]], i32 0			; CHECK-NEXT: call void @use(double [[TMP3]])
	; CHECK-NEXT: [[A_INS_1:%.*]] = insertelement <9 x double> [[A_INS_0]], double [[A_LANE_1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[A_INS_2:%.*]] = insertelement <9 x double> [[A_INS_1]], double [[A_LANE_2]], i32 2			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: [[A_INS_3:%.*]] = insertelement <9 x double> [[A_INS_2]], double [[A_LANE_3]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[TMP5]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: store <9 x double> [[A_INS_32]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[A_INS_3]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	; [20 lines not shown]

	; Even more complex case where the extracted lanes are directly from a vector			; Even more complex case where the extracted lanes are directly from a vector
	; register on AArch64 and should be considered free, because we can			; register on AArch64 and should be considered free, because we can
	; directly use the source vector register.			; directly use the source vector register.
	define void @noop_extracts_9_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @noop_extracts_9_lanes(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @noop_extracts_9_lanes(			; CHECK-LABEL: @noop_extracts_9_lanes(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <16 x i32> <i32 2, i32 1, i32 0, i32 2, i32 1, i32 0, i32 2, i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_3]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_4]], i32 1			; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x double> poison, double [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x double> [[TMP3]], double [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_7]], i32 4			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x double> [[TMP4]], double [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_8]], i32 5			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x double> [[TMP5]], double [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_0]], i32 6			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x double> [[TMP6]], double [[TMP1]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_1]], i32 7			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x double> [[TMP7]], double [[TMP2]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x double> [[TMP8]], double [[TMP0]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x double> [[TMP9]], double [[TMP1]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x double> [[TMP10]], double [[TMP0]], i32 8
	; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 0, i32 1, i32 2>			; CHECK-NEXT: [[TMP12:%.*]] = fmul <16 x double> [[SHUFFLE3]], [[TMP11]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE2]]			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <16 x double> [[TMP12]], <16 x double> poison, <9 x i32> <i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_INS_54:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP13]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP14:%.*]] = fmul <16 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_INS_73:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP12]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x double> [[TMP14]], <16 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[A_INS_73]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_82:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP15]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_54]], [[B_INS_82]]
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 1, i32 2, i32 0, i32 1>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_71:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP25]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[B_INS_71]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	; [55 lines not shown]
	}			}

	; Extracted lanes used in first fmul chain are not used in the right order, so			; Extracted lanes used in first fmul chain are not used in the right order, so
	; we cannot reuse the source vector registers directly.			; we cannot reuse the source vector registers directly.
	define void @first_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @first_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @first_mul_chain_jumbled(			; CHECK-LABEL: @first_mul_chain_jumbled(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 6, i32 7, i32 8, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <16 x i32> <i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0, i32 2, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_6]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x double> poison, double [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_5]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x double> [[TMP3]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x double> [[TMP4]], double [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x double> [[TMP5]], double [[TMP1]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x double> [[TMP6]], double [[TMP0]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x double> [[TMP7]], double [[TMP1]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_1]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x double> [[TMP8]], double [[TMP2]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_0]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x double> [[TMP9]], double [[TMP0]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_2]], i32 2			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x double> [[TMP10]], double [[TMP2]], i32 8
	; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>			; CHECK-NEXT: [[TMP12:%.*]] = fmul <16 x double> [[SHUFFLE3]], [[TMP11]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE2]]			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <16 x double> [[TMP12]], <16 x double> poison, <9 x i32> <i32 4, i32 3, i32 6, i32 5, i32 8, i32 7, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[A_INS_44:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP13]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP14:%.*]] = fmul <16 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_INS_73:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP12]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x double> [[TMP14]], <16 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[A_INS_73]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_82:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP15]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_6]], i32 0			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_44]], [[B_INS_82]]
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_7]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_1]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_2]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_3]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_4]], i32 7
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP21:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_5]], [[V2_LANE_0]]
	; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <8 x double> [[TMP21]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_71:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP22]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[B_INS_71]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	... (55 lines not shown) ...
	}			}

	; Extracted lanes used in both fmul chain are not used in the right order, so			; Extracted lanes used in both fmul chain are not used in the right order, so
	; we cannot reuse the source vector registers directly.			; we cannot reuse the source vector registers directly.
	define void @first_and_second_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @first_and_second_mul_chain_jumbled(<9 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @first_and_second_mul_chain_jumbled(			; CHECK-LABEL: @first_and_second_mul_chain_jumbled(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <9 x double>, <9 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 7, i32 6, i32 8, i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V1_LANE_4:%.*]] = extractelement <9 x double> [[V_1]], i32 4
	; CHECK-NEXT: [[V1_LANE_5:%.*]] = extractelement <9 x double> [[V_1]], i32 5
	; CHECK-NEXT: [[V1_LANE_6:%.*]] = extractelement <9 x double> [[V_1]], i32 6
	; CHECK-NEXT: [[V1_LANE_7:%.*]] = extractelement <9 x double> [[V_1]], i32 7
	; CHECK-NEXT: [[V1_LANE_8:%.*]] = extractelement <9 x double> [[V_1]], i32 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[V_2]], <4 x double> poison, <16 x i32> <i32 2, i32 1, i32 0, i32 2, i32 0, i32 2, i32 1, i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <9 x double> [[V_1]], <9 x double> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x double> [[TMP1]], double [[V1_LANE_5]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x double> poison, double [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x double> [[TMP2]], double [[V1_LANE_6]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x double> [[TMP3]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x double> [[TMP3]], double [[V1_LANE_8]], i32 4			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x double> [[TMP4]], double [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x double> [[TMP4]], double [[V1_LANE_7]], i32 5			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x double> [[TMP5]], double [[TMP1]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x double> [[TMP5]], double [[V1_LANE_1]], i32 6			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x double> [[TMP6]], double [[TMP0]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x double> [[TMP6]], double [[V1_LANE_0]], i32 7			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x double> [[TMP7]], double [[TMP2]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_0]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x double> [[TMP8]], double [[TMP1]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x double> [[TMP8]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x double> [[TMP9]], double [[TMP0]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x double> [[TMP9]], double [[V2_LANE_1]], i32 2			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x double> [[TMP10]], double [[TMP2]], i32 8
	; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <8 x double> [[TMP10]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 1, i32 2, i32 0, i32 1, i32 2>			; CHECK-NEXT: [[TMP12:%.*]] = fmul <16 x double> [[SHUFFLE3]], [[TMP11]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <8 x double> [[TMP7]], [[SHUFFLE2]]			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <16 x double> [[TMP12]], <16 x double> poison, <9 x i32> <i32 4, i32 3, i32 5, i32 6, i32 8, i32 7, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[A_LANE_8:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_0]]			; CHECK-NEXT: [[A_INS_44:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP13]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x double> [[TMP11]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>			; CHECK-NEXT: [[TMP14:%.*]] = fmul <16 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[A_INS_73:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP12]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x double> [[TMP14]], <16 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[A_INS_8:%.*]] = insertelement <9 x double> [[A_INS_73]], double [[A_LANE_8]], i32 8			; CHECK-NEXT: [[B_INS_82:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP15]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x double> poison, double [[V1_LANE_7]], i32 0			; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_44]], [[B_INS_82]]
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x double> [[TMP13]], double [[V1_LANE_6]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x double> [[TMP14]], double [[V1_LANE_8]], i32 2
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x double> [[TMP15]], double [[V1_LANE_1]], i32 3
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x double> [[TMP16]], double [[V1_LANE_0]], i32 4
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x double> [[TMP17]], double [[V1_LANE_3]], i32 5
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <8 x double> [[TMP18]], double [[V1_LANE_2]], i32 6
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <8 x double> [[TMP19]], double [[V1_LANE_5]], i32 7
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <8 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x double> [[TMP21]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x double> [[TMP22]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[TMP23]], <8 x double> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 2, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <8 x double> [[TMP20]], [[SHUFFLE]]
	; CHECK-NEXT: [[B_LANE_8:%.*]] = fmul double [[V1_LANE_4]], [[V2_LANE_2]]
	; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <8 x double> [[TMP24]], <8 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef>
	; CHECK-NEXT: [[B_INS_71:%.*]] = shufflevector <9 x double> undef, <9 x double> [[TMP25]], <9 x i32> <i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 8>
	; CHECK-NEXT: [[B_INS_8:%.*]] = insertelement <9 x double> [[B_INS_71]], double [[B_LANE_8]], i32 8
	; CHECK-NEXT: [[RES:%.*]] = fsub <9 x double> [[A_INS_8]], [[B_INS_8]]
	; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[RES]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	... (56 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

	... (236 lines not shown) ...
	; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <3 x i16> [[ARG0:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <3 x i16> [[ARG1:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1]], i64 2
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[INS_11:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[INS_12:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_11]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_12]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	%arg1.1 = extractelement <3 x i16> %arg1, i64 1			%arg1.1 = extractelement <3 x i16> %arg1, i64 1
	... (25 lines not shown) ...
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP2:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP2]])
	; GFX8-NEXT: [[INS_32:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_33:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_32]]			; GFX8-NEXT: ret <4 x i16> [[INS_33]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1
	... (24 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

	... (236 lines not shown) ...
	; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX7-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <3 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <3 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: ret <3 x i16> [[INS_2]]			; GFX7-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	; GFX8-LABEL: @uadd_sat_v3i16(			; GFX8-LABEL: @uadd_sat_v3i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0:%.*]], i64 2			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <3 x i16> [[ARG0:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1:%.*]], i64 2			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <3 x i16> [[ARG0]], i64 2
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <3 x i16> [[ARG0]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <3 x i16> [[ARG1:%.*]], <3 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <3 x i16> [[ARG1]], <3 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <3 x i16> [[ARG1]], i64 2
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[INS_11:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>			; GFX8-NEXT: [[INS_12:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <3 x i32> <i32 0, i32 1, i32 undef>
	; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_11]], i16 [[ADD_2]], i64 2			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <3 x i16> [[INS_12]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <3 x i16> [[INS_2]]			; GFX8-NEXT: ret <3 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <3 x i16> %arg0, i64 0			%arg0.0 = extractelement <3 x i16> %arg0, i64 0
	%arg0.1 = extractelement <3 x i16> %arg0, i64 1			%arg0.1 = extractelement <3 x i16> %arg0, i64 1
	%arg0.2 = extractelement <3 x i16> %arg0, i64 2			%arg0.2 = extractelement <3 x i16> %arg0, i64 2
	%arg1.0 = extractelement <3 x i16> %arg1, i64 0			%arg1.0 = extractelement <3 x i16> %arg1, i64 0
	%arg1.1 = extractelement <3 x i16> %arg1, i64 1			%arg1.1 = extractelement <3 x i16> %arg1, i64 1
	... (25 lines not shown) ...
	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> undef, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP0:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[SHUFFLE]], <2 x i16> [[SHUFFLE1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[TMP2:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> undef, <2 x i32> <i32 2, i32 3>
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP1]], <2 x i16> [[TMP2]])
	; GFX8-NEXT: [[INS_32:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_33:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; GFX8-NEXT: ret <4 x i16> [[INS_32]]			; GFX8-NEXT: ret <4 x i16> [[INS_33]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1
	... (24 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s			; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s

	@bar = external global [4 x [4 x i32]], align 4			@bar = external global [4 x [4 x i32]], align 4
	@dct_luma = external global [4 x [4 x i32]], align 4			@dct_luma = external global [4 x [4 x i32]], align 4

	define void @foo() local_unnamed_addr {			define void @foo() local_unnamed_addr {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4
	; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0			; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0
	; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1			; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2), align 4
	; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2			; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 3), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0) to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[ADD277]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP1]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP2]], i32 3			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> poison, [[TMP6]]			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = ashr <4 x i32> [[TMP7]], <i32 6, i32 6, i32 6, i32 6>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]
				; CHECK-NEXT: [[TMP7:%.*]] = ashr <4 x i32> [[TMP6]], <i32 6, i32 6, i32 6, i32 6>
	; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3			; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%add277 = add nsw i32 undef, undef			%add277 = add nsw i32 undef, undef
	store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4
	%sub355 = add nsw i32 undef, %0			%sub355 = add nsw i32 undef, %0
	%shr.i = ashr i32 %sub355, 6			%shr.i = ashr i32 %sub355, 6
	... (18 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> undef, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP10]], i32 1
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 undef, i32 undef>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>			; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 poison, i32 poison>
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])			; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529			; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685			; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP6]], [[TMP7]]
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP6]], [[TMP7]]
	; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>			; FORCE_REDUCTION-NEXT: [[TMP10]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]			%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]
	%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]			%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
				; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP2]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R22:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R22]], float [[AB3]], i32 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R51]], float [[AB6]], i32 6
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R7]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SLM-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
				; SLM-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP2]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R22:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R22]], float [[AB3]], i32 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R51:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R51]], float [[AB6]], i32 6
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R7]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP2]])
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP4]])
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R23]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: ret <8 x float> [[R71]]			; AVX-NEXT: ret <8 x float> [[R71]]
	;			;
				; CHECK-LABEL: @ceil_floor(
				; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
				; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; CHECK-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
				; CHECK-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
				; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; CHECK-NEXT: [[TMP7:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP6]])
				; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i32 0
				; CHECK-NEXT: [[R2:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[R5:%.*]] = shufflevector <8 x float> [[R2]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
				; CHECK-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; CHECK-NEXT: ret <8 x float> [[R7]]
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5
	%a6 = extractelement <8 x float> %a, i32 6			%a6 = extractelement <8 x float> %a, i32 6
	%a7 = extractelement <8 x float> %a, i32 7			%a7 = extractelement <8 x float> %a, i32 7

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=AVX

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SSE-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
				; SSE-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SSE-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>
	; SSE-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SSE-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP2]])
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SSE-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])
	; SSE-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SSE-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R22:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R22]], float [[AB3]], i32 3
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R51]], float [[AB6]], i32 6
	; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; SSE-NEXT: ret <8 x float> [[R71]]			; SSE-NEXT: ret <8 x float> [[R7]]
	;			;
	; SLM-LABEL: @ceil_floor(			; SLM-LABEL: @ceil_floor(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
				; SLM-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
				; SLM-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SLM-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; SLM-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SLM-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SLM-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; SLM-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>
	; SLM-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; SLM-NEXT: [[TMP3:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP2]])
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; SLM-NEXT: [[AB6:%.*]] = call float @llvm.floor.f32(float [[A6]])
	; SLM-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; SLM-NEXT: [[AB7:%.*]] = call float @llvm.floor.f32(float [[A7]])
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R22:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP4]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R22]], float [[AB3]], i32 3
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; SLM-NEXT: [[R51:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R51]], float [[AB6]], i32 6
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R7]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP2]])
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP5:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP4]])
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i32 3			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R23]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>			; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: ret <8 x float> [[R71]]			; AVX-NEXT: ret <8 x float> [[R71]]
	;			;
				; CHECK-LABEL: @ceil_floor(
				; CHECK-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
				; CHECK-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; CHECK-NEXT: [[TMP1:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[SHRINK_SHUFFLE]])
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 3, i32 4, i32 5, i32 undef>
				; CHECK-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
				; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <2 x i32> <i32 6, i32 7>
				; CHECK-NEXT: [[TMP7:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP6]])
				; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
				; CHECK-NEXT: [[R2:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[R5:%.*]] = shufflevector <8 x float> [[R2]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 undef, i32 undef>
				; CHECK-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R5]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; CHECK-NEXT: ret <8 x float> [[R7]]
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5
	%a6 = extractelement <8 x float> %a, i32 6			%a6 = extractelement <8 x float> %a, i32 6
	%a7 = extractelement <8 x float> %a, i32 7			%a7 = extractelement <8 x float> %a, i32 7
	[21 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX512
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX512

define <8 x float> @sitofp_uitofp(<8 x i32> %a) {		define <8 x float> @sitofp_uitofp(<8 x i32> %a) {
; CHECK-LABEL: @sitofp_uitofp(		; CHECK-LABEL: @sitofp_uitofp(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; CHECK-NEXT: ret <8 x float> [[TMP3]]
;		;
[141 lines not shown]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[SHUFFLE]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R72]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
[15 lines not shown]
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
; CHECK-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R53:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
; CHECK-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R53]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; CHECK-NEXT: ret <8 x float> [[R72]]		; SSE-NEXT: ret <8 x float> [[R7]]
		;
		; SLM-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
		; SLM-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
		; SLM-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; SLM-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; SLM-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; SLM-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; SLM-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
		; SLM-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
		; SLM-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
		; SLM-NEXT: ret <8 x float> [[R7]]
		;
		; AVX-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
		; AVX-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
		; AVX-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; AVX-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
		; AVX-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
		; AVX-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
		; AVX-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
		; AVX-NEXT: ret <8 x float> [[R7]]
		;
		; AVX2-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX2-NEXT: [[SHUFFLE3:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX2-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
		; AVX2-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX2-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX2-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX2-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX2-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX2-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x i32> <i32 0, i32 3>
		; AVX2-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[R54:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R54]], <8 x float> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
		; AVX2-NEXT: ret <8 x float> [[R72]]
		;
		; AVX512-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX512-NEXT: [[SHUFFLE3:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
		; AVX512-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX512-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX512-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX512-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX512-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX512-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x i32> <i32 0, i32 3>
		; AVX512-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[R54:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX512-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R54]], <8 x float> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
		; AVX512-NEXT: ret <8 x float> [[R72]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%c0 = extractelement <16 x i8> %c, i32 0		%c0 = extractelement <16 x i8> %c, i32 0
[19 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX512
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefixes=CHECK,AVX512

define <8 x float> @sitofp_uitofp(<8 x i32> %a) {		define <8 x float> @sitofp_uitofp(<8 x i32> %a) {
; CHECK-LABEL: @sitofp_uitofp(		; CHECK-LABEL: @sitofp_uitofp(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; CHECK-NEXT: ret <8 x float> [[TMP3]]
;		;
[141 lines not shown]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {		define <8 x float> @sitofp_4i32_8i16(<4 x i32> %a, <8 x i16> %b) {
; CHECK-LABEL: @sitofp_4i32_8i16(		; CHECK-LABEL: @sitofp_4i32_8i16(
		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[SHUFFLE]] to <4 x float>
; CHECK-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[R72:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <8 x float> [[R72]]		; CHECK-NEXT: ret <8 x float> [[R72]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
[15 lines not shown]
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

; Inspired by PR38154		; Inspired by PR38154
define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {		define <8 x float> @sitofp_uitofp_4i32_8i16_16i8(<4 x i32> %a, <8 x i16> %b, <16 x i8> %c) {
; CHECK-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; SSE-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; CHECK-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>		; SSE-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x float>		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; CHECK-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x float>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
; CHECK-NEXT: [[TMP9:%.*]] = sitofp <2 x i8> [[TMP8]] to <2 x float>		; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP10:%.*]] = uitofp <2 x i8> [[TMP8]] to <2 x float>		; SSE-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> [[TMP10]], <2 x i32> <i32 0, i32 3>		; SSE-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
; CHECK-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[R53:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
; CHECK-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R53]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; CHECK-NEXT: ret <8 x float> [[R72]]		; SSE-NEXT: ret <8 x float> [[R7]]
		;
		; SLM-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
		; SLM-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
		; SLM-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; SLM-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; SLM-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; SLM-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; SLM-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
		; SLM-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
		; SLM-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
		; SLM-NEXT: ret <8 x float> [[R7]]
		;
		; AVX-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
		; AVX-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
		; AVX-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; AVX-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE]] to <2 x float>
		; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
		; AVX-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
		; AVX-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R52]], float [[AB6]], i32 6
		; AVX-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
		; AVX-NEXT: ret <8 x float> [[R7]]
		;
		; AVX2-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX2-NEXT: [[SHUFFLE3:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX2-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
		; AVX2-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX2-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX2-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX2-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX2-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX2-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x i32> <i32 0, i32 3>
		; AVX2-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[R54:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R54]], <8 x float> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
		; AVX2-NEXT: ret <8 x float> [[R72]]
		;
		; AVX512-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
		; AVX512-NEXT: [[SHUFFLE3:%.*]] = shufflevector <8 x i16> [[B:%.*]], <8 x i16> poison, <2 x i32> <i32 0, i32 1>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[C:%.*]], <16 x i8> poison, <2 x i32> <i32 0, i32 1>
		; AVX512-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
		; AVX512-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX512-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[SHUFFLE3]] to <2 x float>
		; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
		; AVX512-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX512-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[SHUFFLE]] to <2 x float>
		; AVX512-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x i32> <i32 0, i32 3>
		; AVX512-NEXT: [[R31:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[R54:%.*]] = shufflevector <8 x float> [[R31]], <8 x float> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
		; AVX512-NEXT: [[TMP11:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: [[R72:%.*]] = shufflevector <8 x float> [[R54]], <8 x float> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
		; AVX512-NEXT: ret <8 x float> [[R72]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <8 x i16> %b, i32 0		%b0 = extractelement <8 x i16> %b, i32 0
%b1 = extractelement <8 x i16> %b, i32 1		%b1 = extractelement <8 x i16> %b, i32 1
%c0 = extractelement <16 x i8> %c, i32 0		%c0 = extractelement <16 x i8> %c, i32 0
[19 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

	[90 lines not shown]
	}			}

	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
				; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = fmul <2 x float> [[SHUFFLE]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
	[18 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

	[90 lines not shown]
	}			}

	define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {			define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
	; SSE-LABEL: @fmul_fdiv_v4f32_const(			; SSE-LABEL: @fmul_fdiv_v4f32_const(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; SSE-NEXT: ret <4 x float> [[TMP1]]			; SSE-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v4f32_const(			; SLM-LABEL: @fmul_fdiv_v4f32_const(
	; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
				; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>			; SLM-NEXT: [[TMP1:%.*]] = fmul <2 x float> [[SHUFFLE]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
	; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00			; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
	; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SLM-NEXT: [[R11:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[R11]], float [[A2]], i32 2
	; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3			; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
	; SLM-NEXT: ret <4 x float> [[R3]]			; SLM-NEXT: ret <4 x float> [[R3]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v4f32_const(			; AVX-LABEL: @fmul_fdiv_v4f32_const(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>			; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
	; AVX-NEXT: ret <4 x float> [[TMP1]]			; AVX-NEXT: ret <4 x float> [[TMP1]]
	;			;
	[18 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

[100 lines not shown]
}		}

define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_shl_v8i32(		; SSE-LABEL: @ashr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R72:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>		; SSE-NEXT: [[R73:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R73]]
;		;
; SLM-LABEL: @ashr_shl_v8i32(		; SLM-LABEL: @ashr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x i32> [[TMP3]]		; SLM-NEXT: ret <8 x i32> [[TMP3]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32(		; AVX1-LABEL: @ashr_shl_v8i32(
[46 lines not shown]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R72]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x i32> [[TMP3]]		; SLM-NEXT: ret <8 x i32> [[TMP3]]
;		;
[41 lines not shown]
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0		; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1		; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]		; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]		; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
; SSE-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> poison, i32 [[AB0]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef>
; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R72:%.*]] = shufflevector <8 x i32> [[R51]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: ret <8 x i32> [[TMP3]]
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: ret <8 x i32> [[TMP3]]
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: ret <8 x i32> [[TMP3]]
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: ret <8 x i32> [[TMP3]]
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R72]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 54 lines not shown ...]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {		define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {
; SSE-LABEL: @sdiv_v8i32_undefs(		; SSE-LABEL: @sdiv_v8i32_undefs(
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[TMP1:%.*]] = sdiv <4 x i32> [[SHUFFLE]], <i32 4, i32 4, i32 8, i32 16>
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; SSE-NEXT: [[TMP3:%.*]] = sdiv <4 x i32> [[TMP2]], <i32 4, i32 4, i32 8, i32 16>
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6		; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; SSE-NEXT: ret <8 x i32> [[R72]]
; SSE-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; SSE-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; SSE-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; SSE-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; SSE-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; SSE-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @sdiv_v8i32_undefs(		; SLM-LABEL: @sdiv_v8i32_undefs(
; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SLM-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SLM-NEXT: ret <8 x i32> [[TMP1]]
; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; SLM-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SLM-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; SLM-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; SLM-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; SLM-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; SLM-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; SLM-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SLM-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @sdiv_v8i32_undefs(		; AVX1-LABEL: @sdiv_v8i32_undefs(
; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; AVX1-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX1-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8		; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; AVX1-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16		; AVX1-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; AVX1-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8		; AVX1-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 4, i32 4, i32 8, i32 16>
; AVX1-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX1-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP3]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: ret <8 x i32> [[R71]]
; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX2-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX2-NEXT: ret <8 x i32> [[TMP1]]
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX512-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX512-NEXT: ret <8 x i32> [[TMP1]]
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> poison, i32 [[AB1]], i32 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 55 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

[... 100 lines not shown ...]
}		}

define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_shl_v8i32(		; SSE-LABEL: @ashr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SSE-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R72:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>		; SSE-NEXT: [[R73:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R73]]
;		;
; SLM-LABEL: @ashr_shl_v8i32(		; SLM-LABEL: @ashr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x i32> [[TMP3]]		; SLM-NEXT: ret <8 x i32> [[TMP3]]
;		;
; AVX1-LABEL: @ashr_shl_v8i32(		; AVX1-LABEL: @ashr_shl_v8i32(
[... 46 lines not shown ...]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {		define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
; SSE-LABEL: @ashr_shl_v8i32_const(		; SSE-LABEL: @ashr_shl_v8i32_const(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP1:%.*]] = ashr <4 x i32> [[SHUFFLE]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>		; SSE-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], <i32 3, i32 3, i32 3, i32 3>
; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R72]]
;		;
; SLM-LABEL: @ashr_shl_v8i32_const(		; SLM-LABEL: @ashr_shl_v8i32_const(
; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], <i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: ret <8 x i32> [[TMP3]]		; SLM-NEXT: ret <8 x i32> [[TMP3]]
;		;
[... 41 lines not shown ...]
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
		; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0		; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1		; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
		; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]		; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]		; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
; SSE-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0		; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1		; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef>
; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R72:%.*]] = shufflevector <8 x i32> [[R51]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R72]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: ret <8 x i32> [[TMP3]]
; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX1-LABEL: @ashr_lshr_shl_v8i32(		; AVX1-LABEL: @ashr_lshr_shl_v8i32(
; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX1-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX1-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX1-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX1-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX1-NEXT: ret <8 x i32> [[TMP3]]
; AVX1-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX1-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX1-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX2-LABEL: @ashr_lshr_shl_v8i32(		; AVX2-LABEL: @ashr_lshr_shl_v8i32(
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX2-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX2-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX2-NEXT: ret <8 x i32> [[TMP3]]
; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX2-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX2-NEXT: ret <8 x i32> [[R72]]
;		;
; AVX512-LABEL: @ashr_lshr_shl_v8i32(		; AVX512-LABEL: @ashr_lshr_shl_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX512-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x i32> [[B:%.*]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef>
; AVX512-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX512-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX512-NEXT: ret <8 x i32> [[TMP3]]
; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP9:%.*]] = shl <4 x i32> [[TMP6]], [[TMP7]]
; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; AVX512-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512-NEXT: ret <8 x i32> [[R72]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 54 lines not shown ...]
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {		define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {
; SSE-LABEL: @sdiv_v8i32_undefs(		; SSE-LABEL: @sdiv_v8i32_undefs(
; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SSE-NEXT: [[TMP1:%.*]] = sdiv <4 x i32> [[SHUFFLE]], <i32 4, i32 4, i32 8, i32 16>
; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; SSE-NEXT: [[TMP3:%.*]] = sdiv <4 x i32> [[TMP2]], <i32 4, i32 4, i32 8, i32 16>
; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6		; SSE-NEXT: [[R72:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7		; SSE-NEXT: ret <8 x i32> [[R72]]
; SSE-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; SSE-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; SSE-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; SSE-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; SSE-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; SSE-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @sdiv_v8i32_undefs(		; SLM-LABEL: @sdiv_v8i32_undefs(
; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; SLM-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SLM-NEXT: ret <8 x i32> [[TMP1]]
; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; SLM-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SLM-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; SLM-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; SLM-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; SLM-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; SLM-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8
; SLM-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
; SLM-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX1-LABEL: @sdiv_v8i32_undefs(		; AVX1-LABEL: @sdiv_v8i32_undefs(
; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3		; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; AVX1-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; AVX1-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4		; AVX1-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8		; AVX1-NEXT: [[AB2:%.*]] = sdiv i32 [[A2]], 8
; AVX1-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16		; AVX1-NEXT: [[AB3:%.*]] = sdiv i32 [[A3]], 16
; AVX1-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX1-NEXT: [[AB6:%.*]] = sdiv i32 [[A6]], 8		; AVX1-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 4, i32 4, i32 8, i32 16>
; AVX1-NEXT: [[AB7:%.*]] = sdiv i32 [[A7]], 16
; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1		; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2		; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3		; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5		; AVX1-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6		; AVX1-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP3]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7		; AVX1-NEXT: ret <8 x i32> [[R71]]
; AVX1-NEXT: ret <8 x i32> [[R7]]
;		;
; AVX2-LABEL: @sdiv_v8i32_undefs(		; AVX2-LABEL: @sdiv_v8i32_undefs(
; AVX2-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX2-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; AVX2-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX2-NEXT: ret <8 x i32> [[TMP1]]
; AVX2-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>
; AVX2-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX2-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>
; AVX2-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX2-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX2-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX2-NEXT: ret <8 x i32> [[R71]]
;		;
; AVX512-LABEL: @sdiv_v8i32_undefs(		; AVX512-LABEL: @sdiv_v8i32_undefs(
; AVX512-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1		; AVX512-NEXT: [[TMP1:%.*]] = sdiv <8 x i32> [[A:%.*]], <i32 4, i32 4, i32 8, i32 16, i32 4, i32 4, i32 8, i32 16>
; AVX512-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5		; AVX512-NEXT: ret <8 x i32> [[TMP1]]
; AVX512-NEXT: [[AB1:%.*]] = sdiv i32 [[A1]], 4
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 8, i32 16>
; AVX512-NEXT: [[AB5:%.*]] = sdiv i32 [[A5]], 4
; AVX512-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>
; AVX512-NEXT: [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 8, i32 16>
; AVX512-NEXT: [[R1:%.*]] = insertelement <8 x i32> <i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[AB1]], i32 1
; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R32:%.*]] = shufflevector <8 x i32> [[R1]], <8 x i32> [[TMP5]], <8 x i32> <i32 undef, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R32]], i32 [[AB5]], i32 5
; AVX512-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX512-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R5]], <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 undef, i32 5, i32 8, i32 9>
; AVX512-NEXT: ret <8 x i32> [[R71]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = fdiv <2 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1			; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]			; SLM-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP3]], [[TMP5]]
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1			; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0			; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1			; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]			; SLM-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP9]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0			; SLM-NEXT: [[TMP11:%.*]] = fdiv <2 x double> [[TMP8]], [[TMP10]]
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1			; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0			; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> [[TMP12]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1			; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]			; SLM-NEXT: [[TMP15:%.*]] = insertelement <2 x double> [[TMP14]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0			; SLM-NEXT: [[TMP16:%.*]] = fdiv <2 x double> [[TMP13]], [[TMP15]]
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1			; SLM-NEXT: [[TMP17:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0			; SLM-NEXT: [[R12:%.*]] = shufflevector <8 x double> poison, <8 x double> [[TMP17]], <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1			; SLM-NEXT: [[TMP18:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]			; SLM-NEXT: [[R33:%.*]] = shufflevector <8 x double> [[R12]], <8 x double> [[TMP18]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP19:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R11:%.*]] = shufflevector <8 x double> poison, <8 x double> [[TMP21]], <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R54:%.*]] = shufflevector <8 x double> [[R33]], <8 x double> [[TMP19]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP20:%.*]] = shufflevector <2 x double> [[TMP16]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R32:%.*]] = shufflevector <8 x double> [[R11]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R75:%.*]] = shufflevector <8 x double> [[R54]], <8 x double> [[TMP20]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: ret <8 x double> [[R75]]
	; SLM-NEXT: [[R53:%.*]] = shufflevector <8 x double> [[R32]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R74:%.*]] = shufflevector <8 x double> [[R53]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R74]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX512-NEXT: ret <8 x double> [[TMP1]]			; AVX512-NEXT: ret <8 x double> [[TMP1]]

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

	}			}

	define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {			define <8 x double> @buildvector_div_8f64(<8 x double> %a, <8 x double> %b) {
	; SSE-LABEL: @buildvector_div_8f64(			; SSE-LABEL: @buildvector_div_8f64(
	; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: ret <8 x double> [[TMP1]]			; SSE-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; SLM-LABEL: @buildvector_div_8f64(			; SLM-LABEL: @buildvector_div_8f64(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x double> [[A:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x double> [[A:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x double> [[A]], i32 1
	; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x double> [[A]], i32 2
	; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x double> [[A]], i32 3
	; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4			; SLM-NEXT: [[A4:%.*]] = extractelement <8 x double> [[A]], i32 4
	; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5			; SLM-NEXT: [[A5:%.*]] = extractelement <8 x double> [[A]], i32 5
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6			; SLM-NEXT: [[A6:%.*]] = extractelement <8 x double> [[A]], i32 6
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7			; SLM-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i32 7
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x double> [[B:%.*]], i32 0			; SLM-NEXT: [[SHUFFLE1:%.*]] = shufflevector <8 x double> [[B:%.*]], <8 x double> poison, <2 x i32> <i32 0, i32 1>
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x double> [[B]], i32 1
	; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2			; SLM-NEXT: [[B2:%.*]] = extractelement <8 x double> [[B]], i32 2
	; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3			; SLM-NEXT: [[B3:%.*]] = extractelement <8 x double> [[B]], i32 3
	; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4			; SLM-NEXT: [[B4:%.*]] = extractelement <8 x double> [[B]], i32 4
	; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5			; SLM-NEXT: [[B5:%.*]] = extractelement <8 x double> [[B]], i32 5
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6			; SLM-NEXT: [[B6:%.*]] = extractelement <8 x double> [[B]], i32 6
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7			; SLM-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i32 7
	; SLM-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; SLM-NEXT: [[TMP1:%.*]] = fdiv <2 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[A1]], i32 1			; SLM-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0
	; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0			; SLM-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[A3]], i32 1
	; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B1]], i32 1			; SLM-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0
	; SLM-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP2]], [[TMP4]]			; SLM-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[B3]], i32 1
	; SLM-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[A2]], i32 0			; SLM-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP3]], [[TMP5]]
	; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[A3]], i32 1			; SLM-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0
	; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[B2]], i32 0			; SLM-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A5]], i32 1
	; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[B3]], i32 1			; SLM-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0
	; SLM-NEXT: [[TMP10:%.*]] = fdiv <2 x double> [[TMP7]], [[TMP9]]			; SLM-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP9]], double [[B5]], i32 1
	; SLM-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[A4]], i32 0			; SLM-NEXT: [[TMP11:%.*]] = fdiv <2 x double> [[TMP8]], [[TMP10]]
	; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> [[TMP11]], double [[A5]], i32 1			; SLM-NEXT: [[TMP12:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0
	; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[B4]], i32 0			; SLM-NEXT: [[TMP13:%.*]] = insertelement <2 x double> [[TMP12]], double [[A7]], i32 1
	; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[B5]], i32 1			; SLM-NEXT: [[TMP14:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0
	; SLM-NEXT: [[TMP15:%.*]] = fdiv <2 x double> [[TMP12]], [[TMP14]]			; SLM-NEXT: [[TMP15:%.*]] = insertelement <2 x double> [[TMP14]], double [[B7]], i32 1
	; SLM-NEXT: [[TMP16:%.*]] = insertelement <2 x double> poison, double [[A6]], i32 0			; SLM-NEXT: [[TMP16:%.*]] = fdiv <2 x double> [[TMP13]], [[TMP15]]
	; SLM-NEXT: [[TMP17:%.*]] = insertelement <2 x double> [[TMP16]], double [[A7]], i32 1			; SLM-NEXT: [[TMP17:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[B6]], i32 0			; SLM-NEXT: [[R12:%.*]] = shufflevector <8 x double> undef, <8 x double> [[TMP17]], <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP19:%.*]] = insertelement <2 x double> [[TMP18]], double [[B7]], i32 1			; SLM-NEXT: [[TMP18:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[TMP20:%.*]] = fdiv <2 x double> [[TMP17]], [[TMP19]]			; SLM-NEXT: [[R33:%.*]] = shufflevector <8 x double> [[R12]], <8 x double> [[TMP18]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; SLM-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP19:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R11:%.*]] = shufflevector <8 x double> undef, <8 x double> [[TMP21]], <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R54:%.*]] = shufflevector <8 x double> [[R33]], <8 x double> [[TMP19]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP22:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[TMP20:%.*]] = shufflevector <2 x double> [[TMP16]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R32:%.*]] = shufflevector <8 x double> [[R11]], <8 x double> [[TMP22]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[R75:%.*]] = shufflevector <8 x double> [[R54]], <8 x double> [[TMP20]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: [[TMP23:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: ret <8 x double> [[R75]]
	; SLM-NEXT: [[R53:%.*]] = shufflevector <8 x double> [[R32]], <8 x double> [[TMP23]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; SLM-NEXT: [[TMP24:%.*]] = shufflevector <2 x double> [[TMP20]], <2 x double> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; SLM-NEXT: [[R74:%.*]] = shufflevector <8 x double> [[R53]], <8 x double> [[TMP24]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x double> [[R74]]
	;			;
	; AVX-LABEL: @buildvector_div_8f64(			; AVX-LABEL: @buildvector_div_8f64(
	; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: ret <8 x double> [[TMP1]]			; AVX-NEXT: ret <8 x double> [[TMP1]]
	;			;
	; AVX512-LABEL: @buildvector_div_8f64(			; AVX512-LABEL: @buildvector_div_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = fdiv <8 x double> [[A:%.*]], [[B:%.*]]
	; AVX512-NEXT: ret <8 x double> [[TMP1]]			; AVX512-NEXT: ret <8 x double> [[TMP1]]

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

;
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = add i8 %1, %2		%3 = add i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @j(<4 x i8> %x, <4 x i8> %y) {		define i8 @j(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @j(		; CHECK-LABEL: @j(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> undef, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> undef, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

;
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = add i8 %1, %2		%3 = add i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @j(<4 x i8> %x, <4 x i8> %y) {		define i8 @j(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @j(		; CHECK-LABEL: @j(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[Y:%.*]], [[Y]]
; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i8> [[Y:%.*]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i8> [[Y]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[Y1Y1:%.*]] = mul i8 [[Y1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[Y2Y2:%.*]] = mul i8 [[Y2]], [[Y2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[Y1Y1]], [[Y2Y2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%y1y1 = mul i8 %y1, %y1		%y1y1 = mul i8 %y1, %y1
%y2y2 = mul i8 %y2, %y2		%y2y2 = mul i8 %y2, %y2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %y1y1, %y2y2		%2 = add i8 %y1y1, %y2y2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k(<4 x i8> %x) {		define i8 @k(<4 x i8> %x) {
; CHECK-LABEL: @k(		; CHECK-LABEL: @k(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP1]], i32 0
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP1]], i32 3
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP9:%.*]] = sdiv i8 [[TMP5]], [[TMP8]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP9]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> undef, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> undef, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]			; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fcmp uno <2 x float> [[TMP2]], [[SHRINK_SHUFFLE]]
	; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]			; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
	; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0			; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> poison, i1 [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i1> [[TMP3]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3			; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3
	; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%a2 = extractelement <4 x float> %a, i32 2			%a2 = extractelement <4 x float> %a, i32 2
	%a3 = extractelement <4 x float> %a, i32 3			%a3 = extractelement <4 x float> %a, i32 3

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; CHECK-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; CHECK-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]			; CHECK-NEXT: [[C0:%.*]] = fcmp ord float [[A0]], [[B0]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fcmp uno <2 x float> [[TMP2]], [[SHRINK_SHUFFLE]]
	; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]			; CHECK-NEXT: [[C3:%.*]] = fcmp ord float [[A3]], [[B3]]
	; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0			; CHECK-NEXT: [[D0:%.*]] = insertelement <4 x i1> undef, i1 [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i1> [[TMP3]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; CHECK-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3			; CHECK-NEXT: [[D3:%.*]] = insertelement <4 x i1> [[D21]], i1 [[C3]], i32 3
	; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; CHECK-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%a2 = extractelement <4 x float> %a, i32 2			%a2 = extractelement <4 x float> %a, i32 2
	%a3 = extractelement <4 x float> %a, i32 3			%a3 = extractelement <4 x float> %a, i32 3

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	; SSE-NEXT: store i8 [[TMP14]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 13), align 1			; SSE-NEXT: store i8 [[TMP14]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 13), align 1
	; SSE-NEXT: [[TMP15:%.*]] = xor i8 [[A]], [[C]]			; SSE-NEXT: [[TMP15:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: store i8 [[TMP15]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 14), align 1			; SSE-NEXT: store i8 [[TMP15]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 14), align 1
	; SSE-NEXT: [[TMP16:%.*]] = xor i8 [[A]], [[C]]			; SSE-NEXT: [[TMP16:%.*]] = xor i8 [[A]], [[C]]
	; SSE-NEXT: store i8 [[TMP16]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1			; SSE-NEXT: store i8 [[TMP16]], i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 15), align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @splat(			; AVX-LABEL: @splat(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[C]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[C]], i32 2			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 [[C]], i32 3			; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 [[C]], i32 4			; AVX-NEXT: [[TMP4:%.*]] = insertelement <16 x i8> [[TMP3]], i8 [[C]], i32 1
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 [[C]], i32 5			; AVX-NEXT: [[TMP5:%.*]] = insertelement <16 x i8> [[TMP4]], i8 [[C]], i32 2
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <16 x i8> [[TMP6]], i8 [[C]], i32 6			; AVX-NEXT: [[TMP6:%.*]] = insertelement <16 x i8> [[TMP5]], i8 [[C]], i32 3
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <16 x i8> [[TMP7]], i8 [[C]], i32 7			; AVX-NEXT: [[TMP7:%.*]] = insertelement <16 x i8> [[TMP6]], i8 [[C]], i32 4
	; AVX-NEXT: [[TMP9:%.*]] = insertelement <16 x i8> [[TMP8]], i8 [[C]], i32 8			; AVX-NEXT: [[TMP8:%.*]] = insertelement <16 x i8> [[TMP7]], i8 [[C]], i32 5
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <16 x i8> [[TMP9]], i8 [[C]], i32 9			; AVX-NEXT: [[TMP9:%.*]] = insertelement <16 x i8> [[TMP8]], i8 [[C]], i32 6
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <16 x i8> [[TMP10]], i8 [[C]], i32 10			; AVX-NEXT: [[TMP10:%.*]] = insertelement <16 x i8> [[TMP9]], i8 [[C]], i32 7
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <16 x i8> [[TMP11]], i8 [[C]], i32 11			; AVX-NEXT: [[TMP11:%.*]] = insertelement <16 x i8> [[TMP10]], i8 [[C]], i32 8
	; AVX-NEXT: [[TMP13:%.*]] = insertelement <16 x i8> [[TMP12]], i8 [[C]], i32 12			; AVX-NEXT: [[TMP12:%.*]] = insertelement <16 x i8> [[TMP11]], i8 [[C]], i32 9
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <16 x i8> [[TMP13]], i8 [[C]], i32 13			; AVX-NEXT: [[TMP13:%.*]] = insertelement <16 x i8> [[TMP12]], i8 [[C]], i32 10
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <16 x i8> [[TMP14]], i8 [[C]], i32 14			; AVX-NEXT: [[TMP14:%.*]] = insertelement <16 x i8> [[TMP13]], i8 [[C]], i32 11
	; AVX-NEXT: [[TMP16:%.*]] = insertelement <16 x i8> [[TMP15]], i8 [[C]], i32 15			; AVX-NEXT: [[TMP15:%.*]] = insertelement <16 x i8> [[TMP14]], i8 [[C]], i32 12
	; AVX-NEXT: [[TMP17:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP16:%.*]] = insertelement <16 x i8> [[TMP15]], i8 [[C]], i32 13
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <16 x i8> [[TMP17]], i8 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP17:%.*]] = insertelement <16 x i8> [[TMP16]], i8 [[C]], i32 14
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP18]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; AVX-NEXT: [[TMP18:%.*]] = insertelement <16 x i8> [[TMP17]], i8 [[C]], i32 15
	; AVX-NEXT: [[TMP19:%.*]] = xor <16 x i8> [[TMP16]], [[SHUFFLE]]			; AVX-NEXT: [[TMP19:%.*]] = xor <16 x i8> [[SHUFFLE]], [[TMP18]]
	; AVX-NEXT: store <16 x i8> [[TMP19]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16			; AVX-NEXT: store <16 x i8> [[TMP19]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = xor i8 %c, %a			%1 = xor i8 %c, %a
	store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16			store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16
	%2 = xor i8 %a, %c			%2 = xor i8 %a, %c
	store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)			store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)
	%3 = xor i8 %a, %c			%3 = xor i8 %a, %c
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[A]], i32 2			; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[A]], i32 2
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[A]], i32 3			; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[A]], i32 3
	; AVX-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]			; AVX-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]
	; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[C]], i32 2			; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[C]], i32 2
	; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[A]], i32 3			; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[A]], i32 3
	; AVX-NEXT: [[TMP13:%.*]] = xor <4 x i32> [[TMP9]], [[TMP12]]			; AVX-NEXT: [[TMP13:%.*]] = xor <4 x i32> [[TMP12]], [[TMP9]]
	; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16			; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%add1 = add i32 %c, %a			%add1 = add i32 %c, %a
	%add2 = add i32 %c, %a			%add2 = add i32 %c, %a
	%add3 = add i32 %a, %c			%add3 = add i32 %a, %c
	%add4 = add i32 %c, %a			%add4 = add i32 %c, %a
	%1 = xor i32 %add1, %a			%1 = xor i32 %add1, %a

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

ret i32 %r		ret i32 %r
}		}

; Operand/predicate swapping allows forming a reduction, but the		; Operand/predicate swapping allows forming a reduction, but the
; ideal reduction groups all of the original 'sgt' ops together.		; ideal reduction groups all of the original 'sgt' ops together.
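
The comment above can be checked with a small model. The following is a minimal Python sketch, not part of the patch: it models the two sets of CHECK lines for this test as plain integer comparisons and confirms that rewriting `x3 slt y3` as `y3 sgt x3` lets all five compares share one `sgt` reduction without changing the result. The function names `grouped_form`, `swapped_form`, and `forms_agree` are hypothetical, and signed i32 values are approximated by Python integers.

```python
from itertools import product


def grouped_form(x, y):
    """Model of the left-hand CHECK lines: a <4 x i32> sgt any-of reduction
    OR-ed with the one scalar slt compare on lane 3."""
    cmp3wrong = x[3] < y[3]                  # icmp slt i32 x3, y3
    rdx = any(a > b for a, b in zip(x, y))   # reduce.or(icmp sgt <4 x i32> x, y)
    return -1 if (rdx or cmp3wrong) else 1   # select i1 ..., i32 -1, i32 1


def swapped_form(x, y):
    """Model of the right-hand CHECK lines: five sgt lanes in one wide compare,
    with the slt expressed as y3 sgt x3 by swapping operands and predicate."""
    lhs = [x[0], x[3], x[2], x[1], y[3]]
    rhs = [y[0], y[3], y[2], y[1], x[3]]
    lanes = [a > b for a, b in zip(lhs, rhs)]
    lanes += [False] * 3                     # lanes taken from zeroinitializer
    return -1 if any(lanes) else 1


def forms_agree(values=(-2, 0, 1)):
    """Exhaustively compare both models over a small set of lane values."""
    return all(
        grouped_form(x, y) == swapped_form(x, y)
        for x in product(values, repeat=4)
        for y in product(values, repeat=4)
    )
```

Calling `forms_agree()` compares the two models over every combination of the sample lane values. The padding with `False` mirrors the `REDUCTION_NORMALIZATION` shuffle, which fills the unused lanes from `zeroinitializer` so they cannot affect the `or` reduction.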

define i32 @merge_anyof_v4i32_wrong_middle_better_rdx(<4 x i32> %x, <4 x i32> %y) {		define i32 @merge_anyof_v4i32_wrong_middle_better_rdx(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle_better_rdx(		; CHECK-LABEL: @merge_anyof_v4i32_wrong_middle_better_rdx(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 3		; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3		; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[CMP3WRONG:%.*]] = icmp slt i32 [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], [[Y]]		; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP3]])		; CHECK-NEXT: [[Y0:%.*]] = extractelement <4 x i32> [[Y:%.*]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[CMP3WRONG]]		; CHECK-NEXT: [[Y1:%.*]] = extractelement <4 x i32> [[Y]], i32 1
; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP5]], i32 -1, i32 1		; CHECK-NEXT: [[Y2:%.*]] = extractelement <4 x i32> [[Y]], i32 2
		; CHECK-NEXT: [[Y3:%.*]] = extractelement <4 x i32> [[Y]], i32 3
		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[X0]], i32 0
		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[X3]], i32 1
		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[X2]], i32 2
		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[X1]], i32 3
		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[Y3]], i32 4
		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> poison, i32 [[Y0]], i32 0
		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[Y3]], i32 1
		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[Y2]], i32 2
		; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[Y1]], i32 3
		; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[X3]], i32 4
		; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt <8 x i32> [[TMP5]], [[TMP10]]
		; CHECK-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i1> [[TMP11]], <8 x i1> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
		; CHECK-NEXT: [[TMP12:%.*]] = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> [[REDUCTION_NORMALIZATION]])
		; CHECK-NEXT: [[R:%.*]] = select i1 [[TMP12]], i32 -1, i32 1
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%y0 = extractelement <4 x i32> %y, i32 0		%y0 = extractelement <4 x i32> %y, i32 0
%y1 = extractelement <4 x i32> %y, i32 1		%y1 = extractelement <4 x i32> %y, i32 1

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s

	define i32 @crash_reordering_undefs() {			define i32 @crash_reordering_undefs() {
	; CHECK-LABEL: @crash_reordering_undefs(			; CHECK-LABEL: @crash_reordering_undefs(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[OR0:%.*]] = or i64 undef, undef			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> poison)
	; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i64 undef, [[OR0]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP0]], undef
	; CHECK-NEXT: [[ADD0:%.*]] = select i1 [[CMP0]], i32 65536, i32 65537			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[ADD1:%.*]] = add i32 undef, [[ADD0]]			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add i32 [[OP_EXTRA1]], undef
	; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add i32 [[OP_EXTRA2]], undef
	; CHECK-NEXT: [[ADD2:%.*]] = select i1 [[CMP1]], i32 65536, i32 65537			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = add i32 [[OP_EXTRA3]], undef
	; CHECK-NEXT: [[ADD3:%.*]] = add i32 [[ADD1]], [[ADD2]]			; CHECK-NEXT: ret i32 [[OP_EXTRA4]]
	; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i64 undef, undef
	; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[CMP2]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD5:%.*]] = add i32 [[ADD3]], [[ADD4]]
	; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[ADD5]], undef
	; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[ADD6]], undef
	; CHECK-NEXT: [[ADD8:%.*]] = add i32 [[ADD7]], undef
	; CHECK-NEXT: [[OR1:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i64 undef, [[OR1]]
	; CHECK-NEXT: [[ADD9:%.*]] = select i1 [[CMP3]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD10:%.*]] = add i32 [[ADD8]], [[ADD9]]
	; CHECK-NEXT: [[ADD11:%.*]] = add i32 [[ADD10]], undef
	; CHECK-NEXT: ret i32 [[ADD11]]
	;			;
	entry:			entry:
	%or0 = or i64 undef, undef			%or0 = or i64 undef, undef
	%cmp0 = icmp eq i64 undef, %or0			%cmp0 = icmp eq i64 undef, %or0
	%add0 = select i1 %cmp0, i32 65536, i32 65537			%add0 = select i1 %cmp0, i32 65536, i32 65537
	%add1 = add i32 undef, %add0			%add1 = add i32 undef, %add0
	%cmp1 = icmp eq i64 undef, undef			%cmp1 = icmp eq i64 undef, undef
	%add2 = select i1 %cmp1, i32 65536, i32 65537			%add2 = select i1 %cmp1, i32 65536, i32 65537

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[TMP0]], %0* undef, i64 0, i32 1, i32 1
	; CHECK-NEXT: br label [[TMP7:%.*]]			; CHECK-NEXT: br label [[TMP7:%.*]]
	; CHECK: [[TMP8:%.*]] = phi <2 x double> [ <double 1.800000e+01, double 2.800000e+01>, [[TMP0]] ], [ [[TMP11:%.*]], [[TMP21:%.*]] ], [ [[TMP11]], [[TMP18:%.*]] ], [ [[TMP11]], [[TMP18]] ]			; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x double> [ <double 1.800000e+01, double 2.800000e+01>, [[TMP0]] ], [ [[TMP11:%.*]], [[TMP21:%.*]] ], [ [[TMP11]], [[TMP18:%.*]] ], [ [[TMP11]], [[TMP18]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP1]] to <2 x double>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP1]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8			; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP3]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP3]] to <2 x double>*
	; CHECK-NEXT: [[TMP11]] = load <2 x double>, <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: [[TMP11]] = load <2 x double>, <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP12:%.*]], label [[TMP13:%.*]]
	; CHECK: ret void			; CHECK: 12:
	; CHECK: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*			; CHECK-NEXT: ret void
				; CHECK: 13:
				; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP5]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]			; CHECK-NEXT: br i1 undef, label [[TMP15:%.*]], label [[TMP16:%.*]]
	; CHECK: br label [[TMP16]]			; CHECK: 15:
	; CHECK: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]			; CHECK-NEXT: br label [[TMP16]]
	; CHECK: unreachable			; CHECK: 16:
	; CHECK: [[TMP19:%.*]] = extractelement <2 x double> [[TMP11]], i32 0			; CHECK-NEXT: br i1 undef, label [[TMP17:%.*]], label [[TMP18]]
				; CHECK: 17:
				; CHECK-NEXT: unreachable
				; CHECK: 18:
				; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x double> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x double> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x double> [[TMP11]], i32 1
	; CHECK-NEXT: switch i32 undef, label [[TMP21]] [			; CHECK-NEXT: switch i32 undef, label [[TMP21]] [
	; CHECK-NEXT: i32 32, label [[TMP7]]			; CHECK-NEXT: i32 32, label [[TMP7]]
	; CHECK-NEXT: i32 103, label [[TMP7]]			; CHECK-NEXT: i32 103, label [[TMP7]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]			; CHECK: 21:
	; CHECK: unreachable			; CHECK-NEXT: br i1 undef, label [[TMP7]], label [[TMP22:%.*]]
				; CHECK: 22:
				; CHECK-NEXT: unreachable
	;			;
	%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%1 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%2 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	%3 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%3 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%4 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%4 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	%5 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0			%5 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 0
	%6 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1			%6 = getelementptr inbounds %0, %0* undef, i64 0, i32 1, i32 1
	br label %7			br label %7

llvm/test/Transforms/SLPVectorizer/X86/cse.ll


	define i32 @test(double* nocapture %G) {			define i32 @test(double* nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = fmul <4 x double> [[SHUFFLE]], <double 4.000000e+00, double 3.000000e+00, double 4.000000e+00, double poison>
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[SHUFFLE1]], <double 1.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <4 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[TMP4]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8
	[... 106 lines not shown ...]
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TMP18]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TMP18]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP20]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP20]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*			; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP21]], <2 x double>* [[TMP23]], align 8			; CHECK-NEXT: store <2 x double> [[TMP21]], <2 x double>* [[TMP23]], align 8
	; CHECK-NEXT: br label [[TMP24]]			; CHECK-NEXT: br label [[TMP24]]
	; CHECK: 24:			; CHECK: 24:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
				RKSimon: The test2 changes look superfluous (maybe precommit them?).
				ABataev (author): Yes, will do this later.
	;			;
	%1 = icmp eq i32 %k, 0			%1 = icmp eq i32 %k, 0
	%2 = getelementptr inbounds double, double* %G, i64 5			%2 = getelementptr inbounds double, double* %G, i64 5
	%3 = load double, double* %2, align 8			%3 = load double, double* %2, align 8
	%4 = fmul double %3, 4.000000e+00			%4 = fmul double %3, 4.000000e+00
	br i1 %1, label %12, label %5			br i1 %1, label %12, label %5

	; <label>:5 ; preds = %0			; <label>:5 ; preds = %0
	[... 200 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7 -basic-aa -slp-vectorizer -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE42
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX2

;		;
; dot4(float *x, float *y) - ((x[0]*y[0])+(x[1]*y[1])+(x[2]*y[2])+(x[3]*y[3]))		; dot4(float *x, float *y) - ((x[0]*y[0])+(x[1]*y[1])+(x[2]*y[2])+(x[3]*y[3]))
;		;

define double @dotf64(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {		define double @dotf64(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {
; CHECK-LABEL: @dotf64(		; CHECK-LABEL: @dotf64(
; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1		; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1
[... 288 lines not shown ...]	;
%mul1 = fmul double %x1, %y1		%mul1 = fmul double %x1, %y1
%mul2 = fmul double %x2, %y2		%mul2 = fmul double %x2, %y2
%dot01 = fadd fast double %mul0, %mul1		%dot01 = fadd fast double %mul0, %mul1
%dot012 = fadd fast double %dot01, %mul2		%dot012 = fadd fast double %dot01, %mul2
ret double %dot012		ret double %dot012
}		}

define float @dot3f32_fast(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {		define float @dot3f32_fast(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {
; CHECK-LABEL: @dot3f32_fast(		; SSE-LABEL: @dot3f32_fast(
; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1		; SSE-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1		; SSE-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2		; SSE-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2		; SSE-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
; CHECK-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4		; SSE-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4
; CHECK-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4		; SSE-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*		; SSE-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*		; SSE-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4		; SSE-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]		; SSE-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0		; SSE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
; CHECK-NEXT: [[DOT01:%.*]] = fadd fast float [[MUL0]], [[TMP6]]		; SSE-NEXT: [[DOT01:%.*]] = fadd fast float [[MUL0]], [[TMP6]]
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1		; SSE-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
; CHECK-NEXT: [[DOT012:%.*]] = fadd fast float [[DOT01]], [[TMP7]]		; SSE-NEXT: [[DOT012:%.*]] = fadd fast float [[DOT01]], [[TMP7]]
; CHECK-NEXT: ret float [[DOT012]]		; SSE-NEXT: ret float [[DOT012]]
		;
		; SSE42-LABEL: @dot3f32_fast(
		; SSE42-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
		; SSE42-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
		; SSE42-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
		; SSE42-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
		; SSE42-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4
		; SSE42-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4
		; SSE42-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*
		; SSE42-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
		; SSE42-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*
		; SSE42-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
		; SSE42-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
		; SSE42-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
		; SSE42-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
		; SSE42-NEXT: [[DOT01:%.*]] = fadd fast float [[MUL0]], [[TMP6]]
		; SSE42-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
		; SSE42-NEXT: [[DOT012:%.*]] = fadd fast float [[DOT01]], [[TMP7]]
		; SSE42-NEXT: ret float [[DOT012]]
		;
		; AVX-LABEL: @dot3f32_fast(
		; AVX-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
		; AVX-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
		; AVX-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
		; AVX-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
		; AVX-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX]] to <4 x float>*
		; AVX-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
		; AVX-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY]] to <4 x float>*
		; AVX-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
		; AVX-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP2]], [[TMP4]]
		; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
		; AVX-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[REDUCTION_NORMALIZATION]])
		; AVX-NEXT: ret float [[TMP6]]
		;
		; AVX2-LABEL: @dot3f32_fast(
		; AVX2-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
		; AVX2-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
		; AVX2-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
		; AVX2-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
		; AVX2-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX]] to <4 x float>*
		; AVX2-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
		; AVX2-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY]] to <4 x float>*
		; AVX2-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
		; AVX2-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP2]], [[TMP4]]
		; AVX2-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
		; AVX2-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[REDUCTION_NORMALIZATION]])
		; AVX2-NEXT: ret float [[TMP6]]
;		;
%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1		%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1
%ptry1 = getelementptr inbounds float, float* %ptry, i64 1		%ptry1 = getelementptr inbounds float, float* %ptry, i64 1
%ptrx2 = getelementptr inbounds float, float* %ptrx, i64 2		%ptrx2 = getelementptr inbounds float, float* %ptrx, i64 2
%ptry2 = getelementptr inbounds float, float* %ptry, i64 2		%ptry2 = getelementptr inbounds float, float* %ptry, i64 2
%x0 = load float, float* %ptrx, align 4		%x0 = load float, float* %ptrx, align 4
%y0 = load float, float* %ptry, align 4		%y0 = load float, float* %ptry, align 4
%x1 = load float, float* %ptrx1, align 4		%x1 = load float, float* %ptrx1, align 4
[... 118 lines not shown ...]
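
Note on the AVX/AVX2 checks for @dot3f32_fast above: the three floats are read through a 4-wide masked load whose last lane is disabled, and the shufflevector named REDUCTION_NORMALIZATION uses index 4 to take lane 0 of its zeroinitializer operand in place of that undefined lane before the fadd reduction. The following is a minimal Python sketch of those lane semantics only. It is not LLVM code; the helper names (masked_load, shufflevector, dot3_via_padded_reduction) are hypothetical, and the sequential sum is just one of the orders a fast reduction may use.

```python
# Toy model of the lane semantics in the AVX/AVX2 CHECK lines; not LLVM code.
UNDEF = float("nan")  # stand-in for an undef lane; any value could appear there


def masked_load(memory, mask, passthru):
    """Model llvm.masked.load: enabled lanes read memory, others take passthru.

    A disabled lane never touches memory, so `memory` may be shorter than
    the mask (here: 3 floats behind a 4-lane load).
    """
    return [memory[i] if enabled else passthru[i] for i, enabled in enumerate(mask)]


def shufflevector(a, b, indices):
    """Model shufflevector: each index selects from the concatenation a + b."""
    combined = list(a) + list(b)
    return [combined[i] for i in indices]


def dot3_via_padded_reduction(x, y):
    """Compute a 3-element dot product with 4-lane operations."""
    mask = [True, True, True, False]
    passthru = [UNDEF] * 4
    vx = masked_load(x, mask, passthru)
    vy = masked_load(y, mask, passthru)
    product = [a * b for a, b in zip(vx, vy)]  # lane 3 is garbage (NaN here)
    # Indices 0..2 keep the real products; index 4 is lane 0 of the zero vector.
    normalized = shufflevector(product, [0.0] * 4, [0, 1, 2, 4])
    # The reduction in the CHECK lines starts from -0.0.
    return sum(normalized, -0.0)


if __name__ == "__main__":
    print(dot3_via_padded_reduction([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The point of the sketch is that the garbage in lane 3 of the product never reaches the reduction, because the shuffle replaces it with zero first.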

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

[... 48 lines not shown ...]	entry:
store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}

define void @fextr2(double* %ptr) {		define void @fextr2(double* %ptr) {
; CHECK-LABEL: @fextr2(		; CHECK-LABEL: @fextr2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32		; CHECK-NEXT: [[LD:%.*]] = load <4 x double>, <4 x double>* undef, align 32
; CHECK-NEXT: [[V0:%.*]] = extractelement <4 x double> [[LD]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[LD]], <4 x double> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[V1:%.*]] = extractelement <4 x double> [[LD]], i32 1
; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0		; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 5.500000e+00, double 6.600000e+00>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 5.500000e+00, double 6.600000e+00>		; CHECK-NEXT: store <2 x double> [[TMP0]], <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[P0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%LD = load <4 x double>, <4 x double>* undef		%LD = load <4 x double>, <4 x double>* undef
%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.		%V0 = extractelement <4 x double> %LD, i32 0 ; <--- invalid size.
%V1 = extractelement <4 x double> %LD, i32 1		%V1 = extractelement <4 x double> %LD, i32 1
%P0 = getelementptr inbounds double, double* %ptr, i64 0		%P0 = getelementptr inbounds double, double* %ptr, i64 0
%P1 = getelementptr inbounds double, double* %ptr, i64 1		%P1 = getelementptr inbounds double, double* %ptr, i64 1
%A0 = fadd double %V0, 5.5		%A0 = fadd double %V0, 5.5
%A1 = fadd double %V1, 6.6		%A1 = fadd double %V1, 6.6
store double %A0, double* %P0, align 4		store double %A0, double* %P0, align 4
store double %A1, double* %P1, align 4		store double %A1, double* %P1, align 4
ret void		ret void
}		}
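
Note on the @fextr2 checks above: the left column builds the <2 x double> operand with two extractelement and two insertelement instructions, while the right column gets the same two lanes from a single shufflevector with mask <0, 1>. A minimal Python sketch of why the two forms agree, assuming a simple list model of vectors; the helper names are hypothetical and POISON is only a placeholder value.

```python
# Toy list-based model of the two operand-building forms; not LLVM code.
POISON = None  # placeholder for a poison lane


def extractelement(vec, idx):
    """Model extractelement: read one lane."""
    return vec[idx]


def insertelement(vec, value, idx):
    """Model insertelement: return a copy of vec with one lane replaced."""
    out = list(vec)
    out[idx] = value
    return out


def shufflevector(a, b, indices):
    """Model shufflevector: each index selects from the concatenation a + b."""
    combined = list(a) + list(b)
    return [combined[i] for i in indices]


def build_by_extract_insert(ld):
    """Left column: extract lanes 0 and 1, then rebuild a 2-lane vector."""
    v0 = extractelement(ld, 0)
    v1 = extractelement(ld, 1)
    tmp0 = insertelement([POISON, POISON], v0, 0)
    return insertelement(tmp0, v1, 1)


def build_by_shuffle(ld):
    """Right column: one shuffle that selects lanes 0 and 1 directly."""
    return shufflevector(ld, [POISON] * 4, [0, 1])


def fadd(vec, constants):
    """Model the lane-wise fadd against the constant vector."""
    return [a + c for a, c in zip(vec, constants)]


if __name__ == "__main__":
    ld = [1.0, 2.0, 3.0, 4.0]
    print(fadd(build_by_extract_insert(ld), [5.5, 6.6]))
    print(fadd(build_by_shuffle(ld), [5.5, 6.6]))
```

Both paths feed identical lanes into the fadd, so the stored result is the same; the shuffle form simply needs fewer instructions.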

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	[... 79 lines not shown ...]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

[... 188 lines not shown ...]	;
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; SSE-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptosi_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f64_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f64_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
[... 191 lines not shown ...]	;
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; SSE-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f32_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f32_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
[...]

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

[...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; SSE-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptosi_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f64_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f64_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
[...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; SSE-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX512-LABEL: @fptosi_8f32_8i8(
		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptosi_8f32_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
[...]

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

[...]
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f64_8i8() #0 {		define void @fptoui_8f64_8i8() #0 {
; CHECK-LABEL: @fptoui_8f64_8i8(		; SSE-LABEL: @fptoui_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; SSE-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; SSE-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		; SSE-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		; SSE-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		; SSE-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		; SSE-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8		; SSE-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptoui_8f64_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
		; AVX256NODQ-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
		; AVX256NODQ-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
		; AVX256NODQ-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
		; AVX256NODQ-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
		; AVX256NODQ-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
		; AVX256NODQ-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512F-LABEL: @fptoui_8f64_8i8(
		; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
		; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512F-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512F-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptoui_8f64_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8		%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; ... (191 lines not shown) ...
store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2		store i16 %cvt4, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 4), align 2
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f32_8i8() #0 {		define void @fptoui_8f32_8i8() #0 {
; CHECK-LABEL: @fptoui_8f32_8i8(		; SSE-LABEL: @fptoui_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		; SSE-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		; SSE-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		; SSE-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4		; SSE-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8		; SSE-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8		; SSE-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8		; SSE-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8		; SSE-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8		; SSE-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8		; SSE-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8		; SSE-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8		; SSE-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1		; SSE-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1		; SSE-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1		; SSE-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1		; SSE-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1		; SSE-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1		; SSE-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1		; SSE-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1		; SSE-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @fptoui_8f32_8i8(
		; AVX256NODQ-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
		; AVX256NODQ-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
		; AVX256NODQ-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
		; AVX256NODQ-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
		; AVX256NODQ-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
		; AVX256NODQ-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
		; AVX256NODQ-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
		; AVX256NODQ-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8
		; AVX256NODQ-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8
		; AVX256NODQ-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8
		; AVX256NODQ-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8
		; AVX256NODQ-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8
		; AVX256NODQ-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8
		; AVX256NODQ-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8
		; AVX256NODQ-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
		; AVX256NODQ-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512F-LABEL: @fptoui_8f32_8i8(
		; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i8>
		; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX512F-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX512F-NEXT: ret void
		;
		; AVX256DQ-LABEL: @fptoui_8f32_8i8(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i8>
		; AVX256DQ-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i8> [[TMP2]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; AVX256DQ-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[SHUFFLE]], <16 x i8>* bitcast ([64 x i8]* @dst8 to <16 x i8>*), i32 1, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4		%a6 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; ... (21 lines not shown) ...
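The AVX512F and AVX256DQ checks above widen the 8 converted bytes to a 16-lane vector and write it with `llvm.masked.store`, enabling only the first 8 lanes. A minimal Python sketch of that lane-wise behaviour follows. It models the semantics only; the helper names (`widen_with_undef`, `masked_store`), the sentinel byte and the sample inputs are assumptions made for illustration and are not part of the patch.

```python
# Illustrative model of the <8 x i8> -> <16 x i8> widening followed by a
# masked store. Helper names and sample values are hypothetical.

UNDEF = None  # stands in for an undef lane selected by the shuffle mask


def widen_with_undef(vec, width):
    """Model a shufflevector that keeps vec's lanes and pads with undef."""
    return list(vec) + [UNDEF] * (width - len(vec))


def masked_store(memory, base, value, mask):
    """Write value[i] to memory[base + i] only where mask[i] is true."""
    for lane, (elem, enabled) in enumerate(zip(value, mask)):
        if enabled:
            memory[base + lane] = elem


# Stand-in for @dst8: 64 bytes pre-filled with a sentinel.
dst8 = [0xAA] * 64
# Stand-in for the eight doubles in @src64; every value fits in a byte.
src = [1.9, 2.0, 3.5, 4.1, 5.0, 6.7, 7.2, 8.8]
# int() truncates toward zero, matching fptoui for in-range inputs.
converted = [int(x) for x in src]

shuffled = widen_with_undef(converted, 16)
mask = [True] * 8 + [False] * 8
masked_store(dst8, 0, shuffled, mask)

print(dst8[:16])
```

Only the first 8 bytes of the destination change; the undef lanes are never written because their mask bits are false.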

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s -slp-min-non-power2-values-size=2 | FileCheck %s
	@e = dso_local local_unnamed_addr global i32 0, align 4			@e = dso_local local_unnamed_addr global i32 0, align 4
	@f = dso_local local_unnamed_addr global i32 0, align 4			@f = dso_local local_unnamed_addr global i32 0, align 4

	; Function Attrs: nofree norecurse nounwind uwtable			; Function Attrs: nofree norecurse nounwind uwtable
	define dso_local i32 @g() local_unnamed_addr {			define dso_local i32 @g() local_unnamed_addr {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4
	; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0			; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0
	; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[C_022:%.*]] = phi i32* [ [[C_022_BE:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <4 x i32*> [ [[TMP16:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32*> [ [[TMP14:%.*]], [[WHILE_BODY_BACKEDGE]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32*> [[TMP1]], i32 0
	; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint i32* [[TMP2]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint i32* [[C_022]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 1, i64 1, i64 1, i64 poison>
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 1, i64 1>			; CHECK-NEXT: switch i32 [[TMP4]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: switch i32 [[TMP3]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]			; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]
	; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]			; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = ptrtoint i32* [[TMP5]] to i64			; CHECK-NEXT: [[TMP7:%.*]] = ptrtoint i32* [[TMP6]] to i64
	; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP7]] to i32
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 1			; CHECK-NEXT: store i32 [[TMP8]], i32* [[TMP9]], align 4
	; CHECK-NEXT: store i32 [[TMP7]], i32* [[TMP9]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 2, i64 2, i64 2, i64 poison>
	; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: sw.bb6:			; CHECK: sw.bb6:
	; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64			; CHECK-NEXT: [[TMP12:%.*]] = ptrtoint i32* [[TMP11]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP10]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, <4 x i32*> [[TMP1]], <4 x i64> <i64 2, i64 2, i64 2, i64 poison>
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32*> [[TMP5]], i32 1
	; CHECK-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4			; CHECK-NEXT: store i32 [[TMP13]], i32* [[TMP15]], align 4
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: while.body.backedge:			; CHECK: while.body.backedge:
	; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP16]] = phi <4 x i32*> [ [[TMP5]], [[WHILE_BODY]] ], [ [[TMP14]], [[SW_BB6]] ], [ [[TMP10]], [[SW_BB]] ]
	; CHECK-NEXT: [[TMP14]] = phi <2 x i32*> [ [[TMP4]], [[WHILE_BODY]] ], [ [[TMP12]], [[SW_BB6]] ], [ [[TMP8]], [[SW_BB]] ]
	; CHECK-NEXT: br label [[WHILE_BODY]]			; CHECK-NEXT: br label [[WHILE_BODY]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* @e, align 4			%0 = load i32, i32* @e, align 4
	%tobool.not19 = icmp eq i32 %0, 0			%tobool.not19 = icmp eq i32 %0, 0
	br i1 %tobool.not19, label %while.end, label %while.body			br i1 %tobool.not19, label %while.end, label %while.body
	; ... (43 lines not shown) ...

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
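
In the updated AVX, AVX2 and THRESH checks for `@maxi8_mutiple_uses` in this file, six `i32` values are fetched with an 8-lane `llvm.masked.load`, the two disabled lanes are overwritten with `-2147483648` by the `REDUCTION_NORMALIZATION` shuffle, and the result is fed to `llvm.vector.reduce.smax.v8i32`. Padding with the smallest `i32` leaves a signed-max reduction unchanged. The Python sketch below models that sequence; the helper names and the sample contents of `arr` are assumptions for illustration, not taken from the test.

```python
# Illustrative model of masked load + identity padding + smax reduction.
# Helper names and the contents of `arr` are hypothetical.

INT32_MIN = -2**31  # identity element for a signed 32-bit max
UNDEF = None        # stands in for the undef passthru operand


def masked_load(memory, base, mask, passthru):
    """Read memory[base + i] where mask[i] is true, else take passthru[i]."""
    return [memory[base + i] if on else passthru[i] for i, on in enumerate(mask)]


def shufflevector(a, b, indices):
    """Select lanes from the concatenation of a and b."""
    combined = list(a) + list(b)
    return [combined[i] for i in indices]


# Stand-in for @arr (32 x i32). Elements 8 and 9 must not affect the result.
arr = [7, -3, 12, 5, -8, 40, 2, 9, 1000, 2000] + [0] * 22

mask = [True] * 6 + [False] * 2
loaded = masked_load(arr, 2, mask, [UNDEF] * 8)

# Lanes 0..5 come from the load; indices 8 and 9 pick INT32_MIN from the
# second operand, replacing the two undef lanes.
normalized = shufflevector(loaded, [INT32_MIN] * 8, [0, 1, 2, 3, 4, 5, 8, 9])

reduced = max(normalized)  # models llvm.vector.reduce.smax.v8i32
print(reduced)
```

The reduction equals the maximum of `arr[2]` through `arr[7]`, the same six elements the original side covers with a `<4 x i32>` load plus two scalar loads.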

; ... (790 lines not shown) ...
; SSE-NEXT: store i32 [[TMP24]], i32* @var, align 8		; SSE-NEXT: store i32 [[TMP24]], i32* @var, align 8
; SSE-NEXT: ret i32 [[TMP23]]		; SSE-NEXT: ret i32 [[TMP23]]
;		;
; AVX-LABEL: @maxi8_mutiple_uses(		; AVX-LABEL: @maxi8_mutiple_uses(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])		; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]		; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]		; AVX-NEXT: store i32 [[TMP8]], i32* @var, align 8
; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX-NEXT: store i32 [[TMP14]], i32* @var, align 8
; AVX-NEXT: ret i32 [[OP_EXTRA1]]		; AVX-NEXT: ret i32 [[OP_EXTRA1]]
;		;
; AVX2-LABEL: @maxi8_mutiple_uses(		; AVX2-LABEL: @maxi8_mutiple_uses(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX2-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX2-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])		; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]		; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX2-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]		; AVX2-NEXT: store i32 [[TMP8]], i32* @var, align 8
; AVX2-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX2-NEXT: store i32 [[TMP14]], i32* @var, align 8
; AVX2-NEXT: ret i32 [[OP_EXTRA1]]		; AVX2-NEXT: ret i32 [[OP_EXTRA1]]
;		;
; THRESH-LABEL: @maxi8_mutiple_uses(		; THRESH-LABEL: @maxi8_mutiple_uses(
; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; THRESH-NEXT: [[TMP7:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])		; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; THRESH-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]		; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; THRESH-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP6]]		; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]
; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0		; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP8]], i32 [[TMP6]]
; THRESH-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP3]], i32 1		; THRESH-NEXT: [[TMP9:%.*]] = select i1 [[TMP5]], i32 3, i32 4
; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0		; THRESH-NEXT: store i32 [[TMP9]], i32* @var, align 8
; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP4]], i32 1
; THRESH-NEXT: [[TMP15:%.*]] = icmp sgt <2 x i32> [[TMP12]], [[TMP14]]
; THRESH-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x i32> [[TMP12]], <2 x i32> [[TMP14]]
; THRESH-NEXT: [[TMP17:%.*]] = extractelement <2 x i32> [[TMP16]], i32 1
; THRESH-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP16]], i32 0
; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP18]], [[TMP17]]
; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP18]], i32 [[TMP17]]
; THRESH-NEXT: [[TMP19:%.*]] = extractelement <2 x i1> [[TMP15]], i32 1
; THRESH-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 3, i32 4
; THRESH-NEXT: store i32 [[TMP20]], i32* @var, align 8
; THRESH-NEXT: ret i32 [[OP_EXTRA1]]		; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
[36 lines hidden]
; DEFAULT-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; DEFAULT-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; DEFAULT-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; DEFAULT-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; DEFAULT-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; DEFAULT-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; DEFAULT-NEXT: [[TMP18:%.*]] = select i1 [[TMP10]], i32 3, i32 4		; DEFAULT-NEXT: [[TMP18:%.*]] = select i1 [[TMP10]], i32 3, i32 4
; DEFAULT-NEXT: store i32 [[TMP18]], i32* @var, align 8		; DEFAULT-NEXT: store i32 [[TMP18]], i32* @var, align 8
; DEFAULT-NEXT: ret i32 [[TMP17]]		; DEFAULT-NEXT: ret i32 [[TMP17]]
;		;
; THRESH-LABEL: @maxi8_mutiple_uses2(		; THRESH-LABEL: @maxi8_mutiple_uses2(
; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; THRESH-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr to <4 x i32>*), align 1
; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[REDUCTION_NORMALIZATION]])
		; THRESH-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4
; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]		; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]		; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16
; THRESH-NEXT: [[TMP8:%.*]] = icmp sgt i32 [[TMP6]], [[TMP7]]		; THRESH-NEXT: [[TMP8:%.*]] = icmp sgt i32 [[TMP6]], [[TMP7]]
; THRESH-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i32 [[TMP6]], i32 [[TMP7]]		; THRESH-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i32 [[TMP6]], i32 [[TMP7]]
; THRESH-NEXT: [[TMP10:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; THRESH-NEXT: [[TMP10:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; THRESH-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP9]], [[TMP10]]		; THRESH-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP9]], [[TMP10]]
; THRESH-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP9]], i32 [[TMP10]]		; THRESH-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP9]], i32 [[TMP10]]
; THRESH-NEXT: [[TMP13:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; THRESH-NEXT: [[TMP13:%.*]] = select i1 [[TMP5]], i32 3, i32 4
; THRESH-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP12]], [[TMP13]]		; THRESH-NEXT: store i32 [[TMP13]], i32* @var, align 8
; THRESH-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP12]], i32 [[TMP13]]		; THRESH-NEXT: ret i32 [[TMP12]]
; THRESH-NEXT: [[TMP16:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; THRESH-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[TMP15]], [[TMP16]]
; THRESH-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[TMP15]], i32 [[TMP16]]
; THRESH-NEXT: [[TMP19:%.*]] = select i1 [[TMP11]], i32 3, i32 4
; THRESH-NEXT: store i32 [[TMP19]], i32* @var, align 8
; THRESH-NEXT: ret i32 [[TMP18]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
[41 lines hidden]
;		;
; AVX-LABEL: @maxi8_wrong_parent(		; AVX-LABEL: @maxi8_wrong_parent(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: br label [[PP:%.*]]		; AVX-NEXT: br label [[PP:%.*]]
; AVX: pp:		; AVX: pp:
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])		; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
; AVX-NEXT: ret i32 [[OP_EXTRA1]]		; AVX-NEXT: ret i32 [[OP_EXTRA1]]
;		;
; AVX2-LABEL: @maxi8_wrong_parent(		; AVX2-LABEL: @maxi8_wrong_parent(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX2-NEXT: br label [[PP:%.*]]		; AVX2-NEXT: br label [[PP:%.*]]
; AVX2: pp:		; AVX2: pp:
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX2-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX2-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])		; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP7]], [[TMP5]]
; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP7]], i32 [[TMP5]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; AVX2-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; AVX2-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]
; AVX2-NEXT: ret i32 [[OP_EXTRA1]]		; AVX2-NEXT: ret i32 [[OP_EXTRA1]]
;		;
; THRESH-LABEL: @maxi8_wrong_parent(		; THRESH-LABEL: @maxi8_wrong_parent(
; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]		; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
; THRESH-NEXT: br label [[PP:%.*]]		; THRESH-NEXT: br label [[PP:%.*]]
; THRESH: pp:		; THRESH: pp:
; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; THRESH-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP3]], i32 [[TMP4]]
; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; THRESH-NEXT: [[TMP7:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <8 x i32>*), i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])		; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]		; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]
; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]		; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP8]], i32 [[TMP6]]
; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> poison, i1 [[TMP12]], i32 0
; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1
; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0
; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1
; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
; THRESH-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1
; THRESH-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]
; THRESH-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1
; THRESH-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0
; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP21]], [[TMP20]]
; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP21]], i32 [[TMP20]]
; THRESH-NEXT: ret i32 [[OP_EXTRA1]]		; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
br label %pp		br label %pp

pp:		pp:
[280 lines hidden]
%mh = tail call i8 @llvm.umin.i8(i8 %mfeba, i8 %mdc98)		%mh = tail call i8 @llvm.umin.i8(i8 %mfeba, i8 %mdc98)
%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)		%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)
ret i8 %m		ret i8 %m
}		}

; This should not crash.		; This should not crash.

define void @PR49730() {		define void @PR49730() {
; CHECK-LABEL: @PR49730(		; SSE-LABEL: @PR49730(
; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)		; SSE-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]		; SSE-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]
; CHECK-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef		; SSE-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])		; SSE-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])
; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)		; SSE-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)
; CHECK-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)		; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX-LABEL: @PR49730(
		; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
		; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 undef, i32 undef, i32 undef>
		; AVX-NEXT: [[TMP2:%.*]] = sub nsw <8 x i32> poison, [[SHUFFLE]]
		; AVX-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
		; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
		; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 undef)
		; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 93)
		; AVX-NEXT: ret void
		;
		; AVX2-LABEL: @PR49730(
		; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
		; AVX2-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 undef, i32 undef, i32 undef>
		; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <8 x i32> poison, [[SHUFFLE]]
		; AVX2-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
		; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
		; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 undef)
		; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 93)
		; AVX2-NEXT: ret void
		;
		; THRESH-LABEL: @PR49730(
		; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
		; THRESH-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 undef, i32 undef, i32 undef>
		; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <8 x i32> poison, [[SHUFFLE]]
		; THRESH-NEXT: [[REDUCTION_NORMALIZATION:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 10>
		; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[REDUCTION_NORMALIZATION]])
		; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 undef)
		; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 93)
		; THRESH-NEXT: ret void
;		;
%t = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t1 = sub nsw i32 undef, %t		%t1 = sub nsw i32 undef, %t
%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)		%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)
%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t4 = sub nsw i32 undef, %t3		%t4 = sub nsw i32 undef, %t3
%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)		%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)
%t6 = call i32 @llvm.smin.i32(i32 undef, i32 1)		%t6 = call i32 @llvm.smin.i32(i32 undef, i32 1)
[10 lines hidden]

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

[265 lines hidden]
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <2 x i32> [[SHUFFLE6]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[SHRINK_SHUFFLE]], zeroinitializer
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x float> [[SHUFFLE7]], <2 x float> [[SHUFFLE8]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[SHUFFLE3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> [[SHRINK_SHUFFLE4]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[RB9:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP5]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1		; CHECK-NEXT: [[RD5:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer		; CHECK-NEXT: ret <4 x float> [[RD5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RB2:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP17]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
[319 lines hidden]

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

[300 lines hidden]
%rd = insertelement <4 x float> %rc, float %s3, i32 3		%rd = insertelement <4 x float> %rc, float %s3, i32 3
call void @v4f32_user(<4 x float> %rd) #0		call void @v4f32_user(<4 x float> %rd) #0
ret <4 x float> %rd		ret <4 x float> %rd
}		}

; Unused insertelement		; Unused insertelement
define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <4 x float> @simple_select_no_users(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select_no_users(		; CHECK-LABEL: @simple_select_no_users(
; CHECK-NEXT: [[C0:%.*]] = extractelement <4 x i32> [[C:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <4 x i32> [[C:%.*]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C1:%.*]] = extractelement <4 x i32> [[C]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[C]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[C2:%.*]] = extractelement <4 x i32> [[C]], i32 2		; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[C3:%.*]] = extractelement <4 x i32> [[C]], i32 3		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1		; CHECK-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <2 x i32> [[SHUFFLE6]], zeroinitializer
; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i32> [[SHRINK_SHUFFLE]], zeroinitializer
; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = select <2 x i1> [[TMP1]], <2 x float> [[SHUFFLE7]], <2 x float> [[SHUFFLE8]]
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2		; CHECK-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3		; CHECK-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[SHUFFLE3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> [[SHRINK_SHUFFLE4]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[RB9:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP5]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1		; CHECK-NEXT: [[RD5:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer		; CHECK-NEXT: ret <4 x float> [[RD5]]
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RB2:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP17]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3		%a3 = extractelement <4 x float> %a, i32 3
%b0 = extractelement <4 x float> %b, i32 0		%b0 = extractelement <4 x float> %b, i32 0
%b1 = extractelement <4 x float> %b, i32 1		%b1 = extractelement <4 x float> %b, i32 1
%b2 = extractelement <4 x float> %b, i32 2		%b2 = extractelement <4 x float> %b, i32 2
		RKSimon: There isn't an ANY check-prefix at the moment (it was cleaned out in rG119e4550ddedc75e4 as part of the unused prefix cleanup). Please can you review?
		ABataev (author): Yes, I think it needs to be removed. Most probably it was caused by a not quite clean merge.
%b3 = extractelement <4 x float> %b, i32 3		%b3 = extractelement <4 x float> %b, i32 3
%cmp0 = icmp ne i32 %c0, 0		%cmp0 = icmp ne i32 %c0, 0
%cmp1 = icmp ne i32 %c1, 0		%cmp1 = icmp ne i32 %c1, 0
%cmp2 = icmp ne i32 %c2, 0		%cmp2 = icmp ne i32 %c2, 0
%cmp3 = icmp ne i32 %c3, 0		%cmp3 = icmp ne i32 %c3, 0
%s0 = select i1 %cmp0, float %a0, float %b0		%s0 = select i1 %cmp0, float %a0, float %b0
%s1 = select i1 %cmp1, float %a1, float %b1		%s1 = select i1 %cmp1, float %a1, float %b1
%s2 = select i1 %cmp2, float %a2, float %b2		%s2 = select i1 %cmp2, float %a2, float %b2
[307 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

[48 lines not shown]
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <2 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
; CHECK-NEXT: [[X2:%.*]] = load float, float* [[GEP2]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]
; CHECK-NEXT: [[I11:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP3]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I11]], float [[X2]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
%i0 = insertelement <4 x float> poison, float %x0, i32 0		%i0 = insertelement <4 x float> poison, float %x0, i32 0
[29 lines not shown]
; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16		; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*		; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8		; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32		; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float		; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0		; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0
; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32		; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T8]], i32 0
; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T4]], i32 1
; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32		; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2		; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]
; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
; CHECK-NEXT: ret <4 x float> [[T15]]
;		;
%t0 = bitcast <4 x float>* %x to i64*		%t0 = bitcast <4 x float>* %x to i64*
%t1 = load i64, i64* %t0, align 16		%t1 = load i64, i64* %t0, align 16
%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%t3 = bitcast float* %t2 to i64*		%t3 = bitcast float* %t2 to i64*
%t4 = load i64, i64* %t3, align 8		%t4 = load i64, i64* %t3, align 8
%t5 = trunc i64 %t1 to i32		%t5 = trunc i64 %t1 to i32
%t6 = bitcast i32 %t5 to float		%t6 = bitcast i32 %t5 to float
[75 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

[48 lines not shown]
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <2 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP1]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
; CHECK-NEXT: [[X2:%.*]] = load float, float* [[GEP2]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]
; CHECK-NEXT: [[I11:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP3]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[I11]], float [[X2]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
%i0 = insertelement <4 x float> undef, float %x0, i32 0		%i0 = insertelement <4 x float> undef, float %x0, i32 0
[29 lines not shown]
; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16		; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2		; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*		; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8		; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32		; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float		; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0		; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0
; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32		; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T8]], i32 0
; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T4]], i32 1
; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32		; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2		; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]
; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
; CHECK-NEXT: ret <4 x float> [[T15]]
;		;
%t0 = bitcast <4 x float>* %x to i64*		%t0 = bitcast <4 x float>* %x to i64*
%t1 = load i64, i64* %t0, align 16		%t1 = load i64, i64* %t0, align 16
%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%t3 = bitcast float* %t2 to i64*		%t3 = bitcast float* %t2 to i64*
%t4 = load i64, i64* %t3, align 8		%t4 = load i64, i64* %t3, align 8
%t5 = trunc i64 %t1 to i32		%t5 = trunc i64 %t1 to i32
%t6 = bitcast i32 %t5 to float		%t6 = bitcast i32 %t5 to float
[75 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

	[31 lines not shown]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	[50 lines not shown]
	; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6			; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 6
	; CHECK-NEXT: [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7			; CHECK-NEXT: [[IDX7:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	[54 lines not shown]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP13]], [[TMP10]]			; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP10]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8			; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	[45 lines not shown]
	; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0			; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
	; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0			; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
	; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0			; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
	; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1			; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
	; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2			; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
	; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2			; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
	; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1			; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
	; CHECK-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8			; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
	; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8			; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
	; CHECK-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8			; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
	; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8			; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[A1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[B2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = fsub fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A2]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP8]], [[TMP1]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP6]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1			; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: store double [[A1]], double* [[EXT1:%.*]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
				; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%IdxA0 = getelementptr inbounds double, double* %A, i64 0			%IdxA0 = getelementptr inbounds double, double* %A, i64 0
	%IdxB0 = getelementptr inbounds double, double* %B, i64 0			%IdxB0 = getelementptr inbounds double, double* %B, i64 0
	%IdxC0 = getelementptr inbounds double, double* %C, i64 0			%IdxC0 = getelementptr inbounds double, double* %C, i64 0
	%IdxD0 = getelementptr inbounds double, double* %D, i64 0			%IdxD0 = getelementptr inbounds double, double* %D, i64 0

	[317 lines not shown]

	; Same as @ChecksExtractScores, but the extratelement vector operands do not match.			; Same as @ChecksExtractScores, but the extratelement vector operands do not match.
	define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double> *%vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {			define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double> *%vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
	; CHECK-LABEL: @ChecksExtractScores_different_vectors(			; CHECK-LABEL: @ChecksExtractScores_different_vectors(
	; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0			; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
	; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1			; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4			; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4			; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
	; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0			; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
	; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1			; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
	; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4			; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4			; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
	; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0			; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
	; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1			; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRA1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRA1]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP6]], double [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP4]], [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> [[TMP10]], double [[EXTRB1]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> [[TMP9]], double [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP11]], [[TMP2]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP7]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP5]], [[TMP12]]
	; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0			; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
	; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1			; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP13]], <2 x double>* [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP13]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

	; zero-extend the roots back to their original sizes.			; zero-extend the roots back to their original sizes.
	;			;
	define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {			define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {
	; CHECK-LABEL: @PR31243_zext(			; CHECK-LABEL: @PR31243_zext(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>			; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i32>
	; CHECK-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i64			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
				; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1			; CHECK-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1
	; CHECK-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1			; CHECK-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1
	; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]			; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]
	; CHECK-NEXT: ret i8 [[TMP_8]]			; CHECK-NEXT: ret i8 [[TMP_8]]
	;			;
	entry:			entry:
	%tmp_0 = zext i8 %v0 to i32			%tmp_0 = zext i8 %v0 to i32
	%tmp_1 = zext i8 %v1 to i32			%tmp_1 = zext i8 %v1 to i32
	; SSE-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]			; SSE-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; SSE-NEXT: ret i8 [[TMP8]]			; SSE-NEXT: ret i8 [[TMP8]]
	;			;
	; AVX-LABEL: @PR31243_sext(			; AVX-LABEL: @PR31243_sext(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0			; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>			; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i16>			; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i32>
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP3]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = sext i16 [[TMP4]] to i64			; AVX-NEXT: [[TMP5:%.*]] = sext i32 [[TMP4]] to i64
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]			; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; AVX-NEXT: [[TMP6:%.*]] = extractelement <2 x i16> [[TMP3]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = sext i16 [[TMP6]] to i64			; AVX-NEXT: [[TMP7:%.*]] = sext i32 [[TMP6]] to i64
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]			; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	; AVX-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1			; AVX-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1
	; AVX-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1			; AVX-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1
	; AVX-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]			; AVX-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: ret i8 [[TMP8]]			; AVX-NEXT: ret i8 [[TMP8]]
	;			;
	entry:			entry:
	%tmp0 = sext i8 %v0 to i32			%tmp0 = sext i8 %v0 to i32

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-threshold=-200 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-threshold=-250 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S -slp-min-non-power2-stores-size=1 -slp-min-non-power2-values-size=1 | FileCheck %s

	define void @test_add_sdiv(i32 *%arr1, i32 *%arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {			define void @test_add_sdiv(i32 *%arr1, i32 *%arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {
	; CHECK-LABEL: @test_add_sdiv(			; CHECK-LABEL: @test_add_sdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0			; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0
	; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1			; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1
	; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2			; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2
	; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3			; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0
	; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1			; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1
	; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2			; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2
	; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3			; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3
	; CHECK-NEXT: [[V0:%.*]] = load i32, i32* [[GEP1_0]]			; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]], align 4
	; CHECK-NEXT: [[V1:%.*]] = load i32, i32* [[GEP1_1]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP1_0]] to <4 x i32>*
	; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]]			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 undef>
	; CHECK-NEXT: [[Y0:%.*]] = add nsw i32 [[A0:%.*]], 1146
	; CHECK-NEXT: [[Y1:%.*]] = add nsw i32 [[A1:%.*]], 146
	; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42			; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42
	; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; CHECK-NEXT: [[RES0:%.*]] = add nsw i32 [[V0]], [[Y0]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[RES1:%.*]] = add nsw i32 [[V1]], [[Y1]]			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[TMP4]], <i32 1146, i32 146, i32 0, i32 poison>
	; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]			; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]
	; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[SHUFFLE]], [[TMP5]]
	; CHECK-NEXT: store i32 [[RES0]], i32* [[GEP2_0]]			; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]], align 4
	; CHECK-NEXT: store i32 [[RES1]], i32* [[GEP2_1]]			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 2>
	; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[GEP2_0]] to <4 x i32>*
	; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]]			; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[SHUFFLE1]], <4 x i32>* [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 false, i1 true>)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr i32, i32* %arr1, i32 0			%gep1.0 = getelementptr i32, i32* %arr1, i32 0
	%gep1.1 = getelementptr i32, i32* %arr1, i32 1			%gep1.1 = getelementptr i32, i32* %arr1, i32 1
	%gep1.2 = getelementptr i32, i32* %arr1, i32 2			%gep1.2 = getelementptr i32, i32* %arr1, i32 2
	%gep1.3 = getelementptr i32, i32* %arr1, i32 3			%gep1.3 = getelementptr i32, i32* %arr1, i32 3
	%gep2.0 = getelementptr i32, i32* %arr2, i32 0			%gep2.0 = getelementptr i32, i32* %arr2, i32 0
	; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0			; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0
	; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1			; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1
	; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2			; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2
	; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3			; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0
	; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1			; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1
	; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2			; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2
	; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3			; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3
	; CHECK-NEXT: [[V0:%.*]] = load i32, i32* [[GEP1_0]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP1_0]] to <4 x i32>*
	; CHECK-NEXT: [[V1:%.*]] = load i32, i32* [[GEP1_1]]			; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP0]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> undef)
	; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]]			; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = add nsw i32 [[A0:%.*]], 1146			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; CHECK-NEXT: [[Y1:%.*]] = add nsw i32 [[A1:%.*]], 146			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A2:%.*]], i32 2
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[TMP4]], <i32 1146, i32 146, i32 42, i32 poison>
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0			; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0
	; CHECK-NEXT: [[RES0:%.*]] = urem i32 [[V0]], [[Y0]]			; CHECK-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[RES1:%.*]] = urem i32 [[V1]], [[Y1]]
	; CHECK-NEXT: [[RES2:%.*]] = urem i32 [[V2]], [[Y2]]
	; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]			; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]
	; CHECK-NEXT: store i32 [[RES0]], i32* [[GEP2_0]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[GEP2_0]] to <4 x i32>*
	; CHECK-NEXT: store i32 [[RES1]], i32* [[GEP2_1]]			; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>)
	; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]]			; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]], align 4
	; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr i32, i32* %arr1, i32 0			%gep1.0 = getelementptr i32, i32* %arr1, i32 0
	%gep1.1 = getelementptr i32, i32* %arr1, i32 1			%gep1.1 = getelementptr i32, i32* %arr1, i32 1
	%gep1.2 = getelementptr i32, i32* %arr1, i32 2			%gep1.2 = getelementptr i32, i32* %arr1, i32 2
	%gep1.3 = getelementptr i32, i32* %arr1, i32 3			%gep1.3 = getelementptr i32, i32* %arr1, i32 3
	%gep2.0 = getelementptr i32, i32* %arr2, i32 0			%gep2.0 = getelementptr i32, i32* %arr2, i32 0

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -instcombine -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -instcombine -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx -slp-min-non-power2-stores-size=5 | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"

	; Make sure we order the operands of commutative operations so that we get			; Make sure we order the operands of commutative operations so that we get
	; bigger vectorizable trees.			; bigger vectorizable trees.

	define void @shuffle_operands1(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_operands1(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_operands1(			; CHECK-LABEL: @shuffle_operands1(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[V1:%.*]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[V2:%.*]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%from_1 = getelementptr double, double *%from, i64 1			%from_1 = getelementptr double, double *%from, i64 1
	%v0_1 = load double , double * %from			%v0_1 = load double , double * %from
	%v0_2 = load double , double * %from_1			%v0_2 = load double , double * %from_1
	%v1_1 = fadd double %v0_1, %v1			%v1_1 = fadd double %v0_1, %v1
	}			}

	define void @shuffle_nodes_match1(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_nodes_match1(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_nodes_match1(			; CHECK-LABEL: @shuffle_nodes_match1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	define void @vecload_vs_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @vecload_vs_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @vecload_vs_broadcast4(			; CHECK-LABEL: @vecload_vs_broadcast4(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:


	define void @shuffle_nodes_match2(double * noalias %from, double * noalias %to, double %v1, double %v2) {			define void @shuffle_nodes_match2(double * noalias %from, double * noalias %to, double %v1, double %v2) {
	; CHECK-LABEL: @shuffle_nodes_match2(			; CHECK-LABEL: @shuffle_nodes_match2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LP:%.*]]			; CHECK-NEXT: br label [[LP:%.*]]
	; CHECK: lp:			; CHECK: lp:
	; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
	; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP2]]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
	; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]			; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
	; CHECK: ext:			; CHECK: ext:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %lp			br label %lp

	lp:			lp:
	define void @good_load_order() {			define void @good_load_order() {
	; CHECK-LABEL: @good_load_order(			; CHECK-LABEL: @good_load_order(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* getelementptr inbounds ([32000 x float], [32000 x float]* @a, i32 0, i32 0), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* getelementptr inbounds ([32000 x float], [32000 x float]* @a, i32 0, i32 0), align 16
	; CHECK-NEXT: br label [[FOR_BODY3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3:%.*]]
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP14:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi float [ [[TMP0]], [[FOR_COND1_PREHEADER]] ], [ [[TMP12:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1			; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 4
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX5]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX]] to <8 x float>*
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [32000 x float], [32000 x float]* @a, i32 0, i32 [[TMP13]]			; CHECK-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* nonnull [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>, <8 x float> undef)
	; CHECK-NEXT: [[TMP14]] = load float, float* [[ARRAYIDX41]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP8]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[MUL45:%.*]] = fmul float [[TMP14]], [[TMP15]]			; CHECK-NEXT: [[TMP9:%.*]] = fmul <8 x float> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: store float [[MUL45]], float* [[ARRAYIDX31]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX5]] to <8 x float>*
	; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP9]], <8 x float>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>)
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP16]], 31995			; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
				; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[TMP11]], 31995
				; CHECK-NEXT: [[TMP12]] = extractelement <8 x float> [[TMP6]], i32 4
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader:			for.cond1.preheader:
	; c[1] = b[1]+a[1]; // swapped b[1] and a[1]			; c[1] = b[1]+a[1]; // swapped b[1] and a[1]

	define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){			define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){
	; CHECK-LABEL: @load_reorder_double(			; CHECK-LABEL: @load_reorder_double(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = load double, double* %a			%1 = load double, double* %a
	%2 = load double, double* %b			%2 = load double, double* %b
	%3 = fadd double %1, %2			%3 = fadd double %1, %2
	store double %3, double* %c			store double %3, double* %c

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
	; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP8]], 0			; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP8]], 0
	; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP8]], 8			; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP8]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i64> [[TMP10]], i64 [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i64> [[TMP10]], i64 [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i64> [[TMP11]], [[TMP7]]			; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i64> [[TMP7]], [[TMP11]]
	; CHECK-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8			; CHECK-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8			%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8
	%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8			%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	; }			; }
	;			;
	; return R+G+B+Y+P;			; return R+G+B+Y+P;
	; }			; }

	define float @foo3(float* nocapture readonly %A) #0 {			define float @foo3(float* nocapture readonly %A) #0 {
	; CHECK-LABEL: @foo3(			; CHECK-LABEL: @foo3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A:%.*]] to <8 x float>*
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP0]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false>, <8 x float> undef)
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <4 x float>*			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <8 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP14:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x float> [ [[SHUFFLE]], [[ENTRY]] ], [ [[SHRINK_SHUFFLE:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP4]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX14]] to <4 x float>*
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP5]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x float> poison, float [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> poison, float [[TMP13]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <8 x float> [[TMP8]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x float> [[TMP10]], <8 x float> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
	; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[TMP17]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[TMP13:%.*]] = fmul <8 x float> [[TMP12]], <float 7.000000e+00, float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01, float poison, float poison, float poison>
	; CHECK-NEXT: [[TMP19]] = fadd <4 x float> [[TMP6]], [[TMP18]]			; CHECK-NEXT: [[TMP14]] = fadd <8 x float> [[TMP2]], [[TMP13]]
	; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP15:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP20]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP15]], 121
				; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>
				; CHECK-NEXT: [[SHRINK_SHUFFLE]] = shufflevector <4 x float> [[SHUFFLE1]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP19]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[TMP14]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[TMP14]], i32 1
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP19]], i32 1			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[TMP16]], [[TMP17]]
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x float> [[TMP14]], i32 2
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP19]], i32 2			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP18]]
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <8 x float> [[TMP14]], i32 3
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x float> [[TMP19]], i32 3			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP19]]
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <8 x float> [[TMP14]], i32 4
				; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP20]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

for.body.lr.ph.i:
ret void		ret void
}		}

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @pr35497() local_unnamed_addr #0 {		define void @pr35497() local_unnamed_addr #0 {
; SSE-LABEL: @pr35497(		; SSE-LABEL: @pr35497(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
; SSE-NEXT: [[AND:%.*]] = shl i64 [[TMP0]], 2
; SSE-NEXT: [[SHL:%.*]] = and i64 [[AND]], 20
; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef		; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1		; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5		; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
; SSE-NEXT: [[AND_1:%.*]] = shl i64 undef, 2		; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
; SSE-NEXT: [[SHL_1:%.*]] = and i64 [[AND_1]], 20		; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
; SSE-NEXT: [[SHR_1:%.*]] = lshr i64 undef, 6		; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
; SSE-NEXT: [[ADD_1:%.*]] = add nuw nsw i64 [[SHL]], [[SHR_1]]
; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4		; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
; SSE-NEXT: [[SHR_2:%.*]] = lshr i64 undef, 6		; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
; SSE-NEXT: [[ADD_2:%.*]] = add nuw nsw i64 [[SHL_1]], [[SHR_2]]
; SSE-NEXT: [[AND_4:%.*]] = shl i64 [[ADD]], 2
; SSE-NEXT: [[SHL_4:%.*]] = and i64 [[AND_4]], 20
; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1		; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
; SSE-NEXT: store i64 [[ADD_1]], i64* [[ARRAYIDX2_5]], align 1		; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
; SSE-NEXT: [[AND_5:%.*]] = shl nuw nsw i64 [[ADD_1]], 2		; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
; SSE-NEXT: [[SHL_5:%.*]] = and i64 [[AND_5]], 20		; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1
; SSE-NEXT: [[SHR_5:%.*]] = lshr i64 [[ADD_1]], 6		; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>
; SSE-NEXT: [[ADD_5:%.*]] = add nuw nsw i64 [[SHL_4]], [[SHR_5]]		; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>
; SSE-NEXT: store i64 [[ADD_5]], i64* [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0		; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
; SSE-NEXT: store i64 [[ADD_2]], i64* [[ARRAYIDX2_6]], align 1		; SSE-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
; SSE-NEXT: [[SHR_6:%.*]] = lshr i64 [[ADD_2]], 6		; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
; SSE-NEXT: [[ADD_6:%.*]] = add nuw nsw i64 [[SHL_5]], [[SHR_6]]		; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
; SSE-NEXT: store i64 [[ADD_6]], i64* [[ARRAYIDX2_2]], align 1		; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]
		; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @pr35497(		; AVX-LABEL: @pr35497(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef		; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1		; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5		; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]			; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4			; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4			; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4
	; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01			; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01
	; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01			; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01
	; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01			; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01
	; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01			; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[FADD1]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[FADD1]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1
	; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2			; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2
	; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]			; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6			; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6
	; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7			; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] %StructIn0, i16 [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN0]], i16 [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] %StructIn2, i16 [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN2]], i16 [[TMP7]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
	; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0			; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
	; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] %StructIn4, i16 [[TMP9]], 1			; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN4]], i16 [[TMP9]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
	; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0			; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
	; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] %StructIn6, i16 [[TMP11]], 1			; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN6]], i16 [[TMP11]], 1
	; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] %StructIn1, 0			; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In0, [[STRUCT1TY]] %StructIn3, 1			; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN0]], [[STRUCT1TY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] %StructIn5, 0			; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] [[STRUCTIN5]], 0
	; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In2, [[STRUCT1TY]] %StructIn7, 1			; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN2]], [[STRUCT1TY]] [[STRUCTIN7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] %Struct2In1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] [[STRUCT2IN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] %Struct2In3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] [[STRUCT2IN3]], 1
	; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0			%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0
	%L0 = load i16, i16 * %GEP0			%L0 = load i16, i16 * %GEP0
	%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1			%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1
	%L1 = load i16, i16 * %GEP1			%L1 = load i16, i16 * %GEP1
	%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2			%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2
	%L2 = load i16, i16 * %GEP2			%L2 = load i16, i16 * %GEP2

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX


	@b = global [8 x i32] zeroinitializer, align 16			@b = global [8 x i32] zeroinitializer, align 16
	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @foo() {			define void @foo() {
	; SSE-LABEL: @foo(			; SSE-LABEL: @foo(
	; SSE-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @b to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 2, i32 0, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; SSE-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([8 x i32]* @a to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2>
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4) to <4 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @b to <4 x i32>*), align 16
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 2, i32 0, i32 2, i32 0, i32 2, i32 0, i32 2>
				xbolva00: Regression on avx?
				ABataev (author): Yes, it looks like the issue is with the cost of `@llvm.masked.gather` for a masked gather with some undefs in the mask.
				craig.topper: Gather is slow on CPUs prior to AVX512, and its cost is proportional to the number of elements. I don't think the value of the mask should be a factor.
				ABataev (author): True, but in some cases it can be optimized into `gather + shuffle` instead of a wide `gather` if there are undefs in the mask.
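				The trade-off discussed in this thread can be sketched with a toy cost comparison. This is only an illustration under stated assumptions: the struct, the function names, and the cost values are hypothetical and are not part of LLVM's cost model or the patch under review. It assumes a gather costs a fixed amount per lane and that widening a narrower gather takes one shuffle.

```cpp
// Toy cost comparison; every name here is hypothetical and none of this is
// LLVM's TargetTransformInfo API.
struct ToyGatherCosts {
  unsigned PerLaneCost; // assumed cost of loading one gathered lane
  unsigned ShuffleCost; // assumed cost of one widening shufflevector
};

// Smallest power of two that is >= N (returns 1 for N == 0).
inline unsigned nextPowerOfTwo(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Wide gather: cost grows with the full vector width, regardless of the mask.
inline unsigned wideGatherCost(const ToyGatherCosts &C, unsigned NumLanes) {
  return C.PerLaneCost * NumLanes;
}

// Narrow gather over the defined lanes only, then one shuffle to widen.
inline unsigned narrowGatherPlusShuffleCost(const ToyGatherCosts &C,
                                            unsigned NumDefinedLanes) {
  return C.PerLaneCost * nextPowerOfTwo(NumDefinedLanes) + C.ShuffleCost;
}
```

				With an assumed per-lane cost of 2 and shuffle cost of 1, an 8-lane gather with only 3 defined lanes costs 16 as a wide gather and 9 as a 4-lane gather plus shuffle. With 7 defined lanes the narrow form rounds back up to 8 lanes and costs 17, so the wide gather is cheaper in this model.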
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> poison, i32 [[TMP1]], i32 0
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP2]], i32 1
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			%1 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16
	%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			%2 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8			store i32 %1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4			store i32 %2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 \| FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 \| FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx \| FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx \| FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 \| FileCheck %s --check-prefixes=CHECK,AVX2			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 \| FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f \| FileCheck %s --check-prefixes=CHECK,AVX512F			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f \| FileCheck %s --check-prefixes=AVX512F
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl \| FileCheck %s --check-prefixes=CHECK,AVX512VL			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl \| FileCheck %s --check-prefixes=AVX512VL

	define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {			define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
	; CHECK-LABEL: @gather_load(			; SSE-LABEL: @gather_load(
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1			; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]			; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11			; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; SSE-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0			; SSE-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2			; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3			; SSE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>			; SSE-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*			; SSE-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: ret void			; SSE-NEXT: ret void
				;
				; AVX-LABEL: @gather_load(
				; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
				; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: ret void
				;
				; AVX2-LABEL: @gather_load(
				; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX2-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
				; AVX2-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX2-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX2-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX2-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX2-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: ret void
				;
				; AVX512F-LABEL: @gather_load(
				; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX512F-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX512F-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX512F-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
				; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX512F-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX512F-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX512F-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX512F-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: ret void
				;
				; AVX512VL-LABEL: @gather_load(
				; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0
				; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>
				; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 1, i64 poison>
				; AVX512VL-NEXT: [[TMP5:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP4]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> undef), !tbaa [[TBAA0:![0-9]+]]
				; AVX512VL-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[TMP5]], <i32 1, i32 2, i32 3, i32 4>
				; AVX512VL-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX512VL-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX512VL-NEXT: ret void
	;			;
				; AVX512-LABEL: @gather_load(
				; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX512-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX512-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
				; AVX512-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX512-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
				; AVX512-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i32 0
				; AVX512-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1
				; AVX512-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2
				; AVX512-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3
				; AVX512-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
				; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX512-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, !tbaa [[TBAA0]]
				; AVX512-NEXT: ret void
	%3 = getelementptr inbounds i32, i32* %1, i64 1			%3 = getelementptr inbounds i32, i32* %1, i64 1
	%4 = load i32, i32* %1, align 4, !tbaa !2			%4 = load i32, i32* %1, align 4, !tbaa !2
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	%8 = getelementptr inbounds i32, i32* %0, i64 2			%8 = getelementptr inbounds i32, i32* %0, i64 2
	%9 = getelementptr inbounds i32, i32* %1, i64 4			%9 = getelementptr inbounds i32, i32* %1, i64 4
	%10 = load i32, i32* %9, align 4, !tbaa !2			%10 = load i32, i32* %9, align 4, !tbaa !2
	[192 lines not shown]
	;			;
	; AVX2-LABEL: @gather_load_3(			; AVX2-LABEL: @gather_load_3(
	; AVX2-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 11			; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 11
	; AVX2-NEXT: [[TMP5:%.]] = load i32, i32 [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP5:%.]] = load i32, i32 [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 4			; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 4
	; AVX2-NEXT: [[TMP7:%.]] = load i32, i32 [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP7:%.]] = load i32, i32 [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 15			; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 15
	; AVX2-NEXT: [[TMP9:%.]] = load i32, i32 [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP9:%.]] = bitcast i32 [[TMP8]] to <4 x i32>*
	; AVX2-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 18			; AVX2-NEXT: [[TMP10:%.]] = load <4 x i32>, <4 x i32> [[TMP9]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP11:%.]] = load i32, i32 [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP11:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 6
	; AVX2-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 9			; AVX2-NEXT: [[TMP12:%.]] = bitcast i32 [[TMP11]] to <4 x i32>*
	; AVX2-NEXT: [[TMP13:%.]] = load i32, i32 [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.]] = load <4 x i32>, <4 x i32> [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 6			; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 21
	; AVX2-NEXT: [[TMP15:%.]] = load i32, i32 [[TMP14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP15:%.]] = load i32, i32 [[TMP14]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP16:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 21			; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
	; AVX2-NEXT: [[TMP17:%.]] = load i32, i32 [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
	; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0			; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
	; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i32 1			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i32 2			; AVX2-NEXT: [[TMP19:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i32 3			; AVX2-NEXT: [[TMP20:%.*]] = shufflevector <8 x i32> [[TMP18]], <8 x i32> [[TMP19]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i32 4			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i32 5			; AVX2-NEXT: [[TMP21:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i32 6			; AVX2-NEXT: [[TMP22:%.*]] = shufflevector <8 x i32> [[TMP20]], <8 x i32> [[TMP21]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i32 7			; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i32 7
	; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX2-NEXT: [[TMP24:%.*]] = add <8 x i32> [[TMP23]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX2-NEXT: [[TMP27:%.]] = bitcast i32 [[TMP0:%.]] to <8 x i32>			; AVX2-NEXT: [[TMP25:%.]] = bitcast i32 [[TMP0:%.]] to <8 x i32>
	; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x i32> [[TMP24]], <8 x i32>* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_3(			; AVX512F-LABEL: @gather_load_3(
	; AVX512F-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 11			; AVX512F-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
	; AVX512F-NEXT: [[TMP5:%.]] = load i32, i32 [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TMP0:%.*]], i64 1
	; AVX512F-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 4			; AVX512F-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP7:%.]] = load i32, i32 [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP6:%.]] = insertelement <8 x i32> poison, i32* [[TMP1]], i32 0
	; AVX512F-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 15			; AVX512F-NEXT: [[SHUFFLE:%.]] = shufflevector <8 x i32> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512F-NEXT: [[TMP9:%.]] = load i32, i32 [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP7:%.]] = getelementptr i32, <8 x i32> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0			; AVX512F-NEXT: [[TMP8:%.]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP5]], i32 1			; AVX512F-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512F-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP7]], i32 2			; AVX512F-NEXT: [[TMP10:%.]] = bitcast i32 [[TMP5]] to <8 x i32>*
	; AVX512F-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3			; AVX512F-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP14:%.*]] = add <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP15:%.]] = getelementptr inbounds i32, i32 [[TMP0:%.*]], i64 4
	; AVX512F-NEXT: [[TMP16:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 18
	; AVX512F-NEXT: [[TMP18:%.]] = load i32, i32 [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP19:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 9
	; AVX512F-NEXT: [[TMP20:%.]] = load i32, i32 [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP21:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 6
	; AVX512F-NEXT: [[TMP22:%.]] = load i32, i32 [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP23:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 21
	; AVX512F-NEXT: [[TMP24:%.]] = load i32, i32 [[TMP23]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> poison, i32 [[TMP18]], i32 0
	; AVX512F-NEXT: [[TMP26:%.*]] = insertelement <4 x i32> [[TMP25]], i32 [[TMP20]], i32 1
	; AVX512F-NEXT: [[TMP27:%.*]] = insertelement <4 x i32> [[TMP26]], i32 [[TMP22]], i32 2
	; AVX512F-NEXT: [[TMP28:%.*]] = insertelement <4 x i32> [[TMP27]], i32 [[TMP24]], i32 3
	; AVX512F-NEXT: [[TMP29:%.*]] = add <4 x i32> [[TMP28]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP30:%.]] = bitcast i32 [[TMP15]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP29]], <4 x i32>* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_3(			; AVX512VL-LABEL: @gather_load_3(
	; AVX512VL-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.]] = load i32, i32 [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
	; AVX512VL-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TMP0:%.*]], i64 1			; AVX512VL-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TMP0:%.*]], i64 1
	; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.]] = insertelement <4 x i32> poison, i32* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP6:%.]] = insertelement <8 x i32> poison, i32* [[TMP1]], i32 0
	; AVX512VL-NEXT: [[TMP7:%.]] = shufflevector <4 x i32> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.]] = shufflevector <8 x i32> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512VL-NEXT: [[TMP8:%.]] = getelementptr i32, <4 x i32> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP7:%.]] = getelementptr i32, <8 x i32> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512VL-NEXT: [[TMP9:%.]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP8:%.]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512VL-NEXT: [[TMP11:%.]] = getelementptr inbounds i32, i32 [[TMP0]], i64 5			; AVX512VL-NEXT: [[TMP10:%.]] = bitcast i32 [[TMP5]] to <8 x i32>*
	; AVX512VL-NEXT: [[TMP12:%.]] = bitcast i32 [[TMP5]] to <4 x i32>*			; AVX512VL-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP13:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 9
	; AVX512VL-NEXT: [[TMP14:%.]] = load i32, i32 [[TMP13]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2
	; AVX512VL-NEXT: [[TMP16:%.]] = getelementptr inbounds i32, i32 [[TMP0]], i64 6
	; AVX512VL-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 6
	; AVX512VL-NEXT: [[TMP18:%.]] = load i32, i32 [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
	; AVX512VL-NEXT: [[TMP20:%.]] = getelementptr inbounds i32, i32 [[TMP0]], i64 7
	; AVX512VL-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP21:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i64 21
	; AVX512VL-NEXT: [[TMP22:%.]] = load i32, i32 [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX512VL-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%3 = load i32, i32* %1, align 4, !tbaa !2			%3 = load i32, i32* %1, align 4, !tbaa !2
	%4 = add i32 %3, 1			%4 = add i32 %3, 1
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	store i32 %4, i32* %0, align 4, !tbaa !2			store i32 %4, i32* %0, align 4, !tbaa !2
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	[102 lines not shown]
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*			; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_4(			; AVX2-LABEL: @gather_load_4(
	; AVX2-NEXT: [[T6:%.]] = getelementptr inbounds i32, i32 [[T1:%.*]], i64 11			; AVX2-NEXT: [[T6:%.]] = getelementptr inbounds i32, i32 [[T1:%.*]], i64 11
	; AVX2-NEXT: [[T10:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 4			; AVX2-NEXT: [[T10:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 4
	; AVX2-NEXT: [[T14:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 15			; AVX2-NEXT: [[T14:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 15
	; AVX2-NEXT: [[T18:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 18
	; AVX2-NEXT: [[T22:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 9
	; AVX2-NEXT: [[T26:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 6			; AVX2-NEXT: [[T26:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 6
	; AVX2-NEXT: [[T30:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 21			; AVX2-NEXT: [[T30:%.]] = getelementptr inbounds i32, i32 [[T1]], i64 21
	; AVX2-NEXT: [[T3:%.]] = load i32, i32 [[T1]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T3:%.]] = load i32, i32 [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T7:%.]] = load i32, i32 [[T6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T7:%.]] = load i32, i32 [[T6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T11:%.]] = load i32, i32 [[T10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T11:%.]] = load i32, i32 [[T10]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T15:%.]] = load i32, i32 [[T14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP1:%.]] = bitcast i32 [[T14]] to <4 x i32>*
	; AVX2-NEXT: [[T19:%.]] = load i32, i32 [[T18]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP2:%.]] = load <4 x i32>, <4 x i32> [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T23:%.]] = load i32, i32 [[T22]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP3:%.]] = bitcast i32 [[T26]] to <4 x i32>*
	; AVX2-NEXT: [[T27:%.]] = load i32, i32 [[T26]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP4:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T31:%.]] = load i32, i32 [[T30]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T31:%.]] = load i32, i32 [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0			; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
	; AVX2-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T7]], i32 1			; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
	; AVX2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T11]], i32 2			; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
	; AVX2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T15]], i32 3			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[T19]], i32 4			; AVX2-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T23]], i32 5			; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T27]], i32 6			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[T31]], i32 7			; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <8 x i32> [[TMP9]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP10:%.]] = bitcast i32 [[T0:%.]] to <8 x i32>			; AVX2-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T31]], i32 7
	; AVX2-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.*]] = add <8 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
				; AVX2-NEXT: [[TMP14:%.]] = bitcast i32 [[T0:%.]] to <8 x i32>
				; AVX2-NEXT: store <8 x i32> [[TMP13]], <8 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_4(			; AVX512F-LABEL: @gather_load_4(
	; AVX512F-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX512F-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512F-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX512F-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
	; AVX512F-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512F-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 4			; AVX512F-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512F-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX512F-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512F-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512F-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512F-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
	; AVX512F-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512F-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
	; AVX512F-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T3]], i32 0
	; AVX512F-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T7]], i32 1
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T11]], i32 2
	; AVX512F-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T15]], i32 3
	; AVX512F-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> poison, i32 [[T19]], i32 0
	; AVX512F-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T23]], i32 1
	; AVX512F-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[T27]], i32 2
	; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[T31]], i32 3
	; AVX512F-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP11:%.*]] = bitcast i32* [[T0]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP12:%.*]] = bitcast i32* [[T17]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_4(			; AVX512VL-LABEL: @gather_load_4(
	; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1			; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0			; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
	; AVX512VL-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512VL-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
	; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
	; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
	; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1			; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
	; AVX512VL-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
	; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
	; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
	; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*			; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
	; AVX512VL-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%t5 = getelementptr inbounds i32, i32* %t0, i64 1			%t5 = getelementptr inbounds i32, i32* %t0, i64 1
	%t6 = getelementptr inbounds i32, i32* %t1, i64 11			%t6 = getelementptr inbounds i32, i32* %t1, i64 11
	%t9 = getelementptr inbounds i32, i32* %t0, i64 2			%t9 = getelementptr inbounds i32, i32* %t0, i64 2
	%t10 = getelementptr inbounds i32, i32* %t1, i64 4			%t10 = getelementptr inbounds i32, i32* %t1, i64 4
	%t13 = getelementptr inbounds i32, i32* %t0, i64 3			%t13 = getelementptr inbounds i32, i32* %t0, i64 3
	%t14 = getelementptr inbounds i32, i32* %t1, i64 15			%t14 = getelementptr inbounds i32, i32* %t1, i64 15
	; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]			; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]
	; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4			; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
	; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*			; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
	; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17			; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
	; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33			; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
	; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8			; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
	; SSE-NEXT: [[TMP34:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP34:%.*]] = bitcast float* [[TMP33]] to <4 x float>*
	; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30			; SSE-NEXT: [[TMP35:%.*]] = load <4 x float>, <4 x float>* [[TMP34]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
	; SSE-NEXT: [[TMP37:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5			; SSE-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <4 x float>*
	; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP37]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP38:%.*]] = load <4 x float>, <4 x float>* [[TMP37]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27			; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
	; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP43:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23			; SSE-NEXT: [[TMP43:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0
	; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP43]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[TMP35]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0			; SSE-NEXT: [[TMP44:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP34]], i32 1			; SSE-NEXT: [[TMP45:%.*]] = shufflevector <4 x float> [[TMP43]], <4 x float> [[TMP44]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP38]], i32 2			; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP40]], i32 3
	; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP42]], i32 3			; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0
	; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP38]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP36]], i32 1			; SSE-NEXT: [[TMP48:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP40]], i32 2			; SSE-NEXT: [[TMP49:%.*]] = shufflevector <4 x float> [[TMP47]], <4 x float> [[TMP48]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i32 3			; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP42]], i32 3
	; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]			; SSE-NEXT: [[TMP51:%.*]] = fdiv <4 x float> [[TMP46]], [[TMP50]]
	; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP27]] to <4 x float>*			; SSE-NEXT: [[TMP52:%.*]] = bitcast float* [[TMP27]] to <4 x float>*
	; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x float> [[TMP51]], <4 x float>* [[TMP52]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_div(			; AVX-LABEL: @gather_load_div(
	; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4			; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
	; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10			; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
	; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13			; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
	; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3			; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
	; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11			; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
	; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14			; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
	; AVX2-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
	; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44			; AVX2-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
	; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17			; AVX2-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
	; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33			; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
	; AVX2-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8			; AVX2-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
	; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30			; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
	; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
	; AVX2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5			; AVX2-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX2-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27			; AVX2-NEXT: [[TMP28:%.*]] = load float, float* [[TMP27]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
	; AVX2-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX2-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
	; AVX2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23			; AVX2-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
	; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
	; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0			; AVX2-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i32 1			; AVX2-NEXT: [[TMP34:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i32 2			; AVX2-NEXT: [[TMP35:%.*]] = shufflevector <8 x float> [[TMP33]], <8 x float> [[TMP34]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i32 3			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[TMP23]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i32 4			; AVX2-NEXT: [[TMP36:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i32 5			; AVX2-NEXT: [[TMP37:%.*]] = shufflevector <8 x float> [[TMP35]], <8 x float> [[TMP36]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i32 6			; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP28]], i32 7
	; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i32 7			; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
	; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0			; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP9]], i32 1
	; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i32 1			; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP13]], i32 2
	; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i32 2			; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP16]], i32 3
	; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i32 3			; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP20]], i32 4
	; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i32 4			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP26]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i32 5			; AVX2-NEXT: [[TMP44:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE]], <2 x float> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i32 6			; AVX2-NEXT: [[TMP45:%.*]] = shufflevector <8 x float> [[TMP43]], <8 x float> [[TMP44]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i32 7			; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP30]], i32 7
	; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]			; AVX2-NEXT: [[TMP47:%.*]] = fdiv <8 x float> [[TMP38]], [[TMP46]]
	; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX2-NEXT: [[TMP48:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x float> [[TMP47]], <8 x float>* [[TMP48]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_div(			; AVX512F-LABEL: @gather_load_div(
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i32 0			; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <8 x float*> poison, float* [[TMP1:%.*]], i32 0
	; AVX512F-NEXT: [[TMP4:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr float, <4 x float*> [[TMP4]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512F-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 10, i64 3, i64 14, i64 17, i64 8, i64 5, i64 20, i64 poison>
	; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512F-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP7:%.*]] = shufflevector <2 x float*> [[TMP6]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP6:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP8:%.*]] = getelementptr float, <2 x float*> [[TMP7]], <2 x i64> <i64 8, i64 5>			; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <8 x float*> [[TMP6]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512F-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0			; AVX512F-NEXT: [[TMP9:%.*]] = fdiv <8 x float> [[TMP5]], [[TMP8]]
	; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP5]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: store <8 x float> [[TMP9]], <8 x float>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP8]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512F-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP9]], i32 7
	; AVX512F-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP18:%.*]] = getelementptr float, <8 x float*> [[TMP17]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP19:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP18]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP20:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP19]]
	; AVX512F-NEXT: [[TMP21:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: store <8 x float> [[TMP20]], <8 x float>* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_div(			; AVX512VL-LABEL: @gather_load_div(
	; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i32 0			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <8 x float*> poison, float* [[TMP1:%.*]], i32 0
	; AVX512VL-NEXT: [[TMP4:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr float, <4 x float*> [[TMP4]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 10, i64 3, i64 14, i64 17, i64 8, i64 5, i64 20, i64 poison>
	; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP7:%.*]] = shufflevector <2 x float*> [[TMP6]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP6:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr float, <2 x float*> [[TMP7]], <2 x i64> <i64 8, i64 5>			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <8 x float*> [[TMP6]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512VL-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP9:%.*]] = fdiv <8 x float> [[TMP5]], [[TMP8]]
	; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP5]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: store <8 x float> [[TMP9]], <8 x float>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP8]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512VL-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP9]], i32 7
	; AVX512VL-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP18:%.*]] = getelementptr float, <8 x float*> [[TMP17]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP19:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP18]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP20:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP19]]
	; AVX512VL-NEXT: [[TMP21:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: store <8 x float> [[TMP20]], <8 x float>* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%3 = load float, float* %1, align 4, !tbaa !2			%3 = load float, float* %1, align 4, !tbaa !2
	%4 = getelementptr inbounds float, float* %1, i64 4			%4 = getelementptr inbounds float, float* %1, i64 4
	%5 = load float, float* %4, align 4, !tbaa !2			%5 = load float, float* %4, align 4, !tbaa !2
	%6 = fdiv float %3, %5			%6 = fdiv float %3, %5
	%7 = getelementptr inbounds float, float* %0, i64 1			%7 = getelementptr inbounds float, float* %0, i64 1
	store float %6, float* %0, align 4, !tbaa !2			store float %6, float* %0, align 4, !tbaa !2

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX2			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512F			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512F
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=CHECK,AVX512VL			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX512VL

	define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {			define void @gather_load(i32* noalias nocapture %0, i32* noalias nocapture readonly %1) {
	; CHECK-LABEL: @gather_load(			; SSE-LABEL: @gather_load(
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1			; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]			; SSE-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11			; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; SSE-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0			; SSE-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP6]], i32 1			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i32 2			; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3			; SSE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>			; SSE-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*			; SSE-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP15]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: ret void			; SSE-NEXT: ret void
				;
				; AVX-LABEL: @gather_load(
				; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
				; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: ret void
				;
				; AVX2-LABEL: @gather_load(
				; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX2-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
				; AVX2-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX2-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX2-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX2-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX2-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX2-NEXT: ret void
				;
				; AVX512F-LABEL: @gather_load(
				; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
				; AVX512F-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0:![0-9]+]]
				; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
				; AVX512F-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
				; AVX512F-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
				; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP6]], i32 1
				; AVX512F-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
				; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
				; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; AVX512F-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4>
				; AVX512F-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX512F-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
				; AVX512F-NEXT: ret void
				;
				; AVX512VL-LABEL: @gather_load(
				; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i32 0
				; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 undef>
				; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 1, i64 poison>
				; AVX512VL-NEXT: [[TMP5:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP4]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> undef), !tbaa [[TBAA0:![0-9]+]]
				; AVX512VL-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[TMP5]], <i32 1, i32 2, i32 3, i32 4>
				; AVX512VL-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
				; AVX512VL-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], align 4, !tbaa [[TBAA0]]
				; AVX512VL-NEXT: ret void
	;			;
	%3 = getelementptr inbounds i32, i32* %1, i64 1			%3 = getelementptr inbounds i32, i32* %1, i64 1
	%4 = load i32, i32* %1, align 4, !tbaa !2			%4 = load i32, i32* %1, align 4, !tbaa !2
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	%8 = getelementptr inbounds i32, i32* %0, i64 2			%8 = getelementptr inbounds i32, i32* %0, i64 2
	%9 = getelementptr inbounds i32, i32* %1, i64 4			%9 = getelementptr inbounds i32, i32* %1, i64 4
	;			;
	; AVX2-LABEL: @gather_load_3(			; AVX2-LABEL: @gather_load_3(
	; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11			; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11
	; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
	; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
	; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
	; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18			; AVX2-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX2-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
	; AVX2-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6			; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX2-NEXT: [[TMP16:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
	; AVX2-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP17:%.*]] = insertelement <8 x i32> [[TMP16]], i32 [[TMP5]], i32 1
	; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0			; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> [[TMP17]], i32 [[TMP7]], i32 2
	; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i32 1			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i32 2			; AVX2-NEXT: [[TMP19:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i32 3			; AVX2-NEXT: [[TMP20:%.*]] = shufflevector <8 x i32> [[TMP18]], <8 x i32> [[TMP19]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i32 4			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i32 5			; AVX2-NEXT: [[TMP21:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i32 6			; AVX2-NEXT: [[TMP22:%.*]] = shufflevector <8 x i32> [[TMP20]], <8 x i32> [[TMP21]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i32 7			; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i32 7
	; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX2-NEXT: [[TMP24:%.*]] = add <8 x i32> [[TMP23]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX2-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*			; AVX2-NEXT: [[TMP25:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
	; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x i32> [[TMP24]], <8 x i32>* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_3(			; AVX512F-LABEL: @gather_load_3(
	; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11			; AVX512F-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
	; AVX512F-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
	; AVX512F-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX512F-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512F-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512F-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0			; AVX512F-NEXT: [[TMP8:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP5]], i32 1			; AVX512F-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512F-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP7]], i32 2			; AVX512F-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>*
	; AVX512F-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i32 3			; AVX512F-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP14:%.*]] = add <4 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 4
	; AVX512F-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
	; AVX512F-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX512F-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512F-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512F-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP23]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> poison, i32 [[TMP18]], i32 0
	; AVX512F-NEXT: [[TMP26:%.*]] = insertelement <4 x i32> [[TMP25]], i32 [[TMP20]], i32 1
	; AVX512F-NEXT: [[TMP27:%.*]] = insertelement <4 x i32> [[TMP26]], i32 [[TMP22]], i32 2
	; AVX512F-NEXT: [[TMP28:%.*]] = insertelement <4 x i32> [[TMP27]], i32 [[TMP24]], i32 3
	; AVX512F-NEXT: [[TMP29:%.*]] = add <4 x i32> [[TMP28]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP30:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP29]], <4 x i32>* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_3(			; AVX512VL-LABEL: @gather_load_3(
	; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
	; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1			; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
	; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512VL-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP6]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512VL-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP8:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512VL-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; AVX512VL-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>*
	; AVX512VL-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; AVX512VL-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX512VL-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2
	; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
	; AVX512VL-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512VL-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
	; AVX512VL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
	; AVX512VL-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512VL-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX512VL-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%3 = load i32, i32* %1, align 4, !tbaa !2			%3 = load i32, i32* %1, align 4, !tbaa !2
	%4 = add i32 %3, 1			%4 = add i32 %3, 1
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	store i32 %4, i32* %0, align 4, !tbaa !2			store i32 %4, i32* %0, align 4, !tbaa !2
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*			; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_4(			; AVX2-LABEL: @gather_load_4(
	; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
	; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
	; AVX2-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX2-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX2-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
	; AVX2-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP3:%.*]] = bitcast i32* [[T26]] to <4 x i32>*
	; AVX2-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0			; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i32 0
	; AVX2-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T7]], i32 1			; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T7]], i32 1
	; AVX2-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T11]], i32 2			; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T11]], i32 2
	; AVX2-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T15]], i32 3			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[T19]], i32 4			; AVX2-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE2]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T23]], i32 5			; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T27]], i32 6			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[T31]], i32 7			; AVX2-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[SHRINK_SHUFFLE]], <2 x i32> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX2-NEXT: [[TMP11:%.*]] = shufflevector <8 x i32> [[TMP9]], <8 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*			; AVX2-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[T31]], i32 7
	; AVX2-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.*]] = add <8 x i32> [[TMP12]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
				; AVX2-NEXT: [[TMP14:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
				; AVX2-NEXT: store <8 x i32> [[TMP13]], <8 x i32>* [[TMP14]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_4(			; AVX512F-LABEL: @gather_load_4(
	; AVX512F-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX512F-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512F-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX512F-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
	; AVX512F-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512F-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 4			; AVX512F-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512F-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX512F-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512F-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512F-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512F-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
	; AVX512F-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512F-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
	; AVX512F-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T3]], i32 0
	; AVX512F-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T7]], i32 1
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T11]], i32 2
	; AVX512F-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T15]], i32 3
	; AVX512F-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> poison, i32 [[T19]], i32 0
	; AVX512F-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T23]], i32 1
	; AVX512F-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[T27]], i32 2
	; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[T31]], i32 3
	; AVX512F-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 1, i32 2, i32 3, i32 4>
	; AVX512F-NEXT: [[TMP11:%.*]] = bitcast i32* [[T0]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP12:%.*]] = bitcast i32* [[T17]] to <4 x i32>*
	; AVX512F-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_4(			; AVX512VL-LABEL: @gather_load_4(
	; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1			; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0			; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <8 x i32*> poison, i32* [[T1:%.*]], i32 0
	; AVX512VL-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32*> [[TMP1]], <8 x i32*> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 undef>
	; AVX512VL-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <8 x i32*> [[SHUFFLE]], <8 x i64> <i64 11, i64 4, i64 15, i64 18, i64 9, i64 6, i64 21, i64 poison>
	; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
	; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
	; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
	; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1			; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
	; AVX512VL-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP4:%.*]] = add <8 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 poison>
	; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
	; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
	; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
	; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*			; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <8 x i32>*
	; AVX512VL-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP4]], <8 x i32>* [[TMP5]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false>), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%t5 = getelementptr inbounds i32, i32* %t0, i64 1			%t5 = getelementptr inbounds i32, i32* %t0, i64 1
	%t6 = getelementptr inbounds i32, i32* %t1, i64 11			%t6 = getelementptr inbounds i32, i32* %t1, i64 11
	%t9 = getelementptr inbounds i32, i32* %t0, i64 2			%t9 = getelementptr inbounds i32, i32* %t0, i64 2
	%t10 = getelementptr inbounds i32, i32* %t1, i64 4			%t10 = getelementptr inbounds i32, i32* %t1, i64 4
	%t13 = getelementptr inbounds i32, i32* %t0, i64 3			%t13 = getelementptr inbounds i32, i32* %t0, i64 3
	%t14 = getelementptr inbounds i32, i32* %t1, i64 15			%t14 = getelementptr inbounds i32, i32* %t1, i64 15
	; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]			; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]
	; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4			; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
	; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*			; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
	; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17			; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
	; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33			; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
	; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8			; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
	; SSE-NEXT: [[TMP34:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP34:%.*]] = bitcast float* [[TMP33]] to <4 x float>*
	; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30			; SSE-NEXT: [[TMP35:%.*]] = load <4 x float>, <4 x float>* [[TMP34]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
	; SSE-NEXT: [[TMP37:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5			; SSE-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <4 x float>*
	; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP37]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP38:%.*]] = load <4 x float>, <4 x float>* [[TMP37]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27			; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
	; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: [[TMP43:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23			; SSE-NEXT: [[TMP43:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0
	; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP43]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[TMP35]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i32 0			; SSE-NEXT: [[TMP44:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP34]], i32 1			; SSE-NEXT: [[TMP45:%.*]] = shufflevector <4 x float> [[TMP43]], <4 x float> [[TMP44]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP38]], i32 2			; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP40]], i32 3
	; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP42]], i32 3			; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0
	; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i32 0			; SSE-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP38]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP36]], i32 1			; SSE-NEXT: [[TMP48:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP40]], i32 2			; SSE-NEXT: [[TMP49:%.*]] = shufflevector <4 x float> [[TMP47]], <4 x float> [[TMP48]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i32 3			; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP42]], i32 3
	; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]			; SSE-NEXT: [[TMP51:%.*]] = fdiv <4 x float> [[TMP46]], [[TMP50]]
	; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP27]] to <4 x float>*			; SSE-NEXT: [[TMP52:%.*]] = bitcast float* [[TMP27]] to <4 x float>*
	; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store <4 x float> [[TMP51]], <4 x float>* [[TMP52]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_div(			; AVX-LABEL: @gather_load_div(
	; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4			; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4
	; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10			; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
	; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13			; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
	; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3			; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
	; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11			; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
	; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14			; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
	; AVX2-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
	; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44			; AVX2-NEXT: [[TMP16:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
	; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17			; AVX2-NEXT: [[TMP18:%.*]] = load <4 x float>, <4 x float>* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
	; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33			; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
	; AVX2-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8			; AVX2-NEXT: [[TMP22:%.*]] = bitcast float* [[TMP21]] to <4 x float>*
	; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP23:%.*]] = load <4 x float>, <4 x float>* [[TMP22]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30			; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
	; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <4 x float>*
	; AVX2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5			; AVX2-NEXT: [[TMP26:%.*]] = load <4 x float>, <4 x float>* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX2-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27			; AVX2-NEXT: [[TMP28:%.*]] = load float, float* [[TMP27]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
	; AVX2-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX2-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP31:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0
	; AVX2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23			; AVX2-NEXT: [[TMP32:%.*]] = insertelement <8 x float> [[TMP31]], float [[TMP7]], i32 1
	; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP33:%.*]] = insertelement <8 x float> [[TMP32]], float [[TMP11]], i32 2
	; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i32 0			; AVX2-NEXT: [[SHRINK_SHUFFLE4:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i32 1			; AVX2-NEXT: [[TMP34:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i32 2			; AVX2-NEXT: [[TMP35:%.*]] = shufflevector <8 x float> [[TMP33]], <8 x float> [[TMP34]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i32 3			; AVX2-NEXT: [[SHRINK_SHUFFLE2:%.*]] = shufflevector <4 x float> [[TMP23]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i32 4			; AVX2-NEXT: [[TMP36:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE2]], <2 x float> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i32 5			; AVX2-NEXT: [[TMP37:%.*]] = shufflevector <8 x float> [[TMP35]], <8 x float> [[TMP36]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i32 6			; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP28]], i32 7
	; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i32 7			; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0
	; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i32 0			; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP9]], i32 1
	; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i32 1			; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP13]], i32 2
	; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i32 2			; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> [[TMP41]], float [[TMP16]], i32 3
	; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i32 3			; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP20]], i32 4
	; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i32 4			; AVX2-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP26]], <4 x float> poison, <2 x i32> <i32 0, i32 3>
	; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i32 5			; AVX2-NEXT: [[TMP44:%.*]] = shufflevector <2 x float> [[SHRINK_SHUFFLE]], <2 x float> poison, <8 x i32> <i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i32 6			; AVX2-NEXT: [[TMP45:%.*]] = shufflevector <8 x float> [[TMP43]], <8 x float> [[TMP44]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i32 7			; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP30]], i32 7
	; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]			; AVX2-NEXT: [[TMP47:%.*]] = fdiv <8 x float> [[TMP38]], [[TMP46]]
	; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX2-NEXT: [[TMP48:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x float> [[TMP47]], <8 x float>* [[TMP48]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_div(			; AVX512F-LABEL: @gather_load_div(
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i32 0			; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <8 x float*> poison, float* [[TMP1:%.*]], i32 0
	; AVX512F-NEXT: [[TMP4:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr float, <4 x float*> [[TMP4]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512F-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 10, i64 3, i64 14, i64 17, i64 8, i64 5, i64 20, i64 poison>
	; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512F-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP7:%.*]] = shufflevector <2 x float*> [[TMP6]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP6:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP8:%.*]] = getelementptr float, <2 x float*> [[TMP7]], <2 x i64> <i64 8, i64 5>			; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <8 x float*> [[TMP6]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512F-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0			; AVX512F-NEXT: [[TMP9:%.*]] = fdiv <8 x float> [[TMP5]], [[TMP8]]
	; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP5]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: store <8 x float> [[TMP9]], <8 x float>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP8]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512F-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP9]], i32 7
	; AVX512F-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP18:%.*]] = getelementptr float, <8 x float*> [[TMP17]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP19:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP18]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP20:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP19]]
	; AVX512F-NEXT: [[TMP21:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: store <8 x float> [[TMP20]], <8 x float>* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_div(			; AVX512VL-LABEL: @gather_load_div(
	; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i32 0			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <8 x float*> poison, float* [[TMP1:%.*]], i32 0
	; AVX512VL-NEXT: [[TMP4:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr float, <4 x float*> [[TMP4]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 10, i64 3, i64 14, i64 17, i64 8, i64 5, i64 20, i64 poison>
	; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP4]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP7:%.*]] = shufflevector <2 x float*> [[TMP6]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP6:%.*]] = shufflevector <8 x float*> [[TMP3]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr float, <2 x float*> [[TMP7]], <2 x i64> <i64 8, i64 5>			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <8 x float*> [[TMP6]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512VL-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP10:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i32 0			; AVX512VL-NEXT: [[TMP9:%.*]] = fdiv <8 x float> [[TMP5]], [[TMP8]]
	; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP5]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: store <8 x float> [[TMP9]], <8 x float>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP8]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512VL-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP9]], i32 7
	; AVX512VL-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.*]] = shufflevector <8 x float*> [[TMP10]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP18:%.*]] = getelementptr float, <8 x float*> [[TMP17]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP19:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP18]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP20:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP19]]
	; AVX512VL-NEXT: [[TMP21:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: store <8 x float> [[TMP20]], <8 x float>* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_div(			; AVX512-LABEL: @gather_load_div(
	; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10
	; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x float*> [[TMP4]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x float*> [[TMP4]], <2 x float*> poison, <2 x i32> zeroinitializer
	; AVX512-NEXT: [[TMP6:%.*]] = getelementptr float, <2 x float*> [[TMP5]], <2 x i64> <i64 3, i64 14>			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr float, <2 x float*> [[TMP5]], <2 x i64> <i64 3, i64 14>
	; AVX512-NEXT: [[TMP7:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i32 0			; AVX512-NEXT: [[TMP7:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i32 0

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s			; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s
	; These conversions should be vectorized by reviews.llvm.org/D57059			; These conversions should be vectorized by reviews.llvm.org/D57059

	define dso_local <4 x float> @foo(<4 x i32> %0) {			define dso_local <4 x float> @foo(<4 x i32> %0) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP0:%.*]] to <4 x float>
	; CHECK-NEXT: [[TMP3:%.*]] = sitofp i32 [[TMP2]] to float			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: ret <4 x float> [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[TMP9]]
	;			;
	%2 = extractelement <4 x i32> %0, i32 1			%2 = extractelement <4 x i32> %0, i32 1
	%3 = sitofp i32 %2 to float			%3 = sitofp i32 %2 to float
	%4 = insertelement <4 x float> undef, float %3, i32 0			%4 = insertelement <4 x float> undef, float %3, i32 0
	%5 = insertelement <4 x float> %4, float %3, i32 1			%5 = insertelement <4 x float> %4, float %3, i32 1
	%6 = extractelement <4 x i32> %0, i32 2			%6 = extractelement <4 x i32> %0, i32 2
	%7 = sitofp i32 %6 to float			%7 = sitofp i32 %6 to float
	%8 = insertelement <4 x float> %5, float %7, i32 2			%8 = insertelement <4 x float> %5, float %7, i32 2
	%9 = extractelement <4 x i32> %0, i32 3			%9 = extractelement <4 x i32> %0, i32 3
	%10 = sitofp i32 %9 to float			%10 = sitofp i32 %9 to float
	%11 = insertelement <4 x float> %8, float %10, i32 3			%11 = insertelement <4 x float> %8, float %10, i32 3
	ret <4 x float> %11			ret <4 x float> %11
	}			}

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

;
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {		define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(		; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(
; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[X:%.*]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2		; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3		; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[X0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[Y0]], i32 4
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[X1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[Y1]], i32 5
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[X2]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[Y2]], i32 6
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[X3]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[Y3]], i32 7
; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <8 x i32> [[SHUFFLE]], [[TMP4]]
; CHECK-NEXT: [[D0:%.*]] = icmp slt i32 [[X0]], [[Y0]]		; CHECK-NEXT: [[TMP6:%.*]] = freeze <8 x i1> [[TMP5]]
; CHECK-NEXT: [[D1:%.*]] = icmp slt i32 [[X1]], [[Y1]]		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> [[TMP6]])
; CHECK-NEXT: [[D2:%.*]] = icmp slt i32 [[X2]], [[Y2]]		; CHECK-NEXT: ret i1 [[TMP7]]
; CHECK-NEXT: [[D3:%.*]] = icmp slt i32 [[X3]], [[Y3]]
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP7]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <8 x i32> %x, i32 0		%x0 = extractelement <8 x i32> %x, i32 0
%x1 = extractelement <8 x i32> %x, i32 1		%x1 = extractelement <8 x i32> %x, i32 1
%x2 = extractelement <8 x i32> %x, i32 2		%x2 = extractelement <8 x i32> %x, i32 2
%x3 = extractelement <8 x i32> %x, i32 3		%x3 = extractelement <8 x i32> %x, i32 3
%y0 = extractelement <8 x i32> %y, i32 0		%y0 = extractelement <8 x i32> %y, i32 0
%y1 = extractelement <8 x i32> %y, i32 1		%y1 = extractelement <8 x i32> %y, i32 1
%y2 = extractelement <8 x i32> %y, i32 2		%y2 = extractelement <8 x i32> %y, i32 2

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @hoge() {			define void @hoge() {
	; CHECK-LABEL: @hoge(			; CHECK-LABEL: @hoge(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15			; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> <i16 poison, i16 undef>, i16 [[T]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> poison, i16 [[T]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i16> [[TMP0]] to <2 x i32>			; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i16> [[TMP0]] to <2 x i32>
	; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <2 x i32> <i32 undef, i32 63>, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <2 x i32> <i32 undef, i32 63>, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP2]], poison
	; CHECK-NEXT: [[SHUFFLE10:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE10:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[SHUFFLE10]], <i32 15, i32 31, i32 47, i32 poison>			; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[SHUFFLE10]], <i32 15, i32 31, i32 47, i32 poison>
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP5]], i32 undef			; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP5]], i32 undef
	; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63			; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i32> undef, [[TMP1]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i32> poison, [[TMP1]]
	; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP6]], poison
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP9]], undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP9]], undef
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP9]], i32 undef			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP9]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = icmp slt i32 [[OP_EXTRA1]], undef			; CHECK-NEXT: [[OP_EXTRA2:%.*]] = icmp slt i32 [[OP_EXTRA1]], undef
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = select i1 [[OP_EXTRA2]], i32 [[OP_EXTRA1]], i32 undef			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = select i1 [[OP_EXTRA2]], i32 [[OP_EXTRA1]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = icmp slt i32 [[OP_EXTRA3]], undef			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = icmp slt i32 [[OP_EXTRA3]], undef

llvm/test/Transforms/SLPVectorizer/X86/resched.ll

	; CHECK-NEXT: br i1 undef, label [[IF_END50_I:%.*]], label [[IF_THEN22_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_END50_I:%.*]], label [[IF_THEN22_I:%.*]]
	; CHECK: if.then22.i:			; CHECK: if.then22.i:
	; CHECK-NEXT: [[SUB_I:%.*]] = add nsw i32 undef, -1			; CHECK-NEXT: [[SUB_I:%.*]] = add nsw i32 undef, -1
	; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]			; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2
	; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[CONV31_I]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[CONV31_I]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[CONV31_I]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = lshr <4 x i32> [[TMP4]], <i32 1, i32 2, i32 3, i32 4>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4
	; CHECK-NEXT: [[SHR_4_I_I:%.*]] = lshr i32 [[CONV31_I]], 5
	; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[CONV31_I]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = lshr <2 x i32> [[TMP7]], <i32 6, i32 7>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7
	; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8			; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8
	; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9			; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9
	; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10			; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10
	; CHECK-NEXT: [[TMP9:%.*]] = lshr <4 x i32> [[TMP4]], <i32 8, i32 9, i32 10, i32 11>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11			; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11
	; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12			; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12
	; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13			; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13
	; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14			; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14
	; CHECK-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP4]], <i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP1]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = lshr <16 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 poison>
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[TMP2]], i32 14
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i32> [[TMP12]], <16 x i32> [[TMP13]], <16 x i32> <i32 0, i32 16, i32 17, i32 18, i32 19, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[SHR_4_I_I]], i32 5			; CHECK-NEXT: [[TMP5:%.*]] = trunc <16 x i32> [[TMP2]] to <16 x i8>
	; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <16 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <16 x i32> [[TMP2]], i32 13
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <16 x i32> [[TMP15]], <16 x i32> [[TMP16]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 16, i32 17, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <16 x i32> [[TMP2]], i32 12
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x i32> [[TMP2]], i32 11
	; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <16 x i32> [[TMP17]], <16 x i32> [[TMP18]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <16 x i32> [[TMP2]], i32 10
	; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <16 x i32> [[TMP2]], i32 9
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <16 x i32> [[TMP19]], <16 x i32> [[TMP20]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <16 x i32> [[TMP2]], i32 8
	; CHECK-NEXT: [[TMP22:%.*]] = trunc <16 x i32> [[TMP21]] to <16 x i8>			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <16 x i32> [[TMP2]], i32 7
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP10]], i32 2			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <16 x i32> [[TMP2]], i32 6
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <16 x i32> [[TMP2]], i32 5
	; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i32> [[TMP2]], i32 4
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP9]], i32 3			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <16 x i32> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i32> [[TMP9]], i32 2			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <16 x i32> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i32> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <16 x i32> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i32> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <16 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP30:%.*]] = extractelement <2 x i32> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = and <16 x i8> [[TMP5]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[TMP31:%.*]] = extractelement <2 x i32> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i32> [[TMP5]], i32 3
	; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i32> [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i32> [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP35:%.*]] = extractelement <4 x i32> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP36:%.*]] = and <16 x i8> [[TMP22]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15			; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15
	; CHECK-NEXT: [[TMP37:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*			; CHECK-NEXT: [[TMP21:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*
	; CHECK-NEXT: store <16 x i8> [[TMP36]], <16 x i8>* [[TMP37]], align 1			; CHECK-NEXT: store <16 x i8> [[TMP20]], <16 x i8>* [[TMP21]], align 1
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end50.i:			; CHECK: if.end50.i:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.end50.i, label %if.then22.i			br i1 undef, label %if.end50.i, label %if.then22.i

	if.then22.i: ; preds = %entry			if.then22.i: ; preds = %entry

llvm/test/Transforms/SLPVectorizer/X86/revectorized_rdx_crash.ll

	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[I]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[I]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[I1]] to <2 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[I1]] to <2 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[I2]] to <2 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[I2]] to <2 x i32>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 16			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 16
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[I3]] to <2 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[I3]] to <2 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> undef, [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> poison, [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]			; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP8]], [[TMP5]]
	; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP9]], [[TMP3]]			; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP9]], [[TMP3]]
	; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP10]], [[TMP1]]			; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP10]], [[TMP1]]
	; CHECK-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP11]], undef			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32> [[TMP12]], i32 0			; CHECK-NEXT: [[I10:%.*]] = add i32 [[TMP12]], undef
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[I10]], i32 0
	; CHECK-NEXT: [[I11:%.*]] = add i32 [[TMP14]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = add <2 x i32> [[TMP11]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP12]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP14]], i32 0
	; CHECK-NEXT: [[I18:%.*]] = add i32 [[TMP15]], [[I11]]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x i32> [[TMP14]], i32 1
	; CHECK-NEXT: [[I19:%.*]] = add i32 [[TMP15]], [[I18]]			; CHECK-NEXT: [[I18:%.*]] = add i32 [[TMP16]], [[TMP15]]
				; CHECK-NEXT: [[I19:%.*]] = add i32 [[TMP16]], [[I18]]
	; CHECK-NEXT: [[I20:%.*]] = add i32 undef, [[I19]]			; CHECK-NEXT: [[I20:%.*]] = add i32 undef, [[I19]]
	; CHECK-NEXT: [[I21:%.*]] = add i32 undef, [[I20]]			; CHECK-NEXT: [[I21:%.*]] = add i32 undef, [[I20]]
	; CHECK-NEXT: [[I22:%.*]] = add i32 undef, [[I21]]			; CHECK-NEXT: [[I22:%.*]] = add i32 undef, [[I21]]
	; CHECK-NEXT: [[I23:%.*]] = add i32 undef, [[I22]]			; CHECK-NEXT: [[I23:%.*]] = add i32 undef, [[I22]]
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[I23]], [[FOR_COND_PREHEADER]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[I23]], [[FOR_COND_PREHEADER]] ], [ undef, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx -slp-min-non-power2-values-size=2 | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
	target triple = "i386-apple-macosx10.9.0"			target triple = "i386-apple-macosx10.9.0"

	; We disable the vectorization of <3 x float> for now			; We disable the vectorization of <3 x float> for now

	; float foo(float *A) {			; float foo(float *A) {
	;			;
	; float R = A[0];			; float R = A[0];
	; float G = A[1];			; float G = A[1];
	; float B = A[2];			; float B = A[2];
	; for (int i=0; i < 121; i+=3) {			; for (int i=0; i < 121; i+=3) {
	; R+=A[i+0]*7;			; R+=A[i+0]*7;
	; G+=A[i+1]*8;			; G+=A[i+1]*8;
	; B+=A[i+2]*9;			; B+=A[i+2]*9;
	; }			; }
	;			;
	; return R+G+B;			; return R+G+B;
	; }			; }

	define float @foo(float* nocapture readonly %A) {			define float @foo(float* nocapture readonly %A) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A:%.*]] to <2 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A:%.*]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP0]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> undef)
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[TMP4:%.*]] = phi float [ [[TMP3]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi float [ [[TMP2]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[B_032:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP11:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[TMP5:%.*]] = add nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP5]]
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX7]] to <2 x float>*
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX7]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> [[TMP8]], float [[TMP7]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP9]], <float 7.000000e+00, float 8.000000e+00>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP9]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; CHECK-NEXT: [[TMP11]] = fadd <2 x float> [[TMP5]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[TMP10]], <float 7.000000e+00, float 8.000000e+00, float 9.000000e+00, float poison>
	; CHECK-NEXT: [[TMP12:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP12]] = fadd <4 x float> [[TMP4]], [[TMP11]]
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = load float, float* [[ARRAYIDX12]], align 4
	; CHECK-NEXT: [[MUL13:%.*]] = fmul float [[TMP13]], 9.000000e+00
	; CHECK-NEXT: [[ADD14]] = fadd float [[B_032]], [[MUL13]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP14]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP13]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]
	; CHECK: for.body.for.body_crit_edge:			; CHECK: for.body.for.body_crit_edge:
	; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4			; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4
	; CHECK-NEXT: br label [[FOR_BODY]]			; CHECK-NEXT: br label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP12]], i32 1
	; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[TMP15]], [[TMP16]]			; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[TMP14]], [[TMP15]]
	; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[ADD14]]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP12]], i32 2
				; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[TMP16]]
	; CHECK-NEXT: ret float [[ADD17]]			; CHECK-NEXT: ret float [[ADD17]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4
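The updated CHECK lines for `@foo` widen the three-lane R/G/B bundle to `<4 x float>`, load it through a masked load with the last lane disabled, and multiply by `<7, 8, 9, poison>`. The following is only a rough C++ model of that idea, not the SLP vectorizer's implementation: `Vec4`, `kMask`, `maskedLoad`, `fooScalar`, and `fooPadded` are hypothetical names introduced for illustration, and the padding lane is set to zero here (where the IR uses `poison`) so the sketch has defined behavior.

```cpp
#include <array>
#include <cstddef>

// Scalar reference: the loop from the C comment at the top of rgb_phi.ll.
// It reads A[0] .. A[122], so A must hold at least 123 floats.
float fooScalar(const float *A) {
  float R = A[0], G = A[1], B = A[2];
  for (int i = 0; i < 121; i += 3) {
    R += A[i + 0] * 7;
    G += A[i + 1] * 8;
    B += A[i + 2] * 9;
  }
  return R + G + B;
}

// Hypothetical model of a 3-lane bundle padded to 4 lanes.
using Vec4 = std::array<float, 4>;
constexpr std::array<bool, 4> kMask = {true, true, true, false};

// Emulates a masked load: lanes with a false mask bit are never read from
// memory and are filled with 0 (an assumption; the IR leaves them undefined).
Vec4 maskedLoad(const float *P) {
  Vec4 V{};
  for (std::size_t L = 0; L < 4; ++L)
    if (kMask[L])
      V[L] = P[L];
  return V;
}

// Padded version: one 4-wide accumulator replaces the scalars R, G and B.
float fooPadded(const float *A) {
  const Vec4 K = {7.0f, 8.0f, 9.0f, 0.0f}; // lane 3 is padding
  Vec4 Acc = maskedLoad(A);
  for (int i = 0; i < 121; i += 3) {
    const Vec4 X = maskedLoad(A + i);
    for (std::size_t L = 0; L < 4; ++L)
      Acc[L] += X[L] * K[L];
  }
  // Only the three live lanes contribute, in the same order as the scalar code.
  return Acc[0] + Acc[1] + Acc[2];
}
```

Because the padding lane is never read from memory and never extracted, both functions perform the same per-lane operations in the same order, so they should agree on any input of at least 123 floats.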

llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4

	define i32 @slp_schedule_bundle() local_unnamed_addr #0 {			define i32 @slp_schedule_bundle() local_unnamed_addr #0 {
	; CHECK-LABEL: @slp_schedule_bundle(			; CHECK-LABEL: @slp_schedule_bundle(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([1 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* bitcast ([1 x i32]* @b to <8 x i32>*), i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x i32> undef)
	; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31, i32 31, i32 31, i32 poison, i32 poison>
	; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = xor <8 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 poison, i32 poison>
	; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([1 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP2]], <8 x i32>* bitcast ([1 x i32]* @a to <8 x i32>*), i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0), align 4
	; CHECK-NEXT: [[DOTLOBIT_4:%.*]] = lshr i32 [[TMP3]], 31
	; CHECK-NEXT: [[DOTLOBIT_NOT_4:%.*]] = xor i32 [[DOTLOBIT_4]], 1
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_4]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 5, i64 0), align 4
	; CHECK-NEXT: [[DOTLOBIT_5:%.*]] = lshr i32 [[TMP4]], 31
	; CHECK-NEXT: [[DOTLOBIT_NOT_5:%.*]] = xor i32 [[DOTLOBIT_5]], 1
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_5]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 5, i64 0), align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4
	%.lobit = lshr i32 %0, 31			%.lobit = lshr i32 %0, 31
	%.lobit.not = xor i32 %.lobit, 1			%.lobit.not = xor i32 %.lobit, 1
	store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4			store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4
	%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4			%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4
	[21 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

[60 lines not shown]	bb:
ret void		ret void
}		}

define internal i32 @ipvideo_decode_block_opcode_0xD_16() {		define internal i32 @ipvideo_decode_block_opcode_0xD_16() {
; CHECK-LABEL: @ipvideo_decode_block_opcode_0xD_16(		; CHECK-LABEL: @ipvideo_decode_block_opcode_0xD_16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ undef, [[ENTRY:%.*]] ], [ [[SHRINK_SHUFFLE:%.*]], [[IF_END:%.*]] ]		; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ poison, [[ENTRY:%.*]] ], [ [[TMP4:%.*]], [[IF_END:%.*]] ]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP0]], <2 x i16> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i16, i16* undef, i32 1		; CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i16, i16* undef, i32 1
; CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i16, i16* undef, i32 2		; CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i16, i16* undef, i32 2
; CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i16, i16* undef, i32 3		; CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i16, i16* undef, i32 3
; CHECK-NEXT: [[ARRAYIDX11_4:%.*]] = getelementptr inbounds i16, i16* undef, i32 4		; CHECK-NEXT: [[ARRAYIDX11_4:%.*]] = getelementptr inbounds i16, i16* undef, i32 4
; CHECK-NEXT: [[ARRAYIDX11_5:%.*]] = getelementptr inbounds i16, i16* undef, i32 5		; CHECK-NEXT: [[ARRAYIDX11_5:%.*]] = getelementptr inbounds i16, i16* undef, i32 5
; CHECK-NEXT: [[ARRAYIDX11_6:%.*]] = getelementptr inbounds i16, i16* undef, i32 6		; CHECK-NEXT: [[ARRAYIDX11_6:%.*]] = getelementptr inbounds i16, i16* undef, i32 6
; CHECK-NEXT: [[ARRAYIDX11_7:%.*]] = getelementptr inbounds i16, i16* undef, i32 7		; CHECK-NEXT: [[ARRAYIDX11_7:%.*]] = getelementptr inbounds i16, i16* undef, i32 7
; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* undef, align 2		; CHECK-NEXT: store <8 x i16> [[SHUFFLE]], <8 x i16>* undef, align 2
; CHECK-NEXT: [[SHRINK_SHUFFLE]] = shufflevector <8 x i16> [[SHUFFLE]], <8 x i16> poison, <2 x i32> <i32 0, i32 4>		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i16> [[SHUFFLE]], i32 0
		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i16> poison, i16 [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i16> [[SHUFFLE]], i32 4
		; CHECK-NEXT: [[TMP4]] = insertelement <2 x i16> [[TMP2]], i16 [[TMP3]], i32 1
; CHECK-NEXT: br label [[FOR_BODY]]		; CHECK-NEXT: br label [[FOR_BODY]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%P.sroa.7.0 = phi i16 [ undef, %entry ], [ %P.sroa.7.0, %if.end ]		%P.sroa.7.0 = phi i16 [ undef, %entry ], [ %P.sroa.7.0, %if.end ]
%P.sroa.0.0 = phi i16 [ undef, %entry ], [ %P.sroa.0.0, %if.end ]		%P.sroa.0.0 = phi i16 [ undef, %entry ], [ %P.sroa.0.0, %if.end ]
[20 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/supernode.ll

	[17 lines not shown]
	; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8			; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8
	; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8
	; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*
	; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8			; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8
	; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0
	; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1			; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1
	; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP1]]			; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1			; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1
	; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]			; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]
	; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; ENABLED-NEXT: ret void			; ENABLED-NEXT: ret void
	;			;
	entry:			entry:
	[293 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/value-bug-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; We used to crash on this example because we were building a constant			; We used to crash on this example because we were building a constant
	; expression during vectorization and the vectorizer expects instructions			; expression during vectorization and the vectorizer expects instructions
	; as elements of the vectorized tree.			; as elements of the vectorized tree.
	; PR19621			; PR19621

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb279:			; CHECK-NEXT: bb279:
	; CHECK-NEXT: br label [[BB283:%.*]]			; CHECK-NEXT: br label [[BB283:%.*]]
	; CHECK: bb283:			; CHECK: bb283:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP11:%.*]], [[EXIT:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP11:%.*]], [[EXIT:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ undef, [[EXIT]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ poison, [[EXIT]] ]
	; CHECK-NEXT: br label [[BB284:%.*]]			; CHECK-NEXT: br label [[BB284:%.*]]
	; CHECK: bb284:			; CHECK: bb284:
	; CHECK-NEXT: [[TMP2:%.*]] = fpext <2 x float> [[TMP0]] to <2 x double>			; CHECK-NEXT: [[TMP2:%.*]] = fpext <2 x float> [[TMP0]] to <2 x double>
	; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef
	; CHECK-NEXT: br label [[BB21_I:%.*]]			; CHECK-NEXT: br label [[BB21_I:%.*]]
	; CHECK: bb21.i:			; CHECK: bb21.i:
	; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]			; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]
	[79 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/value-bug.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -mtriple="x86_64-grtev3-linux-gnu" -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; We used to crash on this example because we were building a constant			; We used to crash on this example because we were building a constant
	; expression during vectorization and the vectorizer expects instructions			; expression during vectorization and the vectorizer expects instructions
	; as elements of the vectorized tree.			; as elements of the vectorized tree.
	; PR19621			; PR19621

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb279:			; CHECK-NEXT: bb279:
	; CHECK-NEXT: br label [[BB283:%.*]]			; CHECK-NEXT: br label [[BB283:%.*]]
	; CHECK: bb283:			; CHECK: bb283:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP11:%.*]], [[EXIT:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ undef, [[BB279:%.*]] ], [ [[TMP11:%.*]], [[EXIT:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ undef, [[EXIT]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x float> [ undef, [[BB279]] ], [ poison, [[EXIT]] ]
	; CHECK-NEXT: br label [[BB284:%.*]]			; CHECK-NEXT: br label [[BB284:%.*]]
	; CHECK: bb284:			; CHECK: bb284:
	; CHECK-NEXT: [[TMP2:%.*]] = fpext <2 x float> [[TMP0]] to <2 x double>			; CHECK-NEXT: [[TMP2:%.*]] = fpext <2 x float> [[TMP0]] to <2 x double>
	; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP2]], undef
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef
	; CHECK-NEXT: br label [[BB21_I:%.*]]			; CHECK-NEXT: br label [[BB21_I:%.*]]
	; CHECK: bb21.i:			; CHECK: bb21.i:
	; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]			; CHECK-NEXT: br i1 undef, label [[BB22_I:%.*]], label [[EXIT]]
	[79 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

	[24 lines not shown]
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137			; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[T40]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[T15]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[T47]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[T9]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> poison, i32 [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[T691:%.*]] = shufflevector <8 x i32> [[T67]], <8 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[SHUFFLE]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T691]], i32 [[T50]], i32 5			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 3
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP11]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP8]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4
	[44 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

	[24 lines not shown]
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137			; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[T40]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[T15]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[T47]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T27]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[T9]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> undef, i32 [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[T691:%.*]] = shufflevector <8 x i32> [[T67]], <8 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[SHUFFLE]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T691]], i32 [[T50]], i32 5			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 3
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP11]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP8]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4
	[44 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A7:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A8:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A1:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A2:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A3:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A4:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A5:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A6:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	[25 lines not shown]
	define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo1(			; CHECK-LABEL: @foo1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 2, i32 3, i32 1, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A6:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A4:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A5:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A8:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A2:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A3:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	[29 lines not shown]
	define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo2(			; CHECK-LABEL: @foo2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 3, i32 2, i32 3, i32 0, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A4:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A6:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A5:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A8:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A2:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A7:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A1:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A3:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1

Diff Detail

Unit Tests: Failed

Revision Contents

Diff 360420

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll

llvm/test/Transforms/SLPVectorizer/X86/extract.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

llvm/test/Transforms/SLPVectorizer/X86/pr49081.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

llvm/test/Transforms/SLPVectorizer/X86/resched.ll

llvm/test/Transforms/SLPVectorizer/X86/revectorized_rdx_crash.ll

llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll

llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

llvm/test/Transforms/SLPVectorizer/X86/supernode.ll

llvm/test/Transforms/SLPVectorizer/X86/value-bug-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/value-bug.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll
