This is an archive of the discontinued LLVM Phabricator instance.

[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
ClosedPublic

Authored by anton-afanasyev on Dec 26 2018, 4:14 AM.

Details

Summary
Try to use 64-bit SLP vectorization. In addition to horizontal instructions, this change triggers optimizations for partial vector operations (for instance, using the low halves of the 128-bit registers xmm0 and xmm1 to multiply <2 x float> by <2 x float>).

Fixes llvm.org/PR32433

Diff Detail

Event Timeline

General question: why is 256 bits set as the minimal vector register size if the architecture supports 128-bit vectors? TTI should report 128 as the min reg size, not 256. And we don't need all these new options, functions, etc.

anton-afanasyev added a comment.EditedDec 26 2018, 8:14 AM

General question: why is 256 bits set as the minimal vector register size if the architecture supports 128-bit vectors? TTI should report 128 as the min reg size, not 256. And we don't need all these new options, functions, etc.

Hi @ABataev, x86 TTI already reports a 128-bit vector (TTI->getMinVectorRegisterBitWidth() returns 128), but we actually need 64-bit vectors (the high and low parts of 128-bit registers) to be tried by the SLPVectorizer to support horizontal 128-bit adds and subs. Though making TTI->getMinVectorRegisterBitWidth() return 64 would fix this issue as well, we cannot merge these two notions (minimum vector register and minimum semi-vector), since getMinVectorRegisterBitWidth() is used in other places.

Maybe the confusion was caused by calling HADDPS a horizontal 128-bit addition (here http://llvm.org/PR32433); it should rather be called a horizontal 64-bit vector pair sum (<2 x float> + <2 x float> -> <2 x float>).


Then it is better to introduce another function in TTI - something like getMinVectOpWidth() - and at least use it in the SLP vectorizer, rather than adding something like a semi-vector size.

Then it is better to introduce another function in TTI - something like getMinVectOpWidth() - and at least use it in the SLP vectorizer, rather than adding something like a semi-vector size.

Thanks for your note! This requires changing the TTI API as well, but may be more flexible (one could change getMinVecOpWidth() to 1/4 of MinVectorRegisterBitWidth, for instance). Though it obscures the root cause of where these semi-vectors come from (special horizontal operations).

I'll change this patch appropriately. I'll also check the consequences of merging MinVecOpWidth and MinVectorRegisterBitWidth.

Instead of being so explicit, why can't we just partially use a known legal vector width, maybe limited to subvectors that fit into the legal type (float2/int2/char4 etc.), leaving the upper elements as undef/duplicates of the partial subvector? The cost model will likely be using the legal widths anyhow.

@craig.topper may have some thoughts on this patch's effects on D55251 - vector widening legalization

Instead of being so explicit, why can't we just partially use a known legal vector width, maybe limited to subvectors that fit into the legal type (float2/int2/char4 etc.), leaving the upper elements as undef/duplicates of the partial subvector? The cost model will likely be using the legal widths anyhow.

@craig.topper may have some thoughts on this patch's effects on D55251 - vector widening legalization

Hi Simon, yes, if I understand you correctly, that is exactly the conclusion I came to. I've investigated the uses of getMinVectorRegisterBitWidth(): its only user is the SLPVectorizer itself, and it makes sense to simply change MinVectorRegisterBitWidth to 64 for the x86 target. I've looked for cases where the cost model cannot handle this, but it works well. Several new cases open up with this change - for instructions like PSUB[B|W|D] working with MMX 64-bit registers. The cost model also allows partial subvector operations.

So I propose just changing MinVectorRegisterBitWidth to 64 for x86 (as for AArch64). All the subtle tuning should be done by the cost model.
I'll edit the revision accordingly.

anton-afanasyev edited the summary of this revision. (Show Details)

I'm concerned about integer types. Without -x86-experimental-vector-widening-legalization we end up promoting v2i32 to v2i64 during type legalization. An X86 specific DAG combine turns some v2i64 operations back to v4i32 based on the result being truncated, but it isn't always able to rearrange the shuffles well.

Changing semi-vec-reg-128bit.ll to use i32 instead of float results in this code instead of phaddd, even with -mcpu=btver2, which is needed to generate haddps for the float type in this test.

	vpshufd	$245, %xmm0, %xmm1      # xmm1 = xmm0[1,1,3,3]
	vpaddd	%xmm1, %xmm0, %xmm0
	vpshufd	$232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
	vmovq	%xmm0, (%rdi)
RKSimon added inline comments.Dec 27 2018, 11:07 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150

Can we add a IsFloat bool argument here?

I'm concerned about integer types. Without -x86-experimental-vector-widening-legalization we end up promoting v2i32 to v2i64 during type legalization. An X86 specific DAG combine turns some v2i64 operations back to v4i32 based on the result being truncated, but it isn't always able to rearrange the shuffles well.

Changing semi-vec-reg-128bit.ll to use i32 instead of float results in this code instead of phaddd, even with -mcpu=btver2, which is needed to generate haddps for the float type in this test.

	vpshufd	$245, %xmm0, %xmm1      # xmm1 = xmm0[1,1,3,3]
	vpaddd	%xmm1, %xmm0, %xmm0
	vpshufd	$232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
	vmovq	%xmm0, (%rdi)

That is not related to this patch, since it does the same thing for either float or i32:

> ~/llvm/build_rel_exp/bin/opt -S -mcpu=btver2 -slp-vectorizer -instcombine semi-vec-reg-128bit-i32.ll
...
define void @add_pairs_128(<4 x i32>, i32* nocapture) #0 {
  %3 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 0, i32 2>
  %4 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 1, i32 3>
  %5 = add <2 x i32> %3, %4
  %6 = bitcast i32* %1 to <2 x i32>*
  store <2 x i32> %5, <2 x i32>* %6, align 4
  ret void
}

attributes #0 = { nounwind "target-cpu"="btver2" }

An issue here is with x86 ISel; I believe it should be fixed there (does -x86-experimental-vector-widening-legalization fix it?). Another candidate could be the InstCombiner, to handle the specific combination of extracts and inserts produced by the vectorizer.

anton-afanasyev marked an inline comment as done.Dec 27 2018, 1:46 PM
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150

That is not possible: at this stage we can only operate in terms of vector register width, regardless of the scalar type. And why would we need it?

Added comments and changed semi-vec-reg-128bit.ll test (renamed to vec-reg-64bit.ll, added comments).

lebedev.ri added inline comments.
test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll
13–38

There are no run lines for these prefixes.

lebedev.ri added inline comments.Dec 28 2018, 2:39 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150–154

I lack context, but will this also handle e.g. _mm_hadd_epi16() ?

anton-afanasyev marked 2 inline comments as done.Dec 28 2018, 3:18 AM
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150–154

Yes, phaddw is also handled.

test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll
13–38

Oops, thanks, I forgot to remove these after the prefix renaming.

Removed unchecked test prefix.

Ping.
Changed patch according to notes.

Ping!

To clarify: I have changed the patch after the review notes, so it now just modifies the value returned by X86TTIImpl::getMinVectorRegisterBitWidth() (it was 128, now it is 64).

RKSimon added inline comments.Jan 16 2019, 3:05 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150

Don't refer to MMX - it gives the wrong impression that we actually generate code for it (all we ever do is emit intrinsic code).

anton-afanasyev marked 2 inline comments as done.Jan 16 2019, 3:12 AM
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150

OK, I'll remove this from the comment.

lebedev.ri added inline comments.Jan 16 2019, 3:22 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150–154

Then I don't understand the function name. (Yes, I understand that it is a pre-existing hook.)
_mm_hadd_epi16() operates on i16, so how come 64 is the right pick here?

anton-afanasyev marked 3 inline comments as done.Jan 16 2019, 3:34 AM
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150–154

64 is the size of the whole vector with 16-bit elements packed into it. Here is a piece of the manual:

PHADDW (with 64-bit operands):
mm1[15-0] = mm1[31-16] + mm1[15-0];
mm1[31-16] = mm1[63-48] + mm1[47-32];
mm1[47-32] = mm2/m64[31-16] + mm2/m64[15-0];
mm1[63-48] = mm2/m64[63-48] + mm2/m64[47-32];
ABataev added inline comments.Jan 16 2019, 6:50 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150

Still refers to MMX

anton-afanasyev marked an inline comment as done.

Update comment, remove MMX reference.

anton-afanasyev marked 2 inline comments as done.Jan 16 2019, 7:38 AM
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150

Do you mean reference in comment? Removed.

ABataev added inline comments.Jan 16 2019, 7:46 AM
lib/Target/X86/X86TargetTransformInfo.cpp
150

Yes, it was about the comment. Still, the comment does not match the function: it talks about 128-bit operations but the function returns 64.

anton-afanasyev marked 3 inline comments as done.
anton-afanasyev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
150

"horizontal 128-bit operations" means "operations on the high and low 64-bit parts". Updating the comment to be clearer.

There are a lot of test changes here that have nothing to do with the add/sub operations mentioned in the title. How do we know these changes are good? Was any benchmarking done with this patch?

There are a lot of test changes here that have nothing to do with the add/sub operations mentioned in the title. How do we know these changes are good? Was any benchmarking done with this patch?

Yes, I have done a lot of benchmarking using the test suite while investigating this issue.
I cannot attach the results right now (working on another issue), but can do it later.

The most suspicious was the size..text metric of the dijkstra test (https://gcc.godbolt.org/z/3E1AmW), but I have verified that this is not a bug, but legitimate work of the Loop Unrolling pass.

Are size..text and exec_time enough? Can you advise any other benchmarks?

There are a lot of test changes here that have nothing to do with the add/sub operations mentioned in the title. How do we know these changes are good? Was any benchmarking done with this patch?

As for the test changes not related to add/sub operations (horizontal ops), they are mostly caused by partial vectorization (like fmul <2 x float>). At first I tried to avoid this by adding an extra cost for all 64-bit ops not connected with horizontal ones, but that looks dirty and unnecessary, since partial vectorization is a fair optimization as well (if Cost < 0).

Here are size..text results for test-suite compiled with Os:

> ~/llvm/test-suite/utils/compare.py  --filter-short -m size..text results_rel_base_Os.json results_rel_exp_Os.json
Tests: 1160
Short Running: 243 (filtered out)
Remaining: 917
Metric: size..text

Program                                         results_rel_base_Os  results_rel_exp_Os diff  
                                                                                             
SingleSour...Regression-C++-pointer_method2   610.00               482.00              -21.0%
MultiSourc...rks/Prolangs-C++/shapes/shapes   2674.00              2802.00              4.8% 
MultiSource/Benchmarks/Bullet/bullet          586082.00            569010.00           -2.9% 
MultiSourc.../Benchmarks/McCat/15-trie/trie   1218.00              1250.00              2.6% 
MultiSourc...rsaBench/beamformer/beamformer   3826.00              3730.00             -2.5% 
SingleSour...e/UnitTests/2003-05-07-VarArgs   1346.00              1378.00              2.4% 
SingleSour.../Vector/SSE/Vector-sse.stepfft   2114.00              2066.00             -2.3% 
SingleSource/UnitTests/initp1                 1570.00              1538.00             -2.0% 
SingleSour...arks/Shootout/Shootout-objinst   802.00               818.00               2.0% 
SingleSour...e/UnitTests/C++11/stdthreadbug   802.00               786.00              -2.0% 
SingleSour...rks/Shootout/Shootout-methcall   818.00               834.00               2.0% 
SingleSour...UnitTests/2003-08-11-VaListArg   1682.00              1714.00              1.9% 
MultiSourc...s/Prolangs-C++/objects/objects   1762.00              1794.00              1.8% 
SingleSour...tout-C++/Shootout-C++-methcall   898.00               914.00               1.8% 
SingleSource/Benchmarks/Dhrystone/dry         1010.00              1026.00              1.6%
       results_rel_base_Os  results_rel_exp_Os        diff
count  9.170000e+02         9.170000e+02        917.000000
mean   3.171576e+04         3.172267e+04        0.000862  
std    2.406645e+05         2.405892e+05        0.007775  
min    3.700000e+02         3.700000e+02       -0.209836  
25%    1.090000e+03         1.090000e+03        0.000000  
50%    2.697800e+04         2.704200e+04        0.000000  
75%    2.713800e+04         2.720200e+04        0.002360  
max    7.149522e+06         7.148562e+06        0.047868

and exec_time results for test-suite compiled with O3:

> ~/llvm/test-suite/utils/compare.py  --filter-short -m exec_time results_rel_base.json results_rel_exp.json
Tests: 1160
Short Running: 762 (filtered out)
Remaining: 398
Metric: exec_time

Program                                         results_rel_base  results_rel_exp diff  
                                                                                         
MicroBench...XRayFDRMultiThreaded/threads:2   524.56             411.88            -21.5%
MicroBench...RayFDRMultiThreaded/threads:16   1057.94            875.81            -17.2%
MicroBench...t:BM_PRESSURE_CALC_LAMBDA/5001    13.03              15.23            16.9% 
MicroBench...XRayFDRMultiThreaded/threads:8   1087.77            912.91            -16.1%
MultiSourc...nch/beamformer/beamformer.test     0.98               0.86            -11.9%
MicroBench...mbda.test:BM_PIC_2D_LAMBDA/171     1.95               2.14             9.7% 
MultiSource/Applications/siod/siod.test         1.80               1.64            -8.7% 
MicroBench...XRayFDRMultiThreaded/threads:4   669.22             612.08            -8.5% 
MicroBench...da.test:BM_PIC_1D_LAMBDA/44217   681.44             731.22             7.3% 
SingleSour...ut-C++/Shootout-C++-hash2.test     2.41               2.58             7.0% 
SingleSour...bra/kernels/gemver/gemver.test     0.84               0.78            -6.7% 
MultiSourc...mbolics-dbl/Symbolics-dbl.test     2.99               2.80            -6.6% 
MicroBench...ambda.test:BM_EOS_LAMBDA/44217    95.19              89.23            -6.3% 
MicroBench...da.test:BM_HYDRO_2D_LAMBDA/171    16.66              17.69             6.2% 
MicroBench...RayFDRMultiThreaded/threads:32   1022.42            1084.72            6.1%
       results_rel_base  results_rel_exp        diff
count  397.000000         398.000000        397.000000
mean   2536.523923        2523.193837      -0.001760  
std    17779.494744       17680.005983      0.025555  
min    0.600000           0.600000         -0.214805  
25%    2.064000           2.065000         -0.003161    
50%    6.472000           6.296000          0.000000    
75%    148.025000         148.279000        0.001152    
max    227023.000000      226291.000000     0.169066

Rebased, updated the new tests. Does it have a chance for an lgtm?

This revision is now accepted and ready to land.Feb 12 2019, 3:23 AM