External insertelement users can be represented as the result of a shuffle of the vectorized element together with non-consecutive insertelements. Added support for handling non-consecutive insertelements.
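As a rough source-level illustration (my own sketch, not taken from the patch or its tests), a pattern like the following ends up with insertelements into non-adjacent lanes of the result vector, which is the kind of external use the summary describes; the function name is hypothetical:

#include <x86intrin.h>

// The two sums land in lanes 0 and 2 of the result, so the corresponding
// insertelements are non-consecutive; lanes 1 and 3 are constants.
__m128 sum_into_sparse_lanes(__m128 a, __m128 b) {
  return (__m128){ a[0] + b[0], 1.0f, a[2] + b[2], 1.0f };
}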
Diff Detail
- Repository: rG LLVM Github Monorepo
Unit Tests
Time | Test
---|---
70 ms | x64 debian > Clang.Driver::debug-pass-structure.c
Event Timeline
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Line 601: Explain the purpose of InsertUses in the doxygen.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Line 601: Will add, thanks!
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Line 4293: Could this be folded into an IsIdentity &= ... update? (See the sketch below.)
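A minimal, self-contained illustration of what the &= suggestion amounts to (the names and loop body are stand-ins, not the patch's code):

#include <cstddef>
#include <vector>

// Returns true only if Mask is the identity permutation; the per-element
// comparison is folded into the flag with &= instead of an explicit branch.
bool isIdentityMask(const std::vector<int> &Mask) {
  bool IsIdentity = true;
  for (std::size_t I = 0, E = Mask.size(); I < E; ++I)
    IsIdentity &= (Mask[I] == static_cast<int>(I));
  return IsIdentity;
}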
llvm/test/Transforms/SLPVectorizer/X86/hsub.ll
Line 174: These regressions look like we need to do more in the shuffle costs to recognise when the shuffles don't cross subvector boundaries, either for illegal types like this or across 128-bit subvector boundaries on AVX.
llvm/test/Transforms/SLPVectorizer/X86/hsub.ll
Line 174: Yes, we need to subtract the scalarization overhead for the insertelement instructions; I'm trying to handle that correctly in the patch that vectorizes InsertElement instructions. I'm going to abandon this patch once vectorization of InsertElements lands; keeping it just in case.
A few minor comments:
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Line 2851: SmallVector seems unnecessary - why not just ValueList VectorOperands[NumOps]? Even NumOps seems a bit too much.
Line 3726: V is used only once - just use getInsertIndex(VL[I], 0).
Line 3738: assert(Offset < UINT_MAX && "Failed to find vector index offset")? Or should it be Offset < NumScalars?
Line 4256: Use auto.
Line 4298: Can this be replaced with an if (none_of(FirstUsers)) pattern (see the sketch after this list)? You might be able to merge AreFromSingleVector into the lambda as well, although that might get too unwieldy.
Line 4801: assert(Offset < UINT_MAX && "Failed to find vector index offset")? Or should it be Offset < NumScalars?
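A minimal, self-contained sketch of the none_of pattern being suggested, using a stand-in container and predicate (the real code would iterate FirstUsers with the patch's own lambda; the names below are hypothetical):

#include <algorithm>
#include <vector>

// Returns true only when no entry satisfies the (placeholder) predicate,
// mirroring the "if (none_of(FirstUsers, ...))" shape from the comment.
bool noCompatibleFirstUser(const std::vector<int> &FirstUsers) {
  auto IsCompatible = [](int User) { return User > 0; }; // placeholder check
  return std::none_of(FirstUsers.begin(), FirstUsers.end(), IsCompatible);
}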
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll
Line 72 (On Diff #346735): Still not performing fptoui on the entire <4 x i32>?
LGTM
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll
Line 72 (On Diff #346735): This is purely a cost-model issue - the fptoui cost for 2 x f32 is 8 but for 4 x f32 it is 18 (the model appears to assume they scalarize, which they don't). These costs are really wrong, but they shouldn't stop this patch.
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll
Line 72 (On Diff #346735): @ABataev If you rebase, this should be fixed after rGeb6429d0fb94fd467e03d229177ae6ff3a44e3cc + rG3ae7f7ae0a33961be48948205981aea91920d3aa.
llvm/test/Transforms/SLPVectorizer/X86/alternate-cast-inseltpoison.ll
Line 72 (On Diff #346735): Ok, thanks, I'll check it. Sorry for the delay with the answers, I've been busy with other regressions.
We're seeing some test failures that bisected to this patch, possibly a miscompile. The test failure is in the unit test for this file: https://github.com/google/tink/blob/master/cc/subtle/aes_eax_aesni.cc. Are there already any known issues with this patch?
No, there are no known issues. It would help if you could provide a reproducer and the exact compile command so I can check whether the problem exists.
I was unable to get it to reproduce directly from the open source repo. However, I reduced it to this, which shows the issue:
$ cat repro.cc
#include <xmmintrin.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// https://github.com/google/tink/blob/a72c9d542cd1dd8b58b2620ab52585cf5544f212/cc/subtle/aes_eax_aesni.cc#L79
inline __m128i Add(__m128i x, uint64_t y) {
  // Convert to a vector of two uint64_t.
  uint64_t vec[2];
  _mm_storeu_si128(reinterpret_cast<__m128i *>(vec), x);
  // Perform the addition on the vector.
  vec[0] += y;
  if (y > vec[0]) {
    vec[1]++;
  }
  // Convert back to xmm.
  return _mm_loadu_si128(reinterpret_cast<__m128i *>(vec));
}

void print128(__m128i var) {
  uint64_t parts[2];
  memcpy(parts, &var, sizeof(parts));
  printf("%lu %lu\n", parts[0], parts[1]);
}

template <class T>
void DoNotOptimize(const T &var) {
  asm volatile("" : "+m"(const_cast<T &>(var)));
}

int main() {
  __m128i x = _mm_setzero_si128();
  DoNotOptimize(x);
  __m128i y = Add(x, 1);
  print128(x);
  print128(y);
}

$ clang++ repro.cc -o /tmp/miscompile -O2 -fno-slp-vectorize && /tmp/miscompile
0 0
1 0
$ clang++ repro.cc -o /tmp/miscompile -O2 && /tmp/miscompile
0 0
1 1
Prior to this patch, there is no difference when enabling or disabling -fslp-vectorize. The issue seems to be how this optimizes Add:
vec[0] += y;
if (y > vec[0]) {  // This effectively evaluates to true
  vec[1]++;
}
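For reference, here is a hand-checked scalar model of the same logic (my own sketch, not part of the original report) showing why the correct output for y = Add(x, 1) with x = 0 is "1 0" rather than the "1 1" the miscompiled binary prints:

#include <cstdint>
#include <cstdio>

int main() {
  // Scalar model of Add() for x = {0, 0} and y = 1.
  uint64_t vec[2] = {0, 0};
  uint64_t y = 1;
  vec[0] += y;        // vec[0] == 1
  if (y > vec[0])     // 1 > 1 is false, so there must be no carry
    vec[1]++;
  printf("%lu %lu\n", vec[0], vec[1]); // prints "1 0"
}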
Hi, we are noticing a regression in the quality of the code generated by the compiler for btver2 after this change.
Consider the following code (ymm-1undef-add_ps_002.cpp):
#include <x86intrin.h>

__attribute__((noinline)) __m256 add_ps_002(__m256 a, __m256 b) {
  __m256 r = (__m256){ a[0] + a[1], a[2] + a[3], a[4] + a[5], a[6] + a[7],
                       b[0] + b[1], b[2] + b[3], b[4] + b[5], b[6] + b[7] };
  return __builtin_shufflevector(r, a, 0, -1, 2, 3, 4, 5, 6, 7);
}
Prior to this change, when compiled with "-g0 -O3 -march=btver2" the compiler would generate the following assembly:
# %bb.0:                                # %entry
	vhaddps	%xmm0, %xmm0, %xmm2
	vextractf128	$1, %ymm0, %xmm0
	vhaddps	%xmm0, %xmm1, %xmm3
	vinsertf128	$1, %xmm3, %ymm0, %ymm3
	vhaddps	%ymm0, %ymm1, %ymm0
	vblendps	$3, %ymm2, %ymm3, %ymm2   # ymm2 = ymm2[0,1],ymm3[2,3,4,5,6,7]
	vshufpd	$2, %ymm0, %ymm2, %ymm0   # ymm0 = ymm2[0],ymm0[1],ymm2[2],ymm0[2]
	retq
With the following characteristics according to llvm-mca:
Iterations:        100
Instructions:      800
Total Cycles:      902
Total uOps:        1200

Dispatch Width:    2
uOps Per Cycle:    1.33
IPC:               0.89
Block RThroughput: 6.0
But after this change, the compiler is now producing the following assembly for the same code:
# %bb.0:                                # %entry
	vextractf128	$1, %ymm0, %xmm2
	vmovlhps	%xmm2, %xmm0, %xmm3       # xmm3 = xmm0[0],xmm2[0]
	vshufps	$17, %xmm2, %xmm0, %xmm0  # xmm0 = xmm0[1,0],xmm2[1,0]
	vshufps	$232, %xmm2, %xmm3, %xmm3 # xmm3 = xmm3[0,2],xmm2[2,3]
	vshufps	$248, %xmm2, %xmm0, %xmm0 # xmm0 = xmm0[0,2],xmm2[3,3]
	vextractf128	$1, %ymm1, %xmm2
	vinsertps	$48, %xmm1, %xmm3, %xmm3  # xmm3 = xmm3[0,1,2],xmm1[0]
	vinsertps	$112, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[1]
	vhaddps	%xmm2, %xmm1, %xmm1
	vhaddps	%xmm2, %xmm2, %xmm2
	vaddps	%xmm0, %xmm3, %xmm0
	vpermilps	$148, %xmm0, %xmm3        # xmm3 = xmm0[0,1,1,2]
	vinsertps	$200, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[3],xmm1[1,2],zero
	vinsertps	$112, %xmm2, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm2[1]
	vinsertf128	$1, %xmm0, %ymm3, %ymm0
	retq
Which has the following characteristics according to llvm-mca:
Iterations:        100
Instructions:      1600
Total Cycles:      1007
Total uOps:        1700

Dispatch Width:    2
uOps Per Cycle:    1.69
IPC:               1.59
Block RThroughput: 8.5
With some help from @RKSimon in interpreting the llvm-mca output, my understanding is that the increased RThroughput number is bad for hot loops, while the increase in total cycles is worse for straight-line code.
Could you take a look?
Looks like codegen or some other later passes previously recognized the pattern while SLP vectorizer did not. Actually, without SLP vectorizer I'm getting just this:
	vperm2f128	$49, %ymm1, %ymm0, %ymm2  # ymm2 = ymm0[2,3],ymm1[2,3]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	vhaddps	%ymm2, %ymm0, %ymm0
	retq
I assume SLP will be able to produce something similar (or even better) once we start supporting vectorization of non-power-of-2 vectors. Here we have exactly such a pattern:
return __builtin_shufflevector(r, a, 0, -1, 2, 3, 4, 5, 6, 7);
The -1 causes the optimizer to optimize out the a[2] + a[3] operation, so SLP is left with 7 addition operations, which it does not recognize as vectorizable. This is the price we have to pay until non-power-of-2 vectorization lands. I will try to speed this up.
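For comparison, a hypothetical variant of the reporter's function without the -1 in the shuffle mask keeps all eight lanes live and hands SLP a power-of-2 bundle of additions (my own sketch, not from the report; compile with -march=btver2 or any AVX target):

#include <x86intrin.h>

// All eight sums are live here, so the SLP vectorizer sees a bundle of
// 8 additions instead of 7. The function name is illustrative only.
__attribute__((noinline)) __m256 add_ps_all8(__m256 a, __m256 b) {
  return (__m256){ a[0] + a[1], a[2] + a[3], a[4] + a[5], a[6] + a[7],
                   b[0] + b[1], b[2] + b[3], b[4] + b[5], b[6] + b[7] };
}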