This is an archive of the discontinued LLVM Phabricator instance.

[x86] make SLM extract vector element more expensive than default
Closed · Public

Authored by spatel on Nov 22 2019, 10:51 AM.

Details

Reviewers

craig.topper
RKSimon
ABataev

Commits

rG5c166f1d1969: [x86] make SLM extract vector element more expensive than default

Summary

I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont.
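As a rough illustration of the table-based approach, the sketch below mimics a cost-table lookup keyed on opcode and scalar element type. It is not LLVM's `CostTableLookup`; the `Op`, `ElemTy`, `CostEntry`, and `lookupCost` names are hypothetical stand-ins, and only the four cost values are taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for LLVM's ISD opcodes and MVT scalar types.
enum class Op { ExtractVectorElt, InsertVectorElt };
enum class ElemTy { I8, I16, I32, I64 };

struct CostEntry {
  Op Opcode;
  ElemTy Ty;
  int Cost;
};

// Cost values copied from the SLMCostTbl added by this patch.
static const CostEntry SlmCostTable[] = {
    {Op::ExtractVectorElt, ElemTy::I8, 4},
    {Op::ExtractVectorElt, ElemTy::I16, 4},
    {Op::ExtractVectorElt, ElemTy::I32, 4},
    {Op::ExtractVectorElt, ElemTy::I64, 7},
};

// Linear scan over the table. Returns a pointer to the matching entry,
// or nullptr so the caller can fall back to the default cost.
const CostEntry *lookupCost(Op Opcode, ElemTy Ty) {
  for (const CostEntry &E : SlmCostTable)
    if (E.Opcode == Opcode && E.Ty == Ty)
      return &E;
  return nullptr;
}
```

A miss (for example, an insert opcode) returns `nullptr`, which corresponds to falling through to the generic cost path in the real function.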

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Nov 22 2019, 10:51 AM

Herald added a project: Restricted Project. Nov 22 2019, 10:51 AM

Herald added subscribers: hiraditya, mcrosier.

craig.topper added inline comments. Nov 22 2019, 11:07 AM

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll
302–305	I'm not sure I understand what's happening here. SLM doesn't have 256-bit vectors. Is this going to codegen well?

RKSimon added inline comments. Nov 22 2019, 12:19 PM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
2412	You should be able to do: int ISD = TLI->InstructionOpcodeToISD(Opcode); assert(ISD);
llvm/test/Transforms/SLPVectorizer/X86/hadd.ll
302–305	Probably the cost model type legalization has kicked in. It may be that it's not handling EXTRACT_SUBVECTOR shuffle costs or something, so it ends up scalarizing?

spatel marked an inline comment as done. Nov 22 2019, 1:02 PM

spatel added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

302–305

I didn't step through SLP, but I agree this is suspicious. But then we end up with virtually identical asm before and after this change:

movdqa	%xmm0, %xmm4
movdqa	%xmm1, %xmm5
punpckhqdq	%xmm2, %xmm0    # xmm0 = xmm0[1],xmm2[1]
punpckhqdq	%xmm3, %xmm1    # xmm1 = xmm1[1],xmm3[1]
punpcklqdq	%xmm2, %xmm4    # xmm4 = xmm4[0],xmm2[0]
punpcklqdq	%xmm3, %xmm5    # xmm5 = xmm5[0],xmm3[0]
paddq	%xmm4, %xmm0
paddq	%xmm5, %xmm1

Patch updated:
Use InstructionOpcodeToISD() to simplify code.

spatel marked an inline comment as done. Nov 24 2019, 6:00 AM

spatel added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll
302–305	I'm still not clear on exactly how SLP does its accounting, but debug output shows that when it used to evaluate the 4-wide vector ops, it saw this:

SLP: Spill Cost = 0.
SLP: Extract Cost = 4.
SLP: Total Cost = 6.

...and decided that would not be profitable. But when it then evaluates doing the ops as 2-wide (128-bit), it sees this:

SLP: Spill Cost = 0.
SLP: Extract Cost = 2.
SLP: Total Cost = -1.
SLP: Vectorizing list at cost:-5.

So that's worth doing. With this patch, it now sees this at 4-wide:

SLP: Spill Cost = 0.
SLP: Extract Cost = 56.
SLP: Total Cost = -40.
SLP: Vectorizing list at cost:-44.

This seems more truthful: the cost of extract on SLM is very large relative to the cost of vector ops. The cost model itself deals with illegal types (as here, 256-bit on a subtarget where that is not legal) by doing a simple scaling: see lines 2393, 2412 in the source code diff in this patch.
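The 4-wide number is consistent with the simple scaling mentioned in this comment, as the sketch below works through. It assumes a 128-bit legal vector width on SLM and that an illegal type splits into equal 128-bit parts; the helper names are hypothetical, and this is not the SLP vectorizer's actual accounting code.

```cpp
#include <cassert>

// Hypothetical helper: how many legal registers an (illegal) vector type
// occupies, assuming it is split into equal LegalBits-wide parts.
// This plays the role of LT.first in the patch.
int legalizationSplits(int NumElts, int EltBits, int LegalBits = 128) {
  int TotalBits = NumElts * EltBits;
  if (TotalBits <= LegalBits)
    return 1;
  return (TotalBits + LegalBits - 1) / LegalBits;
}

// Per-extract cost, mirroring `return LT.first * Entry->Cost;`.
int extractCost(int NumElts, int EltBits, int TableCost) {
  return legalizationSplits(NumElts, EltBits) * TableCost;
}

// Sum over all extracted elements of one vector.
int totalExtractCost(int NumExtracts, int NumElts, int EltBits,
                     int TableCost) {
  return NumExtracts * extractCost(NumElts, EltBits, TableCost);
}
```

Under these assumptions, a `<4 x i64>` value splits into 2 legal parts, each i64 extract costs 2 * 7 = 14, and four extracts cost 56, which matches the `Extract Cost = 56` line in the debug output.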

LGTM

This revision is now accepted and ready to land. Nov 26 2019, 11:30 AM

Closed by commit rG5c166f1d1969: [x86] make SLM extract vector element more expensive than default (authored by spatel). Nov 27 2019, 11:13 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D59710: [SLP] remove lower limit for forming reduction patterns. Nov 27 2019, 11:22 AM

spatel mentioned this in D71023: [x86] add cost model special-case for insert/extract from element 0. Dec 4 2019, 8:34 AM

spatel mentioned this in rG7ff0fcb53f6e: [x86] add cost model special-case for insert/extract from element 0. Dec 6 2019, 10:51 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	14 lines
llvm/test/Analysis/CostModel/X86/fptosi.ll	59 lines
llvm/test/Analysis/CostModel/X86/fptoui.ll	59 lines
llvm/test/Analysis/CostModel/X86/shuffle-extract_subvector.ll	654 lines
llvm/test/Analysis/CostModel/X86/vector-extract.ll	680 lines
llvm/test/Transforms/LoopVectorize/X86/interleaving.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/ (file names missing from the archive)	98, 41, 57, 57, 938, and 954 lines

Diff 231302

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

if (ISD != ISD::DELETED_NODE) {		if (ISD != ISD::DELETED_NODE) {
if (const auto *Entry = CostTableLookup(X86CostTbl, ISD, MTy))		if (const auto *Entry = CostTableLookup(X86CostTbl, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}		}

return BaseT::getIntrinsicInstrCost(IID, RetTy, Args, FMF, VF);		return BaseT::getIntrinsicInstrCost(IID, RetTy, Args, FMF, VF);
}		}

int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {		int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
		static const CostTblEntry SLMCostTbl[] = {
		{ ISD::EXTRACT_VECTOR_ELT, MVT::i8, 4 },
		{ ISD::EXTRACT_VECTOR_ELT, MVT::i16, 4 },
		{ ISD::EXTRACT_VECTOR_ELT, MVT::i32, 4 },
		{ ISD::EXTRACT_VECTOR_ELT, MVT::i64, 7 }
		};

assert(Val->isVectorTy() && "This must be a vector type");		assert(Val->isVectorTy() && "This must be a vector type");

Type *ScalarType = Val->getScalarType();		Type *ScalarType = Val->getScalarType();

if (Index != -1U) {		if (Index != -1U) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Val);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Val);

// This type is legalized to a scalar type.		// This type is legalized to a scalar type.
if (!LT.second.isVector())		if (!LT.second.isVector())
return 0;		return 0;

// The type may be split. Normalize the index to the new type.		// The type may be split. Normalize the index to the new type.
unsigned Width = LT.second.getVectorNumElements();		unsigned Width = LT.second.getVectorNumElements();
Index = Index % Width;		Index = Index % Width;

// Floating point scalars are already located in index #0.		// Floating point scalars are already located in index #0.
if (ScalarType->isFloatingPointTy() && Index == 0)		if (ScalarType->isFloatingPointTy() && Index == 0)
return 0;		return 0;

		int ISD = TLI->InstructionOpcodeToISD(Opcode);
		assert(ISD && "Unexpected vector opcode");
		MVT MScalarTy = LT.second.getScalarType();
		if (ST->isSLM())
		if (auto *Entry = CostTableLookup(SLMCostTbl, ISD, MScalarTy))
		return LT.first * Entry->Cost;
}		}

// Add to the base cost if we know that the extracted element of a vector is		// Add to the base cost if we know that the extracted element of a vector is
// destined to be moved to and used in the integer register file.		// destined to be moved to and used in the integer register file.
int RegisterFileMoveCost = 0;		int RegisterFileMoveCost = 0;
if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())		if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())
RegisterFileMoveCost = 1;		RegisterFileMoveCost = 1;


llvm/test/Analysis/CostModel/X86/fptosi.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ
	;			;
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2

	define i32 @fptosi_double_i64(i32 %arg) {			define i32 @fptosi_double_i64(i32 %arg) {
	; SSE-LABEL: 'fptosi_double_i64'			; SSE-LABEL: 'fptosi_double_i64'
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
	; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>			; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
	; SSE-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>			; SSE-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>
	Show All 16 Lines
	;			;
	; AVX512DQ-LABEL: 'fptosi_double_i64'			; AVX512DQ-LABEL: 'fptosi_double_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_double_i64'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
				; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_double_i64'			; BTVER2-LABEL: 'fptosi_double_i64'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi double undef to i64
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x double> undef to <2 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I64 = fptosi <4 x double> undef to <4 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I64 = fptosi <8 x double> undef to <8 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I64 = fptosi double undef to i64			%I64 = fptosi double undef to i64
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptosi_double_i32'			; AVX512-LABEL: 'fptosi_double_i32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_double_i32'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32
				; SLM-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_double_i32'			; BTVER2-LABEL: 'fptosi_double_i32'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi double undef to i32
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptosi <2 x double> undef to <2 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x double> undef to <4 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I32 = fptosi <8 x double> undef to <8 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptosi double undef to i32			%I32 = fptosi double undef to i32
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptosi_double_i16'			; AVX512-LABEL: 'fptosi_double_i16'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_double_i16'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
				; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_double_i16'			; BTVER2-LABEL: 'fptosi_double_i16'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I16 = fptosi double undef to i16			%I16 = fptosi double undef to i16
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptosi_double_i8'			; AVX512-LABEL: 'fptosi_double_i8'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_double_i8'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
				; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_double_i8'			; BTVER2-LABEL: 'fptosi_double_i8'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I8 = fptosi double undef to i8			%I8 = fptosi double undef to i8
	Show All 31 Lines
	; AVX512DQ-LABEL: 'fptosi_float_i64'			; AVX512DQ-LABEL: 'fptosi_float_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_float_i64'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64
				; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 151 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_float_i64'			; BTVER2-LABEL: 'fptosi_float_i64'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptosi float undef to i64
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptosi <2 x float> undef to <2 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I64 = fptosi <4 x float> undef to <4 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I64 = fptosi <8 x float> undef to <8 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V16I64 = fptosi <16 x float> undef to <16 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I64 = fptosi float undef to i64			%I64 = fptosi float undef to i64
	%V2I64 = fptosi <2 x float> undef to <2 x i64>			%V2I64 = fptosi <2 x float> undef to <2 x i64>
	%V4I64 = fptosi <4 x float> undef to <4 x i64>			%V4I64 = fptosi <4 x float> undef to <4 x i64>
	%V8I64 = fptosi <8 x float> undef to <8 x i64>			%V8I64 = fptosi <8 x float> undef to <8 x i64>
	%V16I64 = fptosi <16 x float> undef to <16 x i64>			%V16I64 = fptosi <16 x float> undef to <16 x i64>
	ret i32 undef			ret i32 undef
	}			}

	define i32 @fptosi_float_i32(i32 %arg) {			define i32 @fptosi_float_i32(i32 %arg) {
	; CHECK-LABEL: 'fptosi_float_i32'			; CHECK-LABEL: 'fptosi_float_i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_float_i32'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_float_i32'			; BTVER2-LABEL: 'fptosi_float_i32'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptosi float undef to i32
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptosi <4 x float> undef to <4 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptosi <8 x float> undef to <8 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptosi <16 x float> undef to <16 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptosi float undef to i32			%I32 = fptosi float undef to i32
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptosi_float_i16'			; AVX512-LABEL: 'fptosi_float_i16'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_float_i16'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16
				; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_float_i16'			; BTVER2-LABEL: 'fptosi_float_i16'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi float undef to i16
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = fptosi <4 x float> undef to <4 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x float> undef to <8 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptosi <16 x float> undef to <16 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I16 = fptosi float undef to i16			%I16 = fptosi float undef to i16
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptosi_float_i8'			; AVX512-LABEL: 'fptosi_float_i8'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptosi_float_i8'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptosi_float_i8'			; BTVER2-LABEL: 'fptosi_float_i8'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I8 = fptosi float undef to i8			%I8 = fptosi float undef to i8
	%V4I8 = fptosi <4 x float> undef to <4 x i8>			%V4I8 = fptosi <4 x float> undef to <4 x i8>
	%V8I8 = fptosi <8 x float> undef to <8 x i8>			%V8I8 = fptosi <8 x float> undef to <8 x i8>
	%V16I8 = fptosi <16 x float> undef to <16 x i8>			%V16I8 = fptosi <16 x float> undef to <16 x i8>
	ret i32 undef			ret i32 undef
	}			}

llvm/test/Analysis/CostModel/X86/fptoui.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ
	;			;
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
	; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2			; RUN: opt < %s -mtriple=x86_64-apple-darwin -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2

	define i32 @fptoui_double_i64(i32 %arg) {			define i32 @fptoui_double_i64(i32 %arg) {
	; SSE-LABEL: 'fptoui_double_i64'			; SSE-LABEL: 'fptoui_double_i64'
	; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
	; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; SSE-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; SSE-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	Show All 16 Lines
	;			;
	; AVX512DQ-LABEL: 'fptoui_double_i64'			; AVX512DQ-LABEL: 'fptoui_double_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_double_i64'
				; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_double_i64'			; BTVER2-LABEL: 'fptoui_double_i64'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64			; BTVER2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I64 = fptoui double undef to i64			%I64 = fptoui double undef to i64
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptoui_double_i32'			; AVX512-LABEL: 'fptoui_double_i32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_double_i32'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
				; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_double_i32'			; BTVER2-LABEL: 'fptoui_double_i32'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptoui double undef to i32			%I32 = fptoui double undef to i32
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptoui_double_i16'			; AVX512-LABEL: 'fptoui_double_i16'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_double_i16'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
				; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_double_i16'			; BTVER2-LABEL: 'fptoui_double_i16'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I16 = fptoui double undef to i16			%I16 = fptoui double undef to i16
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptoui_double_i8'			; AVX512-LABEL: 'fptoui_double_i8'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_double_i8'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
				; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_double_i8'			; BTVER2-LABEL: 'fptoui_double_i8'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I8 = fptoui double undef to i8			%I8 = fptoui double undef to i8
	Show All 31 Lines
	; AVX512DQ-LABEL: 'fptoui_float_i64'			; AVX512DQ-LABEL: 'fptoui_float_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_float_i64'
				; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 199 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_float_i64'			; BTVER2-LABEL: 'fptoui_float_i64'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64			; BTVER2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	Show All 22 Lines
	;			;
	; AVX512-LABEL: 'fptoui_float_i32'			; AVX512-LABEL: 'fptoui_float_i32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_float_i32'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_float_i32'			; BTVER2-LABEL: 'fptoui_float_i32'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 65 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 65 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptoui float undef to i32			%I32 = fptoui float undef to i32
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptoui_float_i16'			; AVX512-LABEL: 'fptoui_float_i16'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_float_i16'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_float_i16'			; BTVER2-LABEL: 'fptoui_float_i16'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I16 = fptoui float undef to i16			%I16 = fptoui float undef to i16
	Show All 20 Lines
	;			;
	; AVX512-LABEL: 'fptoui_float_i8'			; AVX512-LABEL: 'fptoui_float_i8'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
				; SLM-LABEL: 'fptoui_float_i8'
				; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
				; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
				; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
	; BTVER2-LABEL: 'fptoui_float_i8'			; BTVER2-LABEL: 'fptoui_float_i8'
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8			; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>			; BTVER2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
	; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I8 = fptoui float undef to i8			%I8 = fptoui float undef to i8
	%V4I8 = fptoui <4 x float> undef to <4 x i8>			%V4I8 = fptoui <4 x float> undef to <4 x i8>
	%V8I8 = fptoui <8 x float> undef to <8 x i8>			%V8I8 = fptoui <8 x float> undef to <8 x i8>
	%V16I8 = fptoui <16 x float> undef to <16 x i8>			%V16I8 = fptoui <16 x float> undef to <16 x i8>
	ret i32 undef			ret i32 undef
	}			}

llvm/test/Analysis/CostModel/X86/shuffle-extract_subvector.ll


; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+sse2 \| FileCheck %s -check-prefixes=CHECK,SSE,SSE2		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+sse2 \| FileCheck %s -check-prefixes=CHECK,SSE,SSE2
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+ssse3 \| FileCheck %s -check-prefixes=CHECK,SSE,SSSE3		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+ssse3 \| FileCheck %s -check-prefixes=CHECK,SSE,SSSE3
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+sse4.2 \| FileCheck %s -check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+sse4.2 \| FileCheck %s -check-prefixes=CHECK,SSE,SSE42
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx \| FileCheck %s -check-prefixes=CHECK,AVX,AVX1		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx \| FileCheck %s -check-prefixes=CHECK,AVX,AVX1
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx2 \| FileCheck %s -check-prefixes=CHECK,AVX,AVX2		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx2 \| FileCheck %s -check-prefixes=CHECK,AVX,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f,+avx512bw \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f,+avx512bw \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f,+avx512bw,+avx512vbmi \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mattr=+avx512f,+avx512bw,+avx512vbmi \| FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW
;		;
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=slm \| FileCheck %s --check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=slm \| FileCheck %s --check-prefixes=CHECK,SSE,SSE42,SLM
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=goldmont \| FileCheck %s --check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=goldmont \| FileCheck %s --check-prefixes=CHECK,SSE,SSE42,GLM
; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=btver2 \| FileCheck %s --check-prefixes=BTVER2		; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -cost-model -analyze -mcpu=btver2 \| FileCheck %s --check-prefixes=BTVER2

;		;
; Verify the cost model for extract_subector style shuffles.		; Verify the cost model for extract_subector style shuffles.
;		;

define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {		define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
; SSE-LABEL: 'test_vXf64'		; SSE-LABEL: 'test_vXf64'
%V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>		%V512_89AB = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
%V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>		%V512_CDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
%V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%V512_01234567 = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%V512_89ABCDEF = shufflevector <16 x i32> %src512, <16 x i32> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret void		ret void
}		}

define void @test_vXi16(<4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {		define void @test_vXi16(<4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; SSE-LABEL: 'test_vXi16'		; SSE2-LABEL: 'test_vXi16'
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_2345 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>		; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_2345 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_6789 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>		; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_6789 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01234567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01234567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89ABCDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89ABCDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 0, i32 1>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 0, i32 1>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 4, i32 5>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 8, i32 9>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 8, i32 9>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 10, i32 11>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 10, i32 11>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 12, i32 13>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 12, i32 13>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 16, i32 17>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 16, i32 17>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 18, i32 19>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 18, i32 19>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 20, i32 21>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 20, i32 21>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 22, i32 23>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 22, i32 23>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 24, i32 25>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 24, i32 25>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 26, i32 27>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 26, i32 27>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 28, i32 29>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 28, i32 29>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 30, i32 31>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 30, i32 31>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_02_03_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>		; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_02_03_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_06_07_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>		; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_06_07_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
		;
		; SSSE3-LABEL: 'test_vXi16'
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_2345 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_6789 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01234567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89ABCDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 8, i32 9>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 10, i32 11>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 12, i32 13>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 16, i32 17>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 18, i32 19>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 20, i32 21>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 22, i32 23>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 24, i32 25>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 26, i32 27>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 28, i32 29>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 30, i32 31>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_02_03_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_06_07_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; AVX-LABEL: 'test_vXi16'		; AVX-LABEL: 'test_vXi16'
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
		; SLM-LABEL: 'test_vXi16'
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V256_2345 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V256_6789 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01234567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89ABCDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 12, i32 13>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 16, i32 17>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 20, i32 21>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 24, i32 25>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 28, i32 29>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V512_02_03_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V512_06_07_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
		;
		; GLM-LABEL: 'test_vXi16'
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_23 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_45 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_67 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89 = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CD = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 12, i32 13>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_EF = shufflevector <16 x i16> %src256, <16 x i16> undef, <2 x i32> <i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_0123 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_2345 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_4567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_6789 = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89AB = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_CDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_01234567 = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_89ABCDEF = shufflevector <16 x i16> %src256, <16 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 12, i32 13>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 16, i32 17>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 20, i32 21>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19 = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 24, i32 25>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 28, i32 29>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <2 x i32> <i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_02_03_04_05 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512_06_07_08_09 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i16> %src512, <32 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
		;
; BTVER2-LABEL: 'test_vXi16'		; BTVER2-LABEL: 'test_vXi16'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <4 x i16> %src64, <4 x i16> undef, <2 x i32> <i32 2, i32 3>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 0, i32 1>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 2, i32 3>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 4, i32 5>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <8 x i16> %src128, <8 x i16> undef, <2 x i32> <i32 6, i32 7>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <8 x i16> %src128, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
[... 341 lines not shown ...]
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; SSE42-LABEL: 'test_vXi8'
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 8, i32 9>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CD = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 12, i32 13>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_EF = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V128_2345 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V128_6789 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01234567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89ABCDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 0, i32 1>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 8, i32 9>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 12, i32 13>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 16, i32 17>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 18, i32 19>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 20, i32 21>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 24, i32 25>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 26, i32 27>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 28, i32 29>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_02_03_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_06_07_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 0, i32 1>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 4, i32 5>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 8, i32 9>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 12, i32 13>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 16, i32 17>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 18, i32 19>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 20, i32 21>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 24, i32 25>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 26, i32 27>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 28, i32 29>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 32, i32 33>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 34, i32 35>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 36, i32 37>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 38, i32 39>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 40, i32 41>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 42, i32 43>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 44, i32 45>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 46, i32 47>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 48, i32 49>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 50, i32 51>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 52, i32 53>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 54, i32 55>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 56, i32 57>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 58, i32 59>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 60, i32 61>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 62, i32 63>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 36, i32 37, i32 38, i32 39>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 40, i32 41, i32 42, i32 43>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 44, i32 45, i32 46, i32 47>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 48, i32 49, i32 50, i32 51>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 52, i32 53, i32 54, i32 55>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 56, i32 57, i32 58, i32 59>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 60, i32 61, i32 62, i32 63>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55>
; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; AVX-LABEL: 'test_vXi8'		; AVX-LABEL: 'test_vXi8'
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
[... 341 lines not shown ...]
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; AVX512BW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
		; SLM-LABEL: 'test_vXi8'
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CD = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 12, i32 13>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_EF = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V128_2345 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V128_6789 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01234567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89ABCDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 12, i32 13>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 16, i32 17>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 20, i32 21>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 24, i32 25>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 28, i32 29>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V256_02_03_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V256_06_07_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 0, i32 1>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 4, i32 5>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 8, i32 9>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 12, i32 13>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 16, i32 17>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 20, i32 21>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 24, i32 25>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 28, i32 29>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 32, i32 33>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 34, i32 35>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 36, i32 37>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 38, i32 39>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 40, i32 41>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 42, i32 43>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 44, i32 45>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 46, i32 47>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 48, i32 49>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 50, i32 51>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 52, i32 53>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 54, i32 55>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 56, i32 57>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 58, i32 59>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 60, i32 61>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 62, i32 63>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 32, i32 33, i32 34, i32 35>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 36, i32 37, i32 38, i32 39>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 40, i32 41, i32 42, i32 43>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 44, i32 45, i32 46, i32 47>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 48, i32 49, i32 50, i32 51>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 52, i32 53, i32 54, i32 55>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 56, i32 57, i32 58, i32 59>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 60, i32 61, i32 62, i32 63>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55>
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
		;
		; GLM-LABEL: 'test_vXi8'
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_23 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_45 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_67 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CD = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 12, i32 13>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_EF = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_0123 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V128_2345 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_4567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V128_6789 = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89AB = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_CDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01234567 = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128_89ABCDEF = shufflevector <16 x i8> %src128, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 12, i32 13>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 16, i32 17>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 20, i32 21>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19 = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 24, i32 25>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 28, i32 29>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <2 x i32> <i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_02_03_04_05 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256_06_07_08_09 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 6, i32 7, i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17 = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <32 x i8> %src256, <32 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 0, i32 1>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 4, i32 5>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 8, i32 9>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 12, i32 13>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 16, i32 17>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 20, i32 21>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 24, i32 25>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 28, i32 29>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 32, i32 33>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 34, i32 35>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 36, i32 37>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 38, i32 39>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 40, i32 41>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 42, i32 43>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 44, i32 45>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 46, i32 47>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 48, i32 49>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 50, i32 51>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 52, i32 53>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 54, i32 55>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39 = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 56, i32 57>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 58, i32 59>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 60, i32 61>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <2 x i32> <i32 62, i32 63>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 32, i32 33, i32 34, i32 35>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 36, i32 37, i32 38, i32 39>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 40, i32 41, i32 42, i32 43>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 44, i32 45, i32 46, i32 47>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 48, i32 49, i32 50, i32 51>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 52, i32 53, i32 54, i32 55>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 56, i32 57, i32 58, i32 59>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <4 x i32> <i32 60, i32 61, i32 62, i32 63>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37 = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55>
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <8 x i32> <i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <16 x i32> <i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_00_01_02_03_04_05_06_07_08_09_0A_0B_0C_0D_0E_0F_10_11_12_13_14_15_16_17_18_19_1A_1B_1C_1D_1E_1F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512_20_21_22_23_24_25_26_27_28_29_2A_2B_2C_2D_2E_2F_30_31_32_33_34_35_36_37_38_39_3A_3B_3C_3D_3E_3F = shufflevector <64 x i8> %src512, <64 x i8> undef, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
		;
; BTVER2-LABEL: 'test_vXi8'		; BTVER2-LABEL: 'test_vXi8'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_01 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 0, i32 1>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_23 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 2, i32 3>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_45 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 4, i32 5>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_67 = shufflevector <8 x i8> %src64, <8 x i8> undef, <2 x i32> <i32 6, i32 7>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64_0123 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64_4567 = shufflevector <8 x i8> %src64, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>		; BTVER2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128_01 = shufflevector <16 x i8> %src128, <16 x i8> undef, <2 x i32> <i32 0, i32 1>

llvm/test/Analysis/CostModel/X86/vector-extract.ll

; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSE3		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSE3
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+ssse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSSE3		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+ssse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSSE3
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE,SSE41		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE,SSE41
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW
;		;
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=slm | FileCheck %s --check-prefixes=CHECK,SSE,SSE42,SLM
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=goldmont | FileCheck %s --check-prefixes=CHECK,SSE,SSE42,GLM
; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2		; RUN: opt < %s -mtriple=x86_64-apple-macosx10.8.0 -cost-model -analyze -mcpu=btver2 | FileCheck %s --check-prefixes=BTVER2

define i32 @extract_double(i32 %arg) {		define i32 @extract_double(i32 %arg) {
; SSE-LABEL: 'extract_double'		; SSE-LABEL: 'extract_double'
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f64_a = extractelement <2 x double> undef, i32 %arg		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f64_a = extractelement <2 x double> undef, i32 %arg
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2f64_0 = extractelement <2 x double> undef, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2f64_0 = extractelement <2 x double> undef, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f64_1 = extractelement <2 x double> undef, i32 1		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f64_1 = extractelement <2 x double> undef, i32 1
; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f64_a = extractelement <4 x double> undef, i32 %arg		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f64_a = extractelement <4 x double> undef, i32 %arg
		;
%v16f32_3 = extractelement <16 x float> undef, i32 3		%v16f32_3 = extractelement <16 x float> undef, i32 3
%v16f32_8 = extractelement <16 x float> undef, i32 8		%v16f32_8 = extractelement <16 x float> undef, i32 8
%v16f32_15 = extractelement <16 x float> undef, i32 15		%v16f32_15 = extractelement <16 x float> undef, i32 15

ret i32 undef		ret i32 undef
}		}

define i32 @extract_i64(i32 %arg) {		define i32 @extract_i64(i32 %arg) {
; CHECK-LABEL: 'extract_i64'		; SSE2-LABEL: 'extract_i64'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE3-LABEL: 'extract_i64'
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSSE3-LABEL: 'extract_i64'
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE41-LABEL: 'extract_i64'
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX-LABEL: 'extract_i64'
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX512-LABEL: 'extract_i64'
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SLM-LABEL: 'extract_i64'
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; GLM-LABEL: 'extract_i64'
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_a = extractelement <8 x i64> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_0 = extractelement <8 x i64> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_3 = extractelement <8 x i64> undef, i32 3
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_4 = extractelement <8 x i64> undef, i32 4
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i64_7 = extractelement <8 x i64> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; BTVER2-LABEL: 'extract_i64'		; BTVER2-LABEL: 'extract_i64'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_a = extractelement <2 x i64> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_0 = extractelement <2 x i64> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64_1 = extractelement <2 x i64> undef, i32 1
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_a = extractelement <4 x i64> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_0 = extractelement <4 x i64> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i64_3 = extractelement <4 x i64> undef, i32 3
		;
%v8i64_3 = extractelement <8 x i64> undef, i32 3		%v8i64_3 = extractelement <8 x i64> undef, i32 3
%v8i64_4 = extractelement <8 x i64> undef, i32 4		%v8i64_4 = extractelement <8 x i64> undef, i32 4
%v8i64_7 = extractelement <8 x i64> undef, i32 7		%v8i64_7 = extractelement <8 x i64> undef, i32 7

ret i32 undef		ret i32 undef
}		}

define i32 @extract_i32(i32 %arg) {		define i32 @extract_i32(i32 %arg) {
; CHECK-LABEL: 'extract_i32'		; SSE2-LABEL: 'extract_i32'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE3-LABEL: 'extract_i32'
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSSE3-LABEL: 'extract_i32'
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE41-LABEL: 'extract_i32'
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX-LABEL: 'extract_i32'
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX512-LABEL: 'extract_i32'
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SLM-LABEL: 'extract_i32'
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; GLM-LABEL: 'extract_i32'
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_a = extractelement <8 x i32> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_0 = extractelement <8 x i32> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_3 = extractelement <8 x i32> undef, i32 3
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_4 = extractelement <8 x i32> undef, i32 4
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_7 = extractelement <8 x i32> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_a = extractelement <16 x i32> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_0 = extractelement <16 x i32> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_3 = extractelement <16 x i32> undef, i32 3
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_8 = extractelement <16 x i32> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_15 = extractelement <16 x i32> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; BTVER2-LABEL: 'extract_i32'		; BTVER2-LABEL: 'extract_i32'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_a = extractelement <2 x i32> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_0 = extractelement <2 x i32> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32_1 = extractelement <2 x i32> undef, i32 1
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_a = extractelement <4 x i32> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_0 = extractelement <4 x i32> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_3 = extractelement <4 x i32> undef, i32 3
		;
%v16i32_3 = extractelement <16 x i32> undef, i32 3		%v16i32_3 = extractelement <16 x i32> undef, i32 3
%v16i32_8 = extractelement <16 x i32> undef, i32 8		%v16i32_8 = extractelement <16 x i32> undef, i32 8
%v16i32_15 = extractelement <16 x i32> undef, i32 15		%v16i32_15 = extractelement <16 x i32> undef, i32 15

ret i32 undef		ret i32 undef
}		}

define i32 @extract_i16(i32 %arg) {		define i32 @extract_i16(i32 %arg) {
; CHECK-LABEL: 'extract_i16'		; SSE2-LABEL: 'extract_i16'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE3-LABEL: 'extract_i16'
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; SSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSSE3-LABEL: 'extract_i16'
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE41-LABEL: 'extract_i16'
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; SSE41-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX-LABEL: 'extract_i16'
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX512-LABEL: 'extract_i16'
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SLM-LABEL: 'extract_i16'
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; GLM-LABEL: 'extract_i16'
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_8 = extractelement <16 x i16> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_15 = extractelement <16 x i16> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_a = extractelement <32 x i16> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_0 = extractelement <32 x i16> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_7 = extractelement <32 x i16> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_8 = extractelement <32 x i16> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_15 = extractelement <32 x i16> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_16 = extractelement <32 x i16> undef, i32 16
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_24 = extractelement <32 x i16> undef, i32 24
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i16_31 = extractelement <32 x i16> undef, i32 31
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; BTVER2-LABEL: 'extract_i16'		; BTVER2-LABEL: 'extract_i16'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_a = extractelement <8 x i16> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_0 = extractelement <8 x i16> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_7 = extractelement <8 x i16> undef, i32 7
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_a = extractelement <16 x i16> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_0 = extractelement <16 x i16> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_7 = extractelement <16 x i16> undef, i32 7
[27 unchanged lines not shown]
%v32i16_16 = extractelement <32 x i16> undef, i32 16		%v32i16_16 = extractelement <32 x i16> undef, i32 16
%v32i16_24 = extractelement <32 x i16> undef, i32 24		%v32i16_24 = extractelement <32 x i16> undef, i32 24
%v32i16_31 = extractelement <32 x i16> undef, i32 31		%v32i16_31 = extractelement <32 x i16> undef, i32 31

ret i32 undef		ret i32 undef
}		}

define i32 @extract_i8(i32 %arg) {		define i32 @extract_i8(i32 %arg) {
; CHECK-LABEL: 'extract_i8'		; SSE2-LABEL: 'extract_i8'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE3-LABEL: 'extract_i8'
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; SSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; SSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSSE3-LABEL: 'extract_i8'
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SSE41-LABEL: 'extract_i8'
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; SSE41-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; SSE41-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX-LABEL: 'extract_i8'
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; AVX512-LABEL: 'extract_i8'
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; SLM-LABEL: 'extract_i8'
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
		;
		; GLM-LABEL: 'extract_i8'
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_7 = extractelement <32 x i8> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_8 = extractelement <32 x i8> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_15 = extractelement <32 x i8> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_24 = extractelement <32 x i8> undef, i32 24
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_31 = extractelement <32 x i8> undef, i32 31
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_a = extractelement <64 x i8> undef, i32 %arg
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_0 = extractelement <64 x i8> undef, i32 0
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_7 = extractelement <64 x i8> undef, i32 7
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_8 = extractelement <64 x i8> undef, i32 8
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_15 = extractelement <64 x i8> undef, i32 15
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_24 = extractelement <64 x i8> undef, i32 24
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_31 = extractelement <64 x i8> undef, i32 31
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_32 = extractelement <64 x i8> undef, i32 32
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_48 = extractelement <64 x i8> undef, i32 48
		; GLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64i8_63 = extractelement <64 x i8> undef, i32 63
		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; BTVER2-LABEL: 'extract_i8'		; BTVER2-LABEL: 'extract_i8'
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_a = extractelement <16 x i8> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_0 = extractelement <16 x i8> undef, i32 0
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_8 = extractelement <16 x i8> undef, i32 8
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_15 = extractelement <16 x i8> undef, i32 15
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_a = extractelement <32 x i8> undef, i32 %arg
; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0		; BTVER2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i8_0 = extractelement <32 x i8> undef, i32 0

llvm/test/Transforms/LoopVectorize/X86/interleaving.ll

	; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine < %s | FileCheck %s --check-prefix=NORMAL			; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine < %s | FileCheck %s --check-prefix=NORMAL
	; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=NORMAL			; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=slm < %s | FileCheck %s --check-prefix=SLOW
	; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=ATOM			; RUN: opt -S -mtriple=x86_64-pc_linux -loop-vectorize -instcombine -mcpu=atom < %s | FileCheck %s --check-prefix=SLOW

	; NORMAL-LABEL: foo			; NORMAL-LABEL: foo
	; NORMAL: %[[WIDE:.*]] = load <8 x i32>, <8 x i32>* %{{.*}}, align 4			; NORMAL: %[[WIDE:.*]] = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
	; NORMAL: %[[STRIDED1:.*]] = shufflevector <8 x i32> %[[WIDE]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			; NORMAL: %[[STRIDED1:.*]] = shufflevector <8 x i32> %[[WIDE]], <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	; NORMAL: %[[STRIDED2:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			; NORMAL: %[[STRIDED2:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; NORMAL: add nsw <4 x i32> %[[STRIDED2]], %[[STRIDED1]]			; NORMAL: add nsw <4 x i32> %[[STRIDED2]], %[[STRIDED1]]

	; ATOM-LABEL: foo			; SLOW-LABEL: foo
	; ATOM: load i32			; SLOW: load i32
	; ATOM: load i32			; SLOW: load i32
	; ATOM: store i32			; SLOW: store i32
	define void @foo(i32* noalias nocapture %a, i32* noalias nocapture readonly %b) {			define void @foo(i32* noalias nocapture %a, i32* noalias nocapture readonly %b) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	ret void			ret void

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

Show All 29 Lines
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
; SLM-LABEL: @sitofp_uitofp(		; SLM-LABEL: @sitofp_uitofp(
; SLM-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0		; SLM-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1		; SLM-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2		; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
; SLM-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4
; SLM-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
; SLM-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float
; SLM-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float
; SLM-NEXT: [[AB2:%.*]] = sitofp i32 [[A2]] to float
; SLM-NEXT: [[AB3:%.*]] = sitofp i32 [[A3]] to float
; SLM-NEXT: [[AB4:%.*]] = uitofp i32 [[A4]] to float
; SLM-NEXT: [[AB5:%.*]] = uitofp i32 [[A5]] to float
; SLM-NEXT: [[AB6:%.*]] = uitofp i32 [[A6]] to float
; SLM-NEXT: [[AB7:%.*]] = uitofp i32 [[A7]] to float
; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
; SLM-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1
; SLM-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SLM-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SLM-NEXT: ret <8 x float> [[R7]]		; SLM-NEXT: ret <8 x float> [[R7]]
;		;
; AVX-LABEL: @sitofp_uitofp(		; AVX-LABEL: @sitofp_uitofp(
; AVX-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>		; AVX-NEXT: [[TMP1:%.*]] = sitofp <8 x i32> [[A:%.*]] to <8 x float>
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[A]] to <8 x float>
; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; AVX-NEXT: ret <8 x float> [[R7]]		; AVX-NEXT: ret <8 x float> [[R7]]
;		;
▲ Show 20 Lines • Show All 193 Lines • ▼ Show 20 Lines	;
%r4 = insertelement <8 x float> %r3, float %ac4, i32 4		%r4 = insertelement <8 x float> %r3, float %ac4, i32 4
%r5 = insertelement <8 x float> %r4, float %ac5, i32 5		%r5 = insertelement <8 x float> %r4, float %ac5, i32 5
%r6 = insertelement <8 x float> %r5, float %ac6, i32 6		%r6 = insertelement <8 x float> %r5, float %ac6, i32 6
%r7 = insertelement <8 x float> %r6, float %ac7, i32 7		%r7 = insertelement <8 x float> %r6, float %ac7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

define <8 x i32> @sext_zext(<8 x i16> %a) {		define <8 x i32> @sext_zext(<8 x i16> %a) {
; CHECK-LABEL: @sext_zext(		; SSE-LABEL: @sext_zext(
; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i16> [[A:%.*]] to <8 x i32>		; SSE-NEXT: [[TMP1:%.*]] = sext <8 x i16> [[A:%.*]] to <8 x i32>
; CHECK-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[A]] to <8 x i32>		; SSE-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[A]] to <8 x i32>
; CHECK-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: ret <8 x i32> [[R7]]		; SSE-NEXT: ret <8 x i32> [[R7]]
		;
		; SLM-LABEL: @sext_zext(
		; SLM-NEXT: [[A0:%.*]] = extractelement <8 x i16> [[A:%.*]], i32 0
		; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i16> [[A]], i32 1
		; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i16> [[A]], i32 2
		; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i16> [[A]], i32 3
		; SLM-NEXT: [[A4:%.*]] = extractelement <8 x i16> [[A]], i32 4
		; SLM-NEXT: [[A5:%.*]] = extractelement <8 x i16> [[A]], i32 5
		; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i16> [[A]], i32 6
		; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i16> [[A]], i32 7
		; SLM-NEXT: [[AB0:%.*]] = sext i16 [[A0]] to i32
		; SLM-NEXT: [[AB1:%.*]] = sext i16 [[A1]] to i32
		; SLM-NEXT: [[AB2:%.*]] = sext i16 [[A2]] to i32
		; SLM-NEXT: [[AB3:%.*]] = sext i16 [[A3]] to i32
		; SLM-NEXT: [[AB4:%.*]] = zext i16 [[A4]] to i32
		; SLM-NEXT: [[AB5:%.*]] = zext i16 [[A5]] to i32
		; SLM-NEXT: [[AB6:%.*]] = zext i16 [[A6]] to i32
		; SLM-NEXT: [[AB7:%.*]] = zext i16 [[A7]] to i32
		; SLM-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
		; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
		; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
		; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
		; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
		; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
		; SLM-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX-LABEL: @sext_zext(
		; AVX-NEXT: [[TMP1:%.*]] = sext <8 x i16> [[A:%.*]] to <8 x i32>
		; AVX-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[A]] to <8 x i32>
		; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
		; AVX-NEXT: ret <8 x i32> [[R7]]
		;
		; AVX512-LABEL: @sext_zext(
		; AVX512-NEXT: [[TMP1:%.*]] = sext <8 x i16> [[A:%.*]] to <8 x i32>
		; AVX512-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[A]] to <8 x i32>
		; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
		; AVX512-NEXT: ret <8 x i32> [[R7]]
;		;
%a0 = extractelement <8 x i16> %a, i32 0		%a0 = extractelement <8 x i16> %a, i32 0
%a1 = extractelement <8 x i16> %a, i32 1		%a1 = extractelement <8 x i16> %a, i32 1
%a2 = extractelement <8 x i16> %a, i32 2		%a2 = extractelement <8 x i16> %a, i32 2
%a3 = extractelement <8 x i16> %a, i32 3		%a3 = extractelement <8 x i16> %a, i32 3
%a4 = extractelement <8 x i16> %a, i32 4		%a4 = extractelement <8 x i16> %a, i32 4
%a5 = extractelement <8 x i16> %a, i32 5		%a5 = extractelement <8 x i16> %a, i32 5
%a6 = extractelement <8 x i16> %a, i32 6		%a6 = extractelement <8 x i16> %a, i32 6
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines
; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3		; SSE-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4		; SSE-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5		; SSE-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SSE-NEXT: ret <8 x float> [[R7]]		; SSE-NEXT: ret <8 x float> [[R7]]
;		;
; SLM-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; SLM-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; SLM-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0
; SLM-NEXT: [[A1:%.*]] = extractelement <4 x i32> [[A]], i32 1
; SLM-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2
; SLM-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3
; SLM-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0		; SLM-NEXT: [[B0:%.*]] = extractelement <8 x i16> [[B:%.*]], i32 0
; SLM-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1		; SLM-NEXT: [[B1:%.*]] = extractelement <8 x i16> [[B]], i32 1
; SLM-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0		; SLM-NEXT: [[C0:%.*]] = extractelement <16 x i8> [[C:%.*]], i32 0
; SLM-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1		; SLM-NEXT: [[C1:%.*]] = extractelement <16 x i8> [[C]], i32 1
; SLM-NEXT: [[AB0:%.*]] = sitofp i32 [[A0]] to float		; SLM-NEXT: [[TMP1:%.*]] = sitofp <4 x i32> [[A:%.*]] to <4 x float>
; SLM-NEXT: [[AB1:%.*]] = sitofp i32 [[A1]] to float		; SLM-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[A]] to <4 x float>
; SLM-NEXT: [[AB2:%.*]] = uitofp i32 [[A2]] to float
; SLM-NEXT: [[AB3:%.*]] = uitofp i32 [[A3]] to float
; SLM-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float		; SLM-NEXT: [[AB4:%.*]] = sitofp i16 [[B0]] to float
; SLM-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float		; SLM-NEXT: [[AB5:%.*]] = uitofp i16 [[B1]] to float
; SLM-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float		; SLM-NEXT: [[AB6:%.*]] = sitofp i8 [[C0]] to float
; SLM-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float		; SLM-NEXT: [[AB7:%.*]] = uitofp i8 [[C1]] to float
; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0		; SLM-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP1]], i32 0
; SLM-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1		; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[TMP3]], i32 0
; SLM-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2		; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP1]], i32 1
; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3		; SLM-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[TMP4]], i32 1
		; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
		; SLM-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[TMP5]], i32 2
		; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
		; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[TMP6]], i32 3
; SLM-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4		; SLM-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
; SLM-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5		; SLM-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6		; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7		; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
; SLM-NEXT: ret <8 x float> [[R7]]		; SLM-NEXT: ret <8 x float> [[R7]]
;		;
; AVX-LABEL: @sitofp_uitofp_4i32_8i16_16i8(		; AVX-LABEL: @sitofp_uitofp_4i32_8i16_16i8(
; AVX-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0		; AVX-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

Show First 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	;
%r0 = insertelement <4 x i32> undef, i32 %ab0, i32 0		%r0 = insertelement <4 x i32> undef, i32 %ab0, i32 0
%r1 = insertelement <4 x i32> %r0, i32 %ab1, i32 1		%r1 = insertelement <4 x i32> %r0, i32 %ab1, i32 1
%r2 = insertelement <4 x i32> %r1, i32 %ab2, i32 2		%r2 = insertelement <4 x i32> %r1, i32 %ab2, i32 2
%r3 = insertelement <4 x i32> %r2, i32 %ab3, i32 3		%r3 = insertelement <4 x i32> %r2, i32 %ab3, i32 3
ret <4 x i32> %r3		ret <4 x i32> %r3
}		}

define <4 x i32> @add_mul_v4i32(<4 x i32> %a, <4 x i32> %b) {		define <4 x i32> @add_mul_v4i32(<4 x i32> %a, <4 x i32> %b) {
; SSE-LABEL: @add_mul_v4i32(		; CHECK-LABEL: @add_mul_v4i32(
; SSE-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[A:%.*]], [[B:%.*]]
; SSE-NEXT: [[TMP2:%.*]] = add <4 x i32> [[A]], [[B]]		; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[A]], [[B]]
; SSE-NEXT: [[R3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[R3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; SSE-NEXT: ret <4 x i32> [[R3]]		; CHECK-NEXT: ret <4 x i32> [[R3]]
;
; SLM-LABEL: @add_mul_v4i32(
; SLM-NEXT: [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0
; SLM-NEXT: [[A1:%.*]] = extractelement <4 x i32> [[A]], i32 1
; SLM-NEXT: [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2
; SLM-NEXT: [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3
; SLM-NEXT: [[B0:%.*]] = extractelement <4 x i32> [[B:%.*]], i32 0
; SLM-NEXT: [[B1:%.*]] = extractelement <4 x i32> [[B]], i32 1
; SLM-NEXT: [[B2:%.*]] = extractelement <4 x i32> [[B]], i32 2
; SLM-NEXT: [[B3:%.*]] = extractelement <4 x i32> [[B]], i32 3
; SLM-NEXT: [[AB0:%.*]] = mul i32 [[A0]], [[B0]]
; SLM-NEXT: [[AB1:%.*]] = add i32 [[A1]], [[B1]]
; SLM-NEXT: [[AB2:%.*]] = add i32 [[A2]], [[B2]]
; SLM-NEXT: [[AB3:%.*]] = mul i32 [[A3]], [[B3]]
; SLM-NEXT: [[R0:%.*]] = insertelement <4 x i32> undef, i32 [[AB0]], i32 0
; SLM-NEXT: [[R1:%.*]] = insertelement <4 x i32> [[R0]], i32 [[AB1]], i32 1
; SLM-NEXT: [[R2:%.*]] = insertelement <4 x i32> [[R1]], i32 [[AB2]], i32 2
; SLM-NEXT: [[R3:%.*]] = insertelement <4 x i32> [[R2]], i32 [[AB3]], i32 3
; SLM-NEXT: ret <4 x i32> [[R3]]
;
; AVX-LABEL: @add_mul_v4i32(
; AVX-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[A:%.*]], [[B:%.*]]
; AVX-NEXT: [[TMP2:%.*]] = add <4 x i32> [[A]], [[B]]
; AVX-NEXT: [[R3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; AVX-NEXT: ret <4 x i32> [[R3]]
;
; AVX512-LABEL: @add_mul_v4i32(
; AVX512-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[A:%.*]], [[B:%.*]]
; AVX512-NEXT: [[TMP2:%.*]] = add <4 x i32> [[A]], [[B]]
; AVX512-NEXT: [[R3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; AVX512-NEXT: ret <4 x i32> [[R3]]
;		;
%a0 = extractelement <4 x i32> %a, i32 0		%a0 = extractelement <4 x i32> %a, i32 0
%a1 = extractelement <4 x i32> %a, i32 1		%a1 = extractelement <4 x i32> %a, i32 1
%a2 = extractelement <4 x i32> %a, i32 2		%a2 = extractelement <4 x i32> %a, i32 2
%a3 = extractelement <4 x i32> %a, i32 3		%a3 = extractelement <4 x i32> %a, i32 3
%b0 = extractelement <4 x i32> %b, i32 0		%b0 = extractelement <4 x i32> %b, i32 0
%b1 = extractelement <4 x i32> %b, i32 1		%b1 = extractelement <4 x i32> %b, i32 1
%b2 = extractelement <4 x i32> %b, i32 2		%b2 = extractelement <4 x i32> %b, i32 2

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

Show First 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	;
%r00 = insertelement <4 x float> undef, float %r0, i32 0		%r00 = insertelement <4 x float> undef, float %r0, i32 0
%r01 = insertelement <4 x float> %r00, float %r1, i32 1		%r01 = insertelement <4 x float> %r00, float %r1, i32 1
%r02 = insertelement <4 x float> %r01, float %r2, i32 2		%r02 = insertelement <4 x float> %r01, float %r2, i32 2
%r03 = insertelement <4 x float> %r02, float %r3, i32 3		%r03 = insertelement <4 x float> %r02, float %r3, i32 3
ret <4 x float> %r03		ret <4 x float> %r03
}		}

define <2 x i64> @test_v2i64(<2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @test_v2i64(<2 x i64> %a, <2 x i64> %b) {
; SSE-LABEL: @test_v2i64(		; CHECK-LABEL: @test_v2i64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;
; SLM-LABEL: @test_v2i64(
; SLM-NEXT: [[A0:%.*]] = extractelement <2 x i64> [[A:%.*]], i32 0
; SLM-NEXT: [[A1:%.*]] = extractelement <2 x i64> [[A]], i32 1
; SLM-NEXT: [[B0:%.*]] = extractelement <2 x i64> [[B:%.*]], i32 0
; SLM-NEXT: [[B1:%.*]] = extractelement <2 x i64> [[B]], i32 1
; SLM-NEXT: [[R0:%.*]] = add i64 [[A0]], [[A1]]
; SLM-NEXT: [[R1:%.*]] = add i64 [[B0]], [[B1]]
; SLM-NEXT: [[R00:%.*]] = insertelement <2 x i64> undef, i64 [[R0]], i32 0
; SLM-NEXT: [[R01:%.*]] = insertelement <2 x i64> [[R00]], i64 [[R1]], i32 1
; SLM-NEXT: ret <2 x i64> [[R01]]
;
; AVX-LABEL: @test_v2i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; AVX-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <2 x i64> [[TMP3]]
;
; AVX512-LABEL: @test_v2i64(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <2 x i64> [[TMP3]]
;		;
%a0 = extractelement <2 x i64> %a, i32 0		%a0 = extractelement <2 x i64> %a, i32 0
%a1 = extractelement <2 x i64> %a, i32 1		%a1 = extractelement <2 x i64> %a, i32 1
%b0 = extractelement <2 x i64> %b, i32 0		%b0 = extractelement <2 x i64> %b, i32 0
%b1 = extractelement <2 x i64> %b, i32 1		%b1 = extractelement <2 x i64> %b, i32 1
%r0 = add i64 %a0, %a1		%r0 = add i64 %a0, %a1
%r1 = add i64 %b0, %b1		%r1 = add i64 %b0, %b1
%r00 = insertelement <2 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <2 x i64> undef, i64 %r0, i32 0
▲ Show 20 Lines • Show All 200 Lines • ▼ Show 20 Lines
; SSE-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x i64> [[R03]]		; SSE-NEXT: ret <4 x i64> [[R03]]
;		;
; SLM-LABEL: @test_v4i64(		; SLM-LABEL: @test_v4i64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SLM-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: ret <4 x i64> [[TMP3]]
craig.topper: I'm not sure I understand what's happening here. SLM doesn't have 256-bit vectors. Is this going to codegen well?
RKSimon: Probably the cost model type legalization has kicked in. It may be that it's not handling EXTRACT_SUBVECTOR shuffle costs or something, so it ends up scalarizing?
spatel (Author): I didn't step through SLP, but I agree this is suspicious. But then we end up with virtually identical asm before and after this change:
    movdqa %xmm0, %xmm4
    movdqa %xmm1, %xmm5
    punpckhqdq %xmm2, %xmm0 # xmm0 = xmm0[1],xmm2[1]
    punpckhqdq %xmm3, %xmm1 # xmm1 = xmm1[1],xmm3[1]
    punpcklqdq %xmm2, %xmm4 # xmm4 = xmm4[0],xmm2[0]
    punpcklqdq %xmm3, %xmm5 # xmm5 = xmm5[0],xmm3[0]
    paddq %xmm4, %xmm0
    paddq %xmm5, %xmm1
spatel (Author): I'm still not clear on exactly how SLP does its accounting, but debug output shows that when it used to evaluate the 4-wide vector ops, it saw this:
    SLP: Spill Cost = 0.
    SLP: Extract Cost = 4.
    SLP: Total Cost = 6.
...and decided that would not be profitable. But when it then evaluates doing the ops as 2-wide (128-bit), it sees this:
    SLP: Spill Cost = 0.
    SLP: Extract Cost = 2.
    SLP: Total Cost = -1.
    SLP: Vectorizing list at cost:-5.
So that's worth doing. With this patch, it now sees this at 4-wide:
    SLP: Spill Cost = 0.
    SLP: Extract Cost = 56.
    SLP: Total Cost = -40.
    SLP: Vectorizing list at cost:-44.
This seems more truthful - the cost of extract on SLM is very large relative to the cost of vector ops.
The cost model itself deals with illegal types (as here - 256-bit on a subtarget where that is not legal) by doing a simple scaling: see lines 2393, 2412 in the source code diff in this patch.
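To make the three debug dumps above easier to compare, here is a minimal sketch of the arithmetic they seem to imply. It is not the SLP vectorizer's code. It assumes that the printed total is the sum of a per-operation tree cost, the spill cost, and the extract cost, and that a candidate is accepted when the total is below zero (that is, a cost threshold of 0). `SlpDebugCosts`, `impliedTreeCost`, and `looksProfitable` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical record of one "SLP:" debug dump from the comment above.
struct SlpDebugCosts {
  int spill;    // "Spill Cost"
  int extract;  // "Extract Cost"
  int total;    // "Total Cost"
};

// Assumption: total = tree + spill + extract, so the tree part can be
// recovered by subtraction.
constexpr int impliedTreeCost(SlpDebugCosts c) {
  return c.total - c.spill - c.extract;
}

// Assumption: a candidate is vectorized when the total is below -threshold,
// with the threshold defaulting to 0.
constexpr bool looksProfitable(SlpDebugCosts c, int threshold = 0) {
  return c.total < -threshold;
}

// The three dumps quoted in the comment, as {spill, extract, total}.
constexpr SlpDebugCosts kBefore4Wide{0, 4, 6};
constexpr SlpDebugCosts kBefore2Wide{0, 2, -1};
constexpr SlpDebugCosts kAfter4Wide{0, 56, -40};
```

Under those assumptions, the implied tree cost at 4-wide moves from 2 before the patch to -96 after it. That would be consistent with the more expensive SLM extracts making the scalar sequence look far worse than the vector one, but the exact accounting should be confirmed against the SLP vectorizer source.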
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: ret <4 x i64> [[R03]]
;		;
; AVX-LABEL: @test_v4i64(		; AVX-LABEL: @test_v4i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x i64> [[TMP3]]		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
; AVX512-LABEL: @test_v4i64(		; AVX512-LABEL: @test_v4i64(
Show All 28 Lines
; SSE-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R07]]		; SSE-NEXT: ret <8 x i32> [[R07]]
;		;
; SLM-LABEL: @test_v8i32(		; SLM-LABEL: @test_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: ret <8 x i32> [[TMP3]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R07]]
;		;
; AVX-LABEL: @test_v8i32(		; AVX-LABEL: @test_v8i32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x i32> [[TMP3]]		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
; AVX512-LABEL: @test_v8i32(		; AVX512-LABEL: @test_v8i32(

llvm/test/Transforms/SLPVectorizer/X86/hsub.ll

Show First 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	;
%r00 = insertelement <4 x float> undef, float %r0, i32 0		%r00 = insertelement <4 x float> undef, float %r0, i32 0
%r01 = insertelement <4 x float> %r00, float %r1, i32 1		%r01 = insertelement <4 x float> %r00, float %r1, i32 1
%r02 = insertelement <4 x float> %r01, float %r2, i32 2		%r02 = insertelement <4 x float> %r01, float %r2, i32 2
%r03 = insertelement <4 x float> %r02, float %r3, i32 3		%r03 = insertelement <4 x float> %r02, float %r3, i32 3
ret <4 x float> %r03		ret <4 x float> %r03
}		}

define <2 x i64> @test_v2i64(<2 x i64> %a, <2 x i64> %b) {		define <2 x i64> @test_v2i64(<2 x i64> %a, <2 x i64> %b) {
; SSE-LABEL: @test_v2i64(		; CHECK-LABEL: @test_v2i64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <2 x i64> [[TMP3]]		; CHECK-NEXT: ret <2 x i64> [[TMP3]]
;
; SLM-LABEL: @test_v2i64(
; SLM-NEXT: [[A0:%.*]] = extractelement <2 x i64> [[A:%.*]], i32 0
; SLM-NEXT: [[A1:%.*]] = extractelement <2 x i64> [[A]], i32 1
; SLM-NEXT: [[B0:%.*]] = extractelement <2 x i64> [[B:%.*]], i32 0
; SLM-NEXT: [[B1:%.*]] = extractelement <2 x i64> [[B]], i32 1
; SLM-NEXT: [[R0:%.*]] = sub i64 [[A0]], [[A1]]
; SLM-NEXT: [[R1:%.*]] = sub i64 [[B0]], [[B1]]
; SLM-NEXT: [[R00:%.*]] = insertelement <2 x i64> undef, i64 [[R0]], i32 0
; SLM-NEXT: [[R01:%.*]] = insertelement <2 x i64> [[R00]], i64 [[R1]], i32 1
; SLM-NEXT: ret <2 x i64> [[R01]]
;
; AVX-LABEL: @test_v2i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; AVX-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <2 x i64> [[TMP3]]
;
; AVX512-LABEL: @test_v2i64(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 2>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <2 x i64> [[A]], <2 x i64> [[B]], <2 x i32> <i32 1, i32 3>
; AVX512-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <2 x i64> [[TMP3]]
;		;
%a0 = extractelement <2 x i64> %a, i32 0		%a0 = extractelement <2 x i64> %a, i32 0
%a1 = extractelement <2 x i64> %a, i32 1		%a1 = extractelement <2 x i64> %a, i32 1
%b0 = extractelement <2 x i64> %b, i32 0		%b0 = extractelement <2 x i64> %b, i32 0
%b1 = extractelement <2 x i64> %b, i32 1		%b1 = extractelement <2 x i64> %b, i32 1
%r0 = sub i64 %a0, %a1		%r0 = sub i64 %a0, %a1
%r1 = sub i64 %b0, %b1		%r1 = sub i64 %b0, %b1
%r00 = insertelement <2 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <2 x i64> undef, i64 %r0, i32 0
▲ Show 20 Lines • Show All 200 Lines • ▼ Show 20 Lines
; SSE-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x i64> [[R03]]		; SSE-NEXT: ret <4 x i64> [[R03]]
;		;
; SLM-LABEL: @test_v4i64(		; SLM-LABEL: @test_v4i64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SLM-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: ret <4 x i64> [[TMP3]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: ret <4 x i64> [[R03]]
;		;
; AVX-LABEL: @test_v4i64(		; AVX-LABEL: @test_v4i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x i64> [[TMP3]]		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
; AVX512-LABEL: @test_v4i64(		; AVX512-LABEL: @test_v4i64(
Show All 28 Lines
; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R07]]		; SSE-NEXT: ret <8 x i32> [[R07]]
;		;
; SLM-LABEL: @test_v8i32(		; SLM-LABEL: @test_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: ret <8 x i32> [[TMP3]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R07]]
;		;
; AVX-LABEL: @test_v8i32(		; AVX-LABEL: @test_v8i32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x i32> [[TMP3]]		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
; AVX512-LABEL: @test_v8i32(		; AVX512-LABEL: @test_v8i32(

llvm/test/Transforms/SLPVectorizer/X86/sext.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,SSE,SSE2		; RUN: opt < %s -mtriple=x86_64-unknown -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,SSE,SSE2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,SSE,SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,SSE,SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX1		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX1
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX2		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512F		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512F
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+avx512bw -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512BW		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+avx512bw -basicaa -slp-vectorizer -S \| FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512BW

;		;
; vXi8		; vXi8
;		;

define <2 x i64> @loadext_2i8_to_2i64(i8* %p0) {		define <2 x i64> @loadext_2i8_to_2i64(i8* %p0) {
; SSE2-LABEL: @loadext_2i8_to_2i64(		; SSE-LABEL: @loadext_2i8_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i8_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i8_to_2i64(		; AVX-LABEL: @loadext_2i8_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%i0 = load i8, i8* %p0, align 1		%i0 = load i8, i8* %p0, align 1
%i1 = load i8, i8* %p1, align 1		%i1 = load i8, i8* %p1, align 1
%x0 = sext i8 %i0 to i64		%x0 = sext i8 %i0 to i64
%x1 = sext i8 %i1 to i64		%x1 = sext i8 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i32> @loadext_4i8_to_4i32(i8* %p0) {		define <4 x i32> @loadext_4i8_to_4i32(i8* %p0) {
; SSE2-LABEL: @loadext_4i8_to_4i32(		; SSE-LABEL: @loadext_4i8_to_4i32(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i32		; SSE-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i32
; SSE2-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i32		; SSE-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i32
; SSE2-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i32		; SSE-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i32
; SSE2-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i32		; SSE-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i32
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i32> [[V3]]		; SSE-NEXT: ret <4 x i32> [[V3]]
;
; SLM-LABEL: @loadext_4i8_to_4i32(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <4 x i8> [[TMP2]] to <4 x i32>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i32> [[V3]]
;		;
; AVX-LABEL: @loadext_4i8_to_4i32(		; AVX-LABEL: @loadext_4i8_to_4i32(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = sext <4 x i8> [[TMP2]] to <4 x i32>		; AVX-NEXT: [[TMP3:%.*]] = sext <4 x i8> [[TMP2]] to <4 x i32>
Show All 21 Lines	;
%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0		%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1		%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2		%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3		%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3
ret <4 x i32> %v3		ret <4 x i32> %v3
}		}

define <4 x i64> @loadext_4i8_to_4i64(i8* %p0) {		define <4 x i64> @loadext_4i8_to_4i64(i8* %p0) {
; SSE2-LABEL: @loadext_4i8_to_4i64(		; SSE-LABEL: @loadext_4i8_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i8_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <4 x i8> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i8_to_4i64(		; AVX1-LABEL: @loadext_4i8_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	;
%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1
%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2		%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2
%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3		%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3
ret <4 x i64> %v3		ret <4 x i64> %v3
}		}

define <8 x i16> @loadext_8i8_to_8i16(i8* %p0) {		define <8 x i16> @loadext_8i8_to_8i16(i8* %p0) {
; CHECK-LABEL: @loadext_8i8_to_8i16(		; SSE2-LABEL: @loadext_8i8_to_8i16(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i16>		; SSE2-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i16> [[V7]]		; SSE2-NEXT: ret <8 x i16> [[V7]]
		;
		; SLM-LABEL: @loadext_8i8_to_8i16(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i16
		; SLM-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i16
		; SLM-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i16
		; SLM-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i16
		; SLM-NEXT: [[X4:%.*]] = sext i8 [[I4]] to i16
		; SLM-NEXT: [[X5:%.*]] = sext i8 [[I5]] to i16
		; SLM-NEXT: [[X6:%.*]] = sext i8 [[I6]] to i16
		; SLM-NEXT: [[X7:%.*]] = sext i8 [[I7]] to i16
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i16> [[V7]]
		;
		; AVX-LABEL: @loadext_8i8_to_8i16(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i16>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i16> [[V7]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
; ... (20 lines not shown) ...
%v4 = insertelement <8 x i16> %v3, i16 %x4, i32 4		%v4 = insertelement <8 x i16> %v3, i16 %x4, i32 4
%v5 = insertelement <8 x i16> %v4, i16 %x5, i32 5		%v5 = insertelement <8 x i16> %v4, i16 %x5, i32 5
%v6 = insertelement <8 x i16> %v5, i16 %x6, i32 6		%v6 = insertelement <8 x i16> %v5, i16 %x6, i32 6
%v7 = insertelement <8 x i16> %v6, i16 %x7, i32 7		%v7 = insertelement <8 x i16> %v6, i16 %x7, i32 7
ret <8 x i16> %v7		ret <8 x i16> %v7
}		}

define <8 x i32> @loadext_8i8_to_8i32(i8* %p0) {		define <8 x i32> @loadext_8i8_to_8i32(i8* %p0) {
; CHECK-LABEL: @loadext_8i8_to_8i32(		; SSE2-LABEL: @loadext_8i8_to_8i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i32> [[V7]]		; SSE2-NEXT: ret <8 x i32> [[V7]]
		;
		; SLM-LABEL: @loadext_8i8_to_8i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i32
		; SLM-NEXT: [[X4:%.*]] = sext i8 [[I4]] to i32
		; SLM-NEXT: [[X5:%.*]] = sext i8 [[I5]] to i32
		; SLM-NEXT: [[X6:%.*]] = sext i8 [[I6]] to i32
		; SLM-NEXT: [[X7:%.*]] = sext i8 [[I7]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i32> [[V7]]
		;
		; AVX-LABEL: @loadext_8i8_to_8i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = sext <8 x i8> [[TMP2]] to <8 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i32> [[V7]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
; ... (20 lines not shown) ...
%v4 = insertelement <8 x i32> %v3, i32 %x4, i32 4		%v4 = insertelement <8 x i32> %v3, i32 %x4, i32 4
%v5 = insertelement <8 x i32> %v4, i32 %x5, i32 5		%v5 = insertelement <8 x i32> %v4, i32 %x5, i32 5
%v6 = insertelement <8 x i32> %v5, i32 %x6, i32 6		%v6 = insertelement <8 x i32> %v5, i32 %x6, i32 6
%v7 = insertelement <8 x i32> %v6, i32 %x7, i32 7		%v7 = insertelement <8 x i32> %v6, i32 %x7, i32 7
ret <8 x i32> %v7		ret <8 x i32> %v7
}		}

define <16 x i16> @loadext_16i8_to_16i16(i8* %p0) {		define <16 x i16> @loadext_16i8_to_16i16(i8* %p0) {
; CHECK-LABEL: @loadext_16i8_to_16i16(		; SSE2-LABEL: @loadext_16i8_to_16i16(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8		; SSE2-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
; CHECK-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9		; SSE2-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
; CHECK-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10		; SSE2-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
; CHECK-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11		; SSE2-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
; CHECK-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12		; SSE2-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
; CHECK-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13		; SSE2-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
; CHECK-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14		; SSE2-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
; CHECK-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15		; SSE2-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = sext <16 x i8> [[TMP2]] to <16 x i16>		; SSE2-NEXT: [[TMP3:%.*]] = sext <16 x i8> [[TMP2]] to <16 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8		; SSE2-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8
; CHECK-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8		; SSE2-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9		; SSE2-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9
; CHECK-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9		; SSE2-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10		; SSE2-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10
; CHECK-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10		; SSE2-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11		; SSE2-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11
; CHECK-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11		; SSE2-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12		; SSE2-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12
; CHECK-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12		; SSE2-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13		; SSE2-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13
; CHECK-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13		; SSE2-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14		; SSE2-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14
; CHECK-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14		; SSE2-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15		; SSE2-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15
; CHECK-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15		; SSE2-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15
; CHECK-NEXT: ret <16 x i16> [[V15]]		; SSE2-NEXT: ret <16 x i16> [[V15]]
		;
		; SLM-LABEL: @loadext_16i8_to_16i16(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
		; SLM-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
		; SLM-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
		; SLM-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
		; SLM-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
		; SLM-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
		; SLM-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
		; SLM-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[I8:%.*]] = load i8, i8* [[P8]], align 1
		; SLM-NEXT: [[I9:%.*]] = load i8, i8* [[P9]], align 1
		; SLM-NEXT: [[I10:%.*]] = load i8, i8* [[P10]], align 1
		; SLM-NEXT: [[I11:%.*]] = load i8, i8* [[P11]], align 1
		; SLM-NEXT: [[I12:%.*]] = load i8, i8* [[P12]], align 1
		; SLM-NEXT: [[I13:%.*]] = load i8, i8* [[P13]], align 1
		; SLM-NEXT: [[I14:%.*]] = load i8, i8* [[P14]], align 1
		; SLM-NEXT: [[I15:%.*]] = load i8, i8* [[P15]], align 1
		; SLM-NEXT: [[X0:%.*]] = sext i8 [[I0]] to i16
		; SLM-NEXT: [[X1:%.*]] = sext i8 [[I1]] to i16
		; SLM-NEXT: [[X2:%.*]] = sext i8 [[I2]] to i16
		; SLM-NEXT: [[X3:%.*]] = sext i8 [[I3]] to i16
		; SLM-NEXT: [[X4:%.*]] = sext i8 [[I4]] to i16
		; SLM-NEXT: [[X5:%.*]] = sext i8 [[I5]] to i16
		; SLM-NEXT: [[X6:%.*]] = sext i8 [[I6]] to i16
		; SLM-NEXT: [[X7:%.*]] = sext i8 [[I7]] to i16
		; SLM-NEXT: [[X8:%.*]] = sext i8 [[I8]] to i16
		; SLM-NEXT: [[X9:%.*]] = sext i8 [[I9]] to i16
		; SLM-NEXT: [[X10:%.*]] = sext i8 [[I10]] to i16
		; SLM-NEXT: [[X11:%.*]] = sext i8 [[I11]] to i16
		; SLM-NEXT: [[X12:%.*]] = sext i8 [[I12]] to i16
		; SLM-NEXT: [[X13:%.*]] = sext i8 [[I13]] to i16
		; SLM-NEXT: [[X14:%.*]] = sext i8 [[I14]] to i16
		; SLM-NEXT: [[X15:%.*]] = sext i8 [[I15]] to i16
		; SLM-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[X7]], i32 7
		; SLM-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[X8]], i32 8
		; SLM-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[X9]], i32 9
		; SLM-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[X10]], i32 10
		; SLM-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[X11]], i32 11
		; SLM-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[X12]], i32 12
		; SLM-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[X13]], i32 13
		; SLM-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[X14]], i32 14
		; SLM-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[X15]], i32 15
		; SLM-NEXT: ret <16 x i16> [[V15]]
		;
		; AVX-LABEL: @loadext_16i8_to_16i16(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
		; AVX-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
		; AVX-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
		; AVX-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
		; AVX-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
		; AVX-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
		; AVX-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
		; AVX-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = sext <16 x i8> [[TMP2]] to <16 x i16>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7
		; AVX-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8
		; AVX-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8
		; AVX-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9
		; AVX-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9
		; AVX-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10
		; AVX-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10
		; AVX-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11
		; AVX-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11
		; AVX-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12
		; AVX-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12
		; AVX-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13
		; AVX-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13
		; AVX-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14
		; AVX-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14
		; AVX-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15
		; AVX-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15
		; AVX-NEXT: ret <16 x i16> [[V15]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
ret <16 x i16> %v15		ret <16 x i16> %v15
}		}

;		;
; vXi16		; vXi16
;		;

define <2 x i64> @loadext_2i16_to_2i64(i16* %p0) {		define <2 x i64> @loadext_2i16_to_2i64(i16* %p0) {
; SSE2-LABEL: @loadext_2i16_to_2i64(		; SSE-LABEL: @loadext_2i16_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i16_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <2 x i16> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i16_to_2i64(		; AVX-LABEL: @loadext_2i16_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i16> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i16> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%i0 = load i16, i16* %p0, align 1		%i0 = load i16, i16* %p0, align 1
%i1 = load i16, i16* %p1, align 1		%i1 = load i16, i16* %p1, align 1
%x0 = sext i16 %i0 to i64		%x0 = sext i16 %i0 to i64
%x1 = sext i16 %i1 to i64		%x1 = sext i16 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i32> @loadext_4i16_to_4i32(i16* %p0) {		define <4 x i32> @loadext_4i16_to_4i32(i16* %p0) {
; CHECK-LABEL: @loadext_4i16_to_4i32(		; SSE2-LABEL: @loadext_4i16_to_4i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = sext <4 x i16> [[TMP2]] to <4 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = sext <4 x i16> [[TMP2]] to <4 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: ret <4 x i32> [[V3]]		; SSE2-NEXT: ret <4 x i32> [[V3]]
		;
		; SLM-LABEL: @loadext_4i16_to_4i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; SLM-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
		; SLM-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = sext i16 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = sext i16 [[I3]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: ret <4 x i32> [[V3]]
		;
		; AVX-LABEL: @loadext_4i16_to_4i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = sext <4 x i16> [[TMP2]] to <4 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: ret <4 x i32> [[V3]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%p2 = getelementptr inbounds i16, i16* %p0, i64 2		%p2 = getelementptr inbounds i16, i16* %p0, i64 2
%p3 = getelementptr inbounds i16, i16* %p0, i64 3		%p3 = getelementptr inbounds i16, i16* %p0, i64 3
%i0 = load i16, i16* %p0, align 1		%i0 = load i16, i16* %p0, align 1
%i1 = load i16, i16* %p1, align 1		%i1 = load i16, i16* %p1, align 1
%i2 = load i16, i16* %p2, align 1		%i2 = load i16, i16* %p2, align 1
%i3 = load i16, i16* %p3, align 1		%i3 = load i16, i16* %p3, align 1
%x0 = sext i16 %i0 to i32		%x0 = sext i16 %i0 to i32
%x1 = sext i16 %i1 to i32		%x1 = sext i16 %i1 to i32
%x2 = sext i16 %i2 to i32		%x2 = sext i16 %i2 to i32
%x3 = sext i16 %i3 to i32		%x3 = sext i16 %i3 to i32
%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0		%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1		%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2		%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3		%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3
ret <4 x i32> %v3		ret <4 x i32> %v3
}		}

define <4 x i64> @loadext_4i16_to_4i64(i16* %p0) {		define <4 x i64> @loadext_4i16_to_4i64(i16* %p0) {
; SSE2-LABEL: @loadext_4i16_to_4i64(		; SSE-LABEL: @loadext_4i16_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = sext i16 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = sext i16 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = sext i16 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = sext i16 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i16_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <4 x i16> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i16_to_4i64(		; AVX1-LABEL: @loadext_4i16_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1
%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2		%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2
%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3		%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3
ret <4 x i64> %v3		ret <4 x i64> %v3
}		}

define <8 x i32> @loadext_8i16_to_8i32(i16* %p0) {		define <8 x i32> @loadext_8i16_to_8i32(i16* %p0) {
; CHECK-LABEL: @loadext_8i16_to_8i32(		; SSE2-LABEL: @loadext_8i16_to_8i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = sext <8 x i16> [[TMP2]] to <8 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = sext <8 x i16> [[TMP2]] to <8 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i32> [[V7]]		; SSE2-NEXT: ret <8 x i32> [[V7]]
		;
		; SLM-LABEL: @loadext_8i16_to_8i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i16, i16* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i16, i16* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i16, i16* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i16, i16* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = sext i16 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = sext i16 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = sext i16 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = sext i16 [[I3]] to i32
		; SLM-NEXT: [[X4:%.*]] = sext i16 [[I4]] to i32
		; SLM-NEXT: [[X5:%.*]] = sext i16 [[I5]] to i32
		; SLM-NEXT: [[X6:%.*]] = sext i16 [[I6]] to i32
		; SLM-NEXT: [[X7:%.*]] = sext i16 [[I7]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i32> [[V7]]
		;
		; AVX-LABEL: @loadext_8i16_to_8i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = sext <8 x i16> [[TMP2]] to <8 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i32> [[V7]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%p2 = getelementptr inbounds i16, i16* %p0, i64 2		%p2 = getelementptr inbounds i16, i16* %p0, i64 2
%p3 = getelementptr inbounds i16, i16* %p0, i64 3		%p3 = getelementptr inbounds i16, i16* %p0, i64 3
%p4 = getelementptr inbounds i16, i16* %p0, i64 4		%p4 = getelementptr inbounds i16, i16* %p0, i64 4
%p5 = getelementptr inbounds i16, i16* %p0, i64 5		%p5 = getelementptr inbounds i16, i16* %p0, i64 5
%p6 = getelementptr inbounds i16, i16* %p0, i64 6		%p6 = getelementptr inbounds i16, i16* %p0, i64 6
%p7 = getelementptr inbounds i16, i16* %p0, i64 7		%p7 = getelementptr inbounds i16, i16* %p0, i64 7
ret <8 x i32> %v7		ret <8 x i32> %v7
}		}

;		;
; vXi32		; vXi32
;		;

define <2 x i64> @loadext_2i32_to_2i64(i32* %p0) {		define <2 x i64> @loadext_2i32_to_2i64(i32* %p0) {
; SSE2-LABEL: @loadext_2i32_to_2i64(		; SSE-LABEL: @loadext_2i32_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i32 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i32 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i32 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i32 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i32_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i32_to_2i64(		; AVX-LABEL: @loadext_2i32_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i32, i32* %p0, i64 1		%p1 = getelementptr inbounds i32, i32* %p0, i64 1
%i0 = load i32, i32* %p0, align 1		%i0 = load i32, i32* %p0, align 1
%i1 = load i32, i32* %p1, align 1		%i1 = load i32, i32* %p1, align 1
%x0 = sext i32 %i0 to i64		%x0 = sext i32 %i0 to i64
%x1 = sext i32 %i1 to i64		%x1 = sext i32 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i64> @loadext_4i32_to_4i64(i32* %p0) {		define <4 x i64> @loadext_4i32_to_4i64(i32* %p0) {
; SSE2-LABEL: @loadext_4i32_to_4i64(		; SSE-LABEL: @loadext_4i32_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i32, i32* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i32, i32* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = sext i32 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = sext i32 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = sext i32 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = sext i32 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = sext i32 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = sext i32 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = sext i32 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = sext i32 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i32_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <4 x i32>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = sext <4 x i32> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i32_to_4i64(		; AVX1-LABEL: @loadext_4i32_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1

llvm/test/Transforms/SLPVectorizer/X86/zext.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE,SSE2		; RUN: opt < %s -mtriple=x86_64-unknown -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE,SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE,SLM
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX1		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX2		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512F		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512F
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+avx512bw -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512BW		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+avx512bw -basicaa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512BW

;		;
; vXi8		; vXi8
;		;

define <2 x i64> @loadext_2i8_to_2i64(i8* %p0) {		define <2 x i64> @loadext_2i8_to_2i64(i8* %p0) {
; SSE2-LABEL: @loadext_2i8_to_2i64(		; SSE-LABEL: @loadext_2i8_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i8_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i8_to_2i64(		; AVX-LABEL: @loadext_2i8_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%i0 = load i8, i8* %p0, align 1		%i0 = load i8, i8* %p0, align 1
%i1 = load i8, i8* %p1, align 1		%i1 = load i8, i8* %p1, align 1
%x0 = zext i8 %i0 to i64		%x0 = zext i8 %i0 to i64
%x1 = zext i8 %i1 to i64		%x1 = zext i8 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i32> @loadext_4i8_to_4i32(i8* %p0) {		define <4 x i32> @loadext_4i8_to_4i32(i8* %p0) {
; CHECK-LABEL: @loadext_4i8_to_4i32(		; SSE2-LABEL: @loadext_4i8_to_4i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: ret <4 x i32> [[V3]]		; SSE2-NEXT: ret <4 x i32> [[V3]]
		;
		; SLM-LABEL: @loadext_4i8_to_4i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: ret <4 x i32> [[V3]]
		;
		; AVX-LABEL: @loadext_4i8_to_4i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: ret <4 x i32> [[V3]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%i0 = load i8, i8* %p0, align 1		%i0 = load i8, i8* %p0, align 1
%i1 = load i8, i8* %p1, align 1		%i1 = load i8, i8* %p1, align 1
%i2 = load i8, i8* %p2, align 1		%i2 = load i8, i8* %p2, align 1
%i3 = load i8, i8* %p3, align 1		%i3 = load i8, i8* %p3, align 1
%x0 = zext i8 %i0 to i32		%x0 = zext i8 %i0 to i32
%x1 = zext i8 %i1 to i32		%x1 = zext i8 %i1 to i32
%x2 = zext i8 %i2 to i32		%x2 = zext i8 %i2 to i32
%x3 = zext i8 %i3 to i32		%x3 = zext i8 %i3 to i32
%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0		%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1		%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2		%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3		%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3
ret <4 x i32> %v3		ret <4 x i32> %v3
}		}

define <4 x i64> @loadext_4i8_to_4i64(i8* %p0) {		define <4 x i64> @loadext_4i8_to_4i64(i8* %p0) {
; SSE2-LABEL: @loadext_4i8_to_4i64(		; SSE-LABEL: @loadext_4i8_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i8_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i8_to_4i64(		; AVX1-LABEL: @loadext_4i8_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <2 x i8>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1
%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2		%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2
%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3		%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3
ret <4 x i64> %v3		ret <4 x i64> %v3
}		}

define <8 x i16> @loadext_8i8_to_8i16(i8* %p0) {		define <8 x i16> @loadext_8i8_to_8i16(i8* %p0) {
; CHECK-LABEL: @loadext_8i8_to_8i16(		; SSE2-LABEL: @loadext_8i8_to_8i16(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>		; SSE2-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i16> [[V7]]		; SSE2-NEXT: ret <8 x i16> [[V7]]
		;
		; SLM-LABEL: @loadext_8i8_to_8i16(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i16
		; SLM-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i16
		; SLM-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i16
		; SLM-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i16
		; SLM-NEXT: [[X4:%.*]] = zext i8 [[I4]] to i16
		; SLM-NEXT: [[X5:%.*]] = zext i8 [[I5]] to i16
		; SLM-NEXT: [[X6:%.*]] = zext i8 [[I6]] to i16
		; SLM-NEXT: [[X7:%.*]] = zext i8 [[I7]] to i16
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i16> [[V7]]
		;
		; AVX-LABEL: @loadext_8i8_to_8i16(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i16> undef, i16 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i16> [[V0]], i16 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i16> [[V7]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
%v4 = insertelement <8 x i16> %v3, i16 %x4, i32 4		%v4 = insertelement <8 x i16> %v3, i16 %x4, i32 4
%v5 = insertelement <8 x i16> %v4, i16 %x5, i32 5		%v5 = insertelement <8 x i16> %v4, i16 %x5, i32 5
%v6 = insertelement <8 x i16> %v5, i16 %x6, i32 6		%v6 = insertelement <8 x i16> %v5, i16 %x6, i32 6
%v7 = insertelement <8 x i16> %v6, i16 %x7, i32 7		%v7 = insertelement <8 x i16> %v6, i16 %x7, i32 7
ret <8 x i16> %v7		ret <8 x i16> %v7
}		}

define <8 x i32> @loadext_8i8_to_8i32(i8* %p0) {		define <8 x i32> @loadext_8i8_to_8i32(i8* %p0) {
; CHECK-LABEL: @loadext_8i8_to_8i32(		; SSE2-LABEL: @loadext_8i8_to_8i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i32> [[V7]]		; SSE2-NEXT: ret <8 x i32> [[V7]]
		;
		; SLM-LABEL: @loadext_8i8_to_8i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i32
		; SLM-NEXT: [[X4:%.*]] = zext i8 [[I4]] to i32
		; SLM-NEXT: [[X5:%.*]] = zext i8 [[I5]] to i32
		; SLM-NEXT: [[X6:%.*]] = zext i8 [[I6]] to i32
		; SLM-NEXT: [[X7:%.*]] = zext i8 [[I7]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i32> [[V7]]
		;
		; AVX-LABEL: @loadext_8i8_to_8i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <8 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i32> [[V7]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
; [20 lines not shown]
%v4 = insertelement <8 x i32> %v3, i32 %x4, i32 4		%v4 = insertelement <8 x i32> %v3, i32 %x4, i32 4
%v5 = insertelement <8 x i32> %v4, i32 %x5, i32 5		%v5 = insertelement <8 x i32> %v4, i32 %x5, i32 5
%v6 = insertelement <8 x i32> %v5, i32 %x6, i32 6		%v6 = insertelement <8 x i32> %v5, i32 %x6, i32 6
%v7 = insertelement <8 x i32> %v6, i32 %x7, i32 7		%v7 = insertelement <8 x i32> %v6, i32 %x7, i32 7
ret <8 x i32> %v7		ret <8 x i32> %v7
}		}

define <16 x i16> @loadext_16i8_to_16i16(i8* %p0) {		define <16 x i16> @loadext_16i8_to_16i16(i8* %p0) {
; CHECK-LABEL: @loadext_16i8_to_16i16(		; SSE2-LABEL: @loadext_16i8_to_16i16(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
; CHECK-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8		; SSE2-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
; CHECK-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9		; SSE2-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
; CHECK-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10		; SSE2-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
; CHECK-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11		; SSE2-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
; CHECK-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12		; SSE2-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
; CHECK-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13		; SSE2-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
; CHECK-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14		; SSE2-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
; CHECK-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15		; SSE2-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <16 x i8> [[TMP2]] to <16 x i16>		; SSE2-NEXT: [[TMP3:%.*]] = zext <16 x i8> [[TMP2]] to <16 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8		; SSE2-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8
; CHECK-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8		; SSE2-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9		; SSE2-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9
; CHECK-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9		; SSE2-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10		; SSE2-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10
; CHECK-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10		; SSE2-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11		; SSE2-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11
; CHECK-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11		; SSE2-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12		; SSE2-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12
; CHECK-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12		; SSE2-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13		; SSE2-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13
; CHECK-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13		; SSE2-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14		; SSE2-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14
; CHECK-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14		; SSE2-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14
; CHECK-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15		; SSE2-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15
; CHECK-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15		; SSE2-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15
; CHECK-NEXT: ret <16 x i16> [[V15]]		; SSE2-NEXT: ret <16 x i16> [[V15]]
		;
		; SLM-LABEL: @loadext_16i8_to_16i16(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; SLM-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
		; SLM-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
		; SLM-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
		; SLM-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
		; SLM-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
		; SLM-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
		; SLM-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
		; SLM-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
		; SLM-NEXT: [[I0:%.*]] = load i8, i8* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i8, i8* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i8, i8* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i8, i8* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i8, i8* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i8, i8* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i8, i8* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i8, i8* [[P7]], align 1
		; SLM-NEXT: [[I8:%.*]] = load i8, i8* [[P8]], align 1
		; SLM-NEXT: [[I9:%.*]] = load i8, i8* [[P9]], align 1
		; SLM-NEXT: [[I10:%.*]] = load i8, i8* [[P10]], align 1
		; SLM-NEXT: [[I11:%.*]] = load i8, i8* [[P11]], align 1
		; SLM-NEXT: [[I12:%.*]] = load i8, i8* [[P12]], align 1
		; SLM-NEXT: [[I13:%.*]] = load i8, i8* [[P13]], align 1
		; SLM-NEXT: [[I14:%.*]] = load i8, i8* [[P14]], align 1
		; SLM-NEXT: [[I15:%.*]] = load i8, i8* [[P15]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i8 [[I0]] to i16
		; SLM-NEXT: [[X1:%.*]] = zext i8 [[I1]] to i16
		; SLM-NEXT: [[X2:%.*]] = zext i8 [[I2]] to i16
		; SLM-NEXT: [[X3:%.*]] = zext i8 [[I3]] to i16
		; SLM-NEXT: [[X4:%.*]] = zext i8 [[I4]] to i16
		; SLM-NEXT: [[X5:%.*]] = zext i8 [[I5]] to i16
		; SLM-NEXT: [[X6:%.*]] = zext i8 [[I6]] to i16
		; SLM-NEXT: [[X7:%.*]] = zext i8 [[I7]] to i16
		; SLM-NEXT: [[X8:%.*]] = zext i8 [[I8]] to i16
		; SLM-NEXT: [[X9:%.*]] = zext i8 [[I9]] to i16
		; SLM-NEXT: [[X10:%.*]] = zext i8 [[I10]] to i16
		; SLM-NEXT: [[X11:%.*]] = zext i8 [[I11]] to i16
		; SLM-NEXT: [[X12:%.*]] = zext i8 [[I12]] to i16
		; SLM-NEXT: [[X13:%.*]] = zext i8 [[I13]] to i16
		; SLM-NEXT: [[X14:%.*]] = zext i8 [[I14]] to i16
		; SLM-NEXT: [[X15:%.*]] = zext i8 [[I15]] to i16
		; SLM-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[X7]], i32 7
		; SLM-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[X8]], i32 8
		; SLM-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[X9]], i32 9
		; SLM-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[X10]], i32 10
		; SLM-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[X11]], i32 11
		; SLM-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[X12]], i32 12
		; SLM-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[X13]], i32 13
		; SLM-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[X14]], i32 14
		; SLM-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[X15]], i32 15
		; SLM-NEXT: ret <16 x i16> [[V15]]
		;
		; AVX-LABEL: @loadext_16i8_to_16i16(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i8, i8* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 7
		; AVX-NEXT: [[P8:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 8
		; AVX-NEXT: [[P9:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 9
		; AVX-NEXT: [[P10:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 10
		; AVX-NEXT: [[P11:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 11
		; AVX-NEXT: [[P12:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 12
		; AVX-NEXT: [[P13:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 13
		; AVX-NEXT: [[P14:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 14
		; AVX-NEXT: [[P15:%.*]] = getelementptr inbounds i8, i8* [[P0]], i64 15
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i8* [[P0]] to <16 x i8>*
		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <16 x i8> [[TMP2]] to <16 x i16>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <16 x i16> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <16 x i16> undef, i16 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <16 x i16> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <16 x i16> [[V0]], i16 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <16 x i16> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <16 x i16> [[V1]], i16 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <16 x i16> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <16 x i16> [[V2]], i16 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <16 x i16> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <16 x i16> [[V3]], i16 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <16 x i16> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <16 x i16> [[V4]], i16 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <16 x i16> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <16 x i16> [[V5]], i16 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <16 x i16> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <16 x i16> [[V6]], i16 [[TMP11]], i32 7
		; AVX-NEXT: [[TMP12:%.*]] = extractelement <16 x i16> [[TMP3]], i32 8
		; AVX-NEXT: [[V8:%.*]] = insertelement <16 x i16> [[V7]], i16 [[TMP12]], i32 8
		; AVX-NEXT: [[TMP13:%.*]] = extractelement <16 x i16> [[TMP3]], i32 9
		; AVX-NEXT: [[V9:%.*]] = insertelement <16 x i16> [[V8]], i16 [[TMP13]], i32 9
		; AVX-NEXT: [[TMP14:%.*]] = extractelement <16 x i16> [[TMP3]], i32 10
		; AVX-NEXT: [[V10:%.*]] = insertelement <16 x i16> [[V9]], i16 [[TMP14]], i32 10
		; AVX-NEXT: [[TMP15:%.*]] = extractelement <16 x i16> [[TMP3]], i32 11
		; AVX-NEXT: [[V11:%.*]] = insertelement <16 x i16> [[V10]], i16 [[TMP15]], i32 11
		; AVX-NEXT: [[TMP16:%.*]] = extractelement <16 x i16> [[TMP3]], i32 12
		; AVX-NEXT: [[V12:%.*]] = insertelement <16 x i16> [[V11]], i16 [[TMP16]], i32 12
		; AVX-NEXT: [[TMP17:%.*]] = extractelement <16 x i16> [[TMP3]], i32 13
		; AVX-NEXT: [[V13:%.*]] = insertelement <16 x i16> [[V12]], i16 [[TMP17]], i32 13
		; AVX-NEXT: [[TMP18:%.*]] = extractelement <16 x i16> [[TMP3]], i32 14
		; AVX-NEXT: [[V14:%.*]] = insertelement <16 x i16> [[V13]], i16 [[TMP18]], i32 14
		; AVX-NEXT: [[TMP19:%.*]] = extractelement <16 x i16> [[TMP3]], i32 15
		; AVX-NEXT: [[V15:%.*]] = insertelement <16 x i16> [[V14]], i16 [[TMP19]], i32 15
		; AVX-NEXT: ret <16 x i16> [[V15]]
;		;
%p1 = getelementptr inbounds i8, i8* %p0, i64 1		%p1 = getelementptr inbounds i8, i8* %p0, i64 1
%p2 = getelementptr inbounds i8, i8* %p0, i64 2		%p2 = getelementptr inbounds i8, i8* %p0, i64 2
%p3 = getelementptr inbounds i8, i8* %p0, i64 3		%p3 = getelementptr inbounds i8, i8* %p0, i64 3
%p4 = getelementptr inbounds i8, i8* %p0, i64 4		%p4 = getelementptr inbounds i8, i8* %p0, i64 4
%p5 = getelementptr inbounds i8, i8* %p0, i64 5		%p5 = getelementptr inbounds i8, i8* %p0, i64 5
%p6 = getelementptr inbounds i8, i8* %p0, i64 6		%p6 = getelementptr inbounds i8, i8* %p0, i64 6
%p7 = getelementptr inbounds i8, i8* %p0, i64 7		%p7 = getelementptr inbounds i8, i8* %p0, i64 7
; [56 lines not shown]
ret <16 x i16> %v15		ret <16 x i16> %v15
}		}

;		;
; vXi16		; vXi16
;		;

define <2 x i64> @loadext_2i16_to_2i64(i16* %p0) {		define <2 x i64> @loadext_2i16_to_2i64(i16* %p0) {
; SSE2-LABEL: @loadext_2i16_to_2i64(		; SSE-LABEL: @loadext_2i16_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i16_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <2 x i16> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i16_to_2i64(		; AVX-LABEL: @loadext_2i16_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i16> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i16> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%i0 = load i16, i16* %p0, align 1		%i0 = load i16, i16* %p0, align 1
%i1 = load i16, i16* %p1, align 1		%i1 = load i16, i16* %p1, align 1
%x0 = zext i16 %i0 to i64		%x0 = zext i16 %i0 to i64
%x1 = zext i16 %i1 to i64		%x1 = zext i16 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i32> @loadext_4i16_to_4i32(i16* %p0) {		define <4 x i32> @loadext_4i16_to_4i32(i16* %p0) {
; CHECK-LABEL: @loadext_4i16_to_4i32(		; SSE2-LABEL: @loadext_4i16_to_4i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i16> [[TMP2]] to <4 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = zext <4 x i16> [[TMP2]] to <4 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: ret <4 x i32> [[V3]]		; SSE2-NEXT: ret <4 x i32> [[V3]]
		;
		; SLM-LABEL: @loadext_4i16_to_4i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; SLM-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = zext i16 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = zext i16 [[I3]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: ret <4 x i32> [[V3]]
		;
		; AVX-LABEL: @loadext_4i16_to_4i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <4 x i16> [[TMP2]] to <4 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <4 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <4 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <4 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: ret <4 x i32> [[V3]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%p2 = getelementptr inbounds i16, i16* %p0, i64 2		%p2 = getelementptr inbounds i16, i16* %p0, i64 2
%p3 = getelementptr inbounds i16, i16* %p0, i64 3		%p3 = getelementptr inbounds i16, i16* %p0, i64 3
%i0 = load i16, i16* %p0, align 1		%i0 = load i16, i16* %p0, align 1
%i1 = load i16, i16* %p1, align 1		%i1 = load i16, i16* %p1, align 1
%i2 = load i16, i16* %p2, align 1		%i2 = load i16, i16* %p2, align 1
%i3 = load i16, i16* %p3, align 1		%i3 = load i16, i16* %p3, align 1
%x0 = zext i16 %i0 to i32		%x0 = zext i16 %i0 to i32
%x1 = zext i16 %i1 to i32		%x1 = zext i16 %i1 to i32
%x2 = zext i16 %i2 to i32		%x2 = zext i16 %i2 to i32
%x3 = zext i16 %i3 to i32		%x3 = zext i16 %i3 to i32
%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0		%v0 = insertelement <4 x i32> undef, i32 %x0, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1		%v1 = insertelement <4 x i32> %v0, i32 %x1, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2		%v2 = insertelement <4 x i32> %v1, i32 %x2, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3		%v3 = insertelement <4 x i32> %v2, i32 %x3, i32 3
ret <4 x i32> %v3		ret <4 x i32> %v3
}		}

define <4 x i64> @loadext_4i16_to_4i64(i16* %p0) {		define <4 x i64> @loadext_4i16_to_4i64(i16* %p0) {
; SSE2-LABEL: @loadext_4i16_to_4i64(		; SSE-LABEL: @loadext_4i16_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = zext i16 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = zext i16 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = zext i16 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = zext i16 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i16_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <4 x i16>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <4 x i16> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i16_to_4i64(		; AVX1-LABEL: @loadext_4i16_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <2 x i16>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
; [57 lines not shown]
%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <4 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <4 x i64> %v0, i64 %x1, i32 1
%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2		%v2 = insertelement <4 x i64> %v1, i64 %x2, i32 2
%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3		%v3 = insertelement <4 x i64> %v2, i64 %x3, i32 3
ret <4 x i64> %v3		ret <4 x i64> %v3
}		}

define <8 x i32> @loadext_8i16_to_8i32(i16* %p0) {		define <8 x i32> @loadext_8i16_to_8i32(i16* %p0) {
; CHECK-LABEL: @loadext_8i16_to_8i32(		; SSE2-LABEL: @loadext_8i16_to_8i32(
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1		; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2		; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3		; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4		; SSE2-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5		; SSE2-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6		; SSE2-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7		; SSE2-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*		; SSE2-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1		; SSE2-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1
; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i16> [[TMP2]] to <8 x i32>		; SSE2-NEXT: [[TMP3:%.*]] = zext <8 x i16> [[TMP2]] to <8 x i32>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0		; SSE2-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0		; SSE2-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1		; SSE2-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1		; SSE2-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2		; SSE2-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
; CHECK-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2		; SSE2-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3		; SSE2-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
; CHECK-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3		; SSE2-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4		; SSE2-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
; CHECK-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4		; SSE2-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5		; SSE2-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
; CHECK-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5		; SSE2-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6		; SSE2-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
; CHECK-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6		; SSE2-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7		; SSE2-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
; CHECK-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7		; SSE2-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
; CHECK-NEXT: ret <8 x i32> [[V7]]		; SSE2-NEXT: ret <8 x i32> [[V7]]
		;
		; SLM-LABEL: @loadext_8i16_to_8i32(
		; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; SLM-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
		; SLM-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
		; SLM-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
		; SLM-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
		; SLM-NEXT: [[I0:%.*]] = load i16, i16* [[P0]], align 1
		; SLM-NEXT: [[I1:%.*]] = load i16, i16* [[P1]], align 1
		; SLM-NEXT: [[I2:%.*]] = load i16, i16* [[P2]], align 1
		; SLM-NEXT: [[I3:%.*]] = load i16, i16* [[P3]], align 1
		; SLM-NEXT: [[I4:%.*]] = load i16, i16* [[P4]], align 1
		; SLM-NEXT: [[I5:%.*]] = load i16, i16* [[P5]], align 1
		; SLM-NEXT: [[I6:%.*]] = load i16, i16* [[P6]], align 1
		; SLM-NEXT: [[I7:%.*]] = load i16, i16* [[P7]], align 1
		; SLM-NEXT: [[X0:%.*]] = zext i16 [[I0]] to i32
		; SLM-NEXT: [[X1:%.*]] = zext i16 [[I1]] to i32
		; SLM-NEXT: [[X2:%.*]] = zext i16 [[I2]] to i32
		; SLM-NEXT: [[X3:%.*]] = zext i16 [[I3]] to i32
		; SLM-NEXT: [[X4:%.*]] = zext i16 [[I4]] to i32
		; SLM-NEXT: [[X5:%.*]] = zext i16 [[I5]] to i32
		; SLM-NEXT: [[X6:%.*]] = zext i16 [[I6]] to i32
		; SLM-NEXT: [[X7:%.*]] = zext i16 [[I7]] to i32
		; SLM-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[X0]], i32 0
		; SLM-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[X1]], i32 1
		; SLM-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[X2]], i32 2
		; SLM-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[X3]], i32 3
		; SLM-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[X4]], i32 4
		; SLM-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[X5]], i32 5
		; SLM-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[X6]], i32 6
		; SLM-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[X7]], i32 7
		; SLM-NEXT: ret <8 x i32> [[V7]]
		;
		; AVX-LABEL: @loadext_8i16_to_8i32(
		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[P0:%.*]], i64 1
		; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 2
		; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 3
		; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 4
		; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 5
		; AVX-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 6
		; AVX-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[P0]], i64 7
		; AVX-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 1
		; AVX-NEXT: [[TMP3:%.*]] = zext <8 x i16> [[TMP2]] to <8 x i32>
		; AVX-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP3]], i32 0
		; AVX-NEXT: [[V0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP4]], i32 0
		; AVX-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP3]], i32 1
		; AVX-NEXT: [[V1:%.*]] = insertelement <8 x i32> [[V0]], i32 [[TMP5]], i32 1
		; AVX-NEXT: [[TMP6:%.*]] = extractelement <8 x i32> [[TMP3]], i32 2
		; AVX-NEXT: [[V2:%.*]] = insertelement <8 x i32> [[V1]], i32 [[TMP6]], i32 2
		; AVX-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP3]], i32 3
		; AVX-NEXT: [[V3:%.*]] = insertelement <8 x i32> [[V2]], i32 [[TMP7]], i32 3
		; AVX-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP3]], i32 4
		; AVX-NEXT: [[V4:%.*]] = insertelement <8 x i32> [[V3]], i32 [[TMP8]], i32 4
		; AVX-NEXT: [[TMP9:%.*]] = extractelement <8 x i32> [[TMP3]], i32 5
		; AVX-NEXT: [[V5:%.*]] = insertelement <8 x i32> [[V4]], i32 [[TMP9]], i32 5
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP3]], i32 6
		; AVX-NEXT: [[V6:%.*]] = insertelement <8 x i32> [[V5]], i32 [[TMP10]], i32 6
		; AVX-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[TMP3]], i32 7
		; AVX-NEXT: [[V7:%.*]] = insertelement <8 x i32> [[V6]], i32 [[TMP11]], i32 7
		; AVX-NEXT: ret <8 x i32> [[V7]]
;		;
%p1 = getelementptr inbounds i16, i16* %p0, i64 1		%p1 = getelementptr inbounds i16, i16* %p0, i64 1
%p2 = getelementptr inbounds i16, i16* %p0, i64 2		%p2 = getelementptr inbounds i16, i16* %p0, i64 2
%p3 = getelementptr inbounds i16, i16* %p0, i64 3		%p3 = getelementptr inbounds i16, i16* %p0, i64 3
%p4 = getelementptr inbounds i16, i16* %p0, i64 4		%p4 = getelementptr inbounds i16, i16* %p0, i64 4
%p5 = getelementptr inbounds i16, i16* %p0, i64 5		%p5 = getelementptr inbounds i16, i16* %p0, i64 5
%p6 = getelementptr inbounds i16, i16* %p0, i64 6		%p6 = getelementptr inbounds i16, i16* %p0, i64 6
%p7 = getelementptr inbounds i16, i16* %p0, i64 7		%p7 = getelementptr inbounds i16, i16* %p0, i64 7
[... 24 lines not shown ...]
ret <8 x i32> %v7		ret <8 x i32> %v7
}		}

;		;
; vXi32		; vXi32
;		;

define <2 x i64> @loadext_2i32_to_2i64(i32* %p0) {		define <2 x i64> @loadext_2i32_to_2i64(i32* %p0) {
; SSE2-LABEL: @loadext_2i32_to_2i64(		; SSE-LABEL: @loadext_2i32_to_2i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SSE2-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i32 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i32 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i32 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i32 [[I1]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: ret <2 x i64> [[V1]]		; SSE-NEXT: ret <2 x i64> [[V1]]
;
; SLM-LABEL: @loadext_2i32_to_2i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SLM-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: ret <2 x i64> [[V1]]
;		;
; AVX-LABEL: @loadext_2i32_to_2i64(		; AVX-LABEL: @loadext_2i32_to_2i64(
; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*		; AVX-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; AVX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1		; AVX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>		; AVX-NEXT: [[TMP3:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>
; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0		; AVX-NEXT: [[V0:%.*]] = insertelement <2 x i64> undef, i64 [[TMP4]], i32 0
; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1		; AVX-NEXT: [[V1:%.*]] = insertelement <2 x i64> [[V0]], i64 [[TMP5]], i32 1
; AVX-NEXT: ret <2 x i64> [[V1]]		; AVX-NEXT: ret <2 x i64> [[V1]]
;		;
%p1 = getelementptr inbounds i32, i32* %p0, i64 1		%p1 = getelementptr inbounds i32, i32* %p0, i64 1
%i0 = load i32, i32* %p0, align 1		%i0 = load i32, i32* %p0, align 1
%i1 = load i32, i32* %p1, align 1		%i1 = load i32, i32* %p1, align 1
%x0 = zext i32 %i0 to i64		%x0 = zext i32 %i0 to i64
%x1 = zext i32 %i1 to i64		%x1 = zext i32 %i1 to i64
%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0		%v0 = insertelement <2 x i64> undef, i64 %x0, i32 0
%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1		%v1 = insertelement <2 x i64> %v0, i64 %x1, i32 1
ret <2 x i64> %v1		ret <2 x i64> %v1
}		}

define <4 x i64> @loadext_4i32_to_4i64(i32* %p0) {		define <4 x i64> @loadext_4i32_to_4i64(i32* %p0) {
; SSE2-LABEL: @loadext_4i32_to_4i64(		; SSE-LABEL: @loadext_4i32_to_4i64(
; SSE2-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; SSE-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SSE2-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2		; SSE-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; SSE2-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3		; SSE-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; SSE2-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1		; SSE-NEXT: [[I0:%.*]] = load i32, i32* [[P0]], align 1
; SSE2-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1		; SSE-NEXT: [[I1:%.*]] = load i32, i32* [[P1]], align 1
; SSE2-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1		; SSE-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1
; SSE2-NEXT: [[I3:%.*]] = load i32, i32* [[P3]], align 1		; SSE-NEXT: [[I3:%.*]] = load i32, i32* [[P3]], align 1
; SSE2-NEXT: [[X0:%.*]] = zext i32 [[I0]] to i64		; SSE-NEXT: [[X0:%.*]] = zext i32 [[I0]] to i64
; SSE2-NEXT: [[X1:%.*]] = zext i32 [[I1]] to i64		; SSE-NEXT: [[X1:%.*]] = zext i32 [[I1]] to i64
; SSE2-NEXT: [[X2:%.*]] = zext i32 [[I2]] to i64		; SSE-NEXT: [[X2:%.*]] = zext i32 [[I2]] to i64
; SSE2-NEXT: [[X3:%.*]] = zext i32 [[I3]] to i64		; SSE-NEXT: [[X3:%.*]] = zext i32 [[I3]] to i64
; SSE2-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0		; SSE-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
; SSE2-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1		; SSE-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
; SSE2-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2		; SSE-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
; SSE2-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3		; SSE-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
; SSE2-NEXT: ret <4 x i64> [[V3]]		; SSE-NEXT: ret <4 x i64> [[V3]]
;
; SLM-LABEL: @loadext_4i32_to_4i64(
; SLM-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; SLM-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; SLM-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <4 x i32>*
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 1
; SLM-NEXT: [[TMP3:%.*]] = zext <4 x i32> [[TMP2]] to <4 x i64>
; SLM-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0
; SLM-NEXT: [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
; SLM-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1
; SLM-NEXT: [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
; SLM-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
; SLM-NEXT: [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP6]], i32 2
; SLM-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
; SLM-NEXT: [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP7]], i32 3
; SLM-NEXT: ret <4 x i64> [[V3]]
;		;
; AVX1-LABEL: @loadext_4i32_to_4i64(		; AVX1-LABEL: @loadext_4i32_to_4i64(
; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1		; AVX1-NEXT: [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2		; AVX1-NEXT: [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3		; AVX1-NEXT: [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
; AVX1-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*		; AVX1-NEXT: [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
; AVX1-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1		; AVX1-NEXT: [[I2:%.*]] = load i32, i32* [[P2]], align 1
[... 63 lines not shown ...]

Revision Contents

Diff 231302

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/test/Analysis/CostModel/X86/fptosi.ll

llvm/test/Analysis/CostModel/X86/fptoui.ll

llvm/test/Analysis/CostModel/X86/shuffle-extract_subvector.ll

llvm/test/Analysis/CostModel/X86/vector-extract.ll

llvm/test/Transforms/LoopVectorize/X86/interleaving.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cast.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

llvm/test/Transforms/SLPVectorizer/X86/hsub.ll

llvm/test/Transforms/SLPVectorizer/X86/sext.ll

llvm/test/Transforms/SLPVectorizer/X86/zext.ll
