This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Fix domains for VZEXT_LOAD type instructions
ClosedPublic

Authored by RKSimon on Dec 12 2016, 11:58 AM.

Details

Reviewers

spatel
andreadb
mkuper
craig.topper
zansari
DavidKreitzer
m_zuckerman

Commits

rGd7518896fff0: [X86][SSE] Fix domains for VZEXT_LOAD type instructions
rL289825: [X86][SSE] Fix domains for VZEXT_LOAD type instructions

Summary

Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
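For readers unfamiliar with the mechanism, here is a minimal sketch of how a three-column equivalence table in the style of ReplaceableInstrs can be searched. The opcode enumerators, the domain numbering, and the helper names `lookupRow` and `replaceOpcode` are assumptions made for illustration; this is not the code in X86InstrInfo.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the X86 opcode enumerators. The real values are
// generated by TableGen and are not reproduced here.
enum Opcode : uint16_t {
  MOVSSrm = 1, MOVSDrm, MOVDI2PDIrm, MOVQI2PQIrm, ANDPSrr, ANDPDrr, PANDrr
};

// Assumed domain numbering: column index is the domain minus one.
enum Domain : unsigned { PackedSingle = 1, PackedDouble = 2, PackedInt = 3 };

static const uint16_t Replaceable[][3] = {
  // PackedSingle  PackedDouble  PackedInt
  { ANDPSrr,       ANDPDrr,      PANDrr      },
  { MOVSDrm,       MOVSDrm,      MOVQI2PQIrm }, // 64-bit zero-extending load
  { MOVSSrm,       MOVSSrm,      MOVDI2PDIrm }, // 32-bit zero-extending load
};

// Return the row that holds Opc in the column for domain Dom, or nullptr.
static const uint16_t *lookupRow(uint16_t Opc, unsigned Dom) {
  assert(Dom >= PackedSingle && Dom <= PackedInt && "domain out of range");
  for (const auto &Row : Replaceable)
    if (Row[Dom - 1] == Opc)
      return Row;
  return nullptr;
}

// Return the equivalent opcode in NewDom, or Opc unchanged if no row matches.
static uint16_t replaceOpcode(uint16_t Opc, unsigned CurDom, unsigned NewDom) {
  assert(NewDom >= PackedSingle && NewDom <= PackedInt && "domain out of range");
  const uint16_t *Row = lookupRow(Opc, CurDom);
  return Row ? Row[NewDom - 1] : Opc;
}
```

With rows like the two load rows above, an integer-domain `movd` load can be rewritten to `movss`, and a `movsd` load to `movq`, depending on the domain of the surrounding instructions.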

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 81122. Dec 12 2016, 11:58 AM

RKSimon retitled this revision to [X86][SSE] Fix domains for VZEXT_LOAD type instructions.

RKSimon updated this object.

RKSimon added reviewers: mkuper, m_zuckerman, craig.topper, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

Is there actually a domain crossing penalty for these cases?
(Adding Dave & Zia as authoritative sources of truth :-) )

In D27684#620266, @mkuper wrote:

Is there actually a domain crossing penalty for these cases?
(Adding Dave & Zia as authoritative sources of truth :-) )

The penalties are minor (and nonexistent on some of the latest architectures), but they are definitely present on pre-AVX targets. Fixing the domains also keeps us consistent along an instruction chain, which isn't a bad thing. By allowing domain switching we also encourage float-domain instructions, which often have shorter encodings.

I'm working on getting some confirmation on the latest ones, but most current Core architectures suffer a 1-clk penalty switching between fp and int domains. This doesn't include the Atom line, which can do it for free.

The 1 clk isn't insignificant if you're latency bound and you do a lot of switching on the critical path. I'm not familiar with the code that decides to switch, but can it take architectures and maybe code size into consideration (i.e. favor smaller encoding with Os/Oz)?
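As a rough illustration of the latency concern above, the following toy model counts fp/int domain crossings along a dependency chain and charges a fixed penalty per crossing. The `Domain` enum, the function names, and the default 1-clk penalty are assumptions for illustration only; real penalties vary by microarchitecture, as noted in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified two-way split of execution domains (an assumption of this sketch).
enum class Domain { Float, Int };

// Count adjacent instruction pairs on the chain whose domains differ.
static std::size_t countCrossings(const std::vector<Domain> &Chain) {
  std::size_t Crossings = 0;
  for (std::size_t I = 1; I < Chain.size(); ++I)
    if (Chain[I] != Chain[I - 1])
      ++Crossings;
  return Crossings;
}

// Extra critical-path latency under a fixed per-crossing penalty.
static std::size_t extraLatency(const std::vector<Domain> &Chain,
                                std::size_t PenaltyClk = 1) {
  return countCrossings(Chain) * PenaltyClk;
}
```

Under this model, a chain that stays in one domain costs nothing extra, while each switch between domains on the critical path adds one penalty.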

RKSimon mentioned this in D27692: [x86] use a single shufps when it can save instructions. Dec 13 2016, 3:59 AM

In D27684#620495, @zansari wrote:

I'm working on getting some confirmation on the latest ones, but most current Core architectures suffer a 1-clk penalty switching between fp and int domains. This doesn't include the Atom line, which can do it for free.

The 1 clk isn't insignificant if you're latency bound and you do a lot of switching on the critical path. I'm not familiar with the code that decides to switch, but can it take architectures and maybe code size into consideration (i.e. favor smaller encoding with Os/Oz)?

Float domain is the default as we assume that float instructions are at least as small as the equivalent double/integer alternatives (this was true in SSE days, not so certain about the latest instruction sets) - this is why most domain agnostic code ends up using floats. Through that we get some optsize automatically without requiring Os/Oz. There is nothing to ensure we always use the shortest instruction (domain switches be damned).

We don't do much for specific architectures - we currently filter just by a target's instruction set - as the code is really only there to try and maintain a particular domain as long as possible.
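To make the encoding-size point concrete, here is a small sketch comparing the byte counts of a few equivalent legacy SSE encodings (mandatory prefix plus opcode bytes only; REX, ModRM, and displacement bytes are excluded). The `Encoding` struct and the `pickShortest` policy are hypothetical and are not something the domain-fixing code does today.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A mnemonic and the length of its mandatory prefix plus opcode bytes.
struct Encoding {
  const char *Mnemonic;
  std::size_t OpcodeBytes;
};

// andps = 0F 54, andpd = 66 0F 54, pand = 66 0F DB.
static const std::vector<Encoding> LogicalAnd = {
    {"andps", 2}, {"andpd", 3}, {"pand", 3}};

// movss = F3 0F 10, movd = 66 0F 6E: same length, so no size win here.
static const std::vector<Encoding> ScalarLoad32 = {{"movss", 3}, {"movd", 3}};

// Hypothetical size-driven policy: pick the shortest candidate; ties go to
// the earliest entry, which is the float form in the lists above.
static const Encoding *pickShortest(const std::vector<Encoding> &Candidates) {
  const Encoding *Best = nullptr;
  for (const Encoding &E : Candidates)
    if (!Best || E.OpcodeBytes < Best->OpcodeBytes)
      Best = &E;
  return Best;
}
```

For the packed logical ops the single-precision form saves a byte, whereas for the 32-bit zero-extending loads the float and integer forms are the same size, so preferring the float domain is size-neutral there.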

LGTM.

This revision is now accepted and ready to land. Dec 15 2016, 6:55 AM

Closed by commit rL289825: [X86][SSE] Fix domains for VZEXT_LOAD type instructions (authored by RKSimon). Dec 15 2016, 8:16 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path (revision 289461)	Size
lib/Target/X86/X86InstrInfo.cpp	6 lines
test/CodeGen/X86/2008-02-06-LoadFoldingBug.ll	2 lines
test/CodeGen/X86/2012-1-10-buildvector.ll	4 lines
test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll	2 lines
test/CodeGen/X86/avx-shuffle-x86_32.ll	2 lines
test/CodeGen/X86/avx2-vbroadcast.ll	2 lines
test/CodeGen/X86/avx512-mov.ll	6 lines
test/CodeGen/X86/fp-logic.ll	2 lines
test/CodeGen/X86/fp128-cast.ll	2 lines
test/CodeGen/X86/i64-mem-copy.ll	1 line
test/CodeGen/X86/logical-load-fold.ll	17 lines
test/CodeGen/X86/merge-consecutive-loads-128.ll	54 lines
test/CodeGen/X86/merge-consecutive-loads-256.ll	50 lines
test/CodeGen/X86/merge-consecutive-loads-512.ll	32 lines
test/CodeGen/X86/mmx-arg-passing-x86-64.ll	2 lines
test/CodeGen/X86/pr11334.ll	4 lines
test/CodeGen/X86/pr2656.ll	4 lines
test/CodeGen/X86/scalar-int-to-fp.ll	8 lines
test/CodeGen/X86/sse-fcopysign.ll	8 lines
test/CodeGen/X86/sse-minmax.ll	76 lines
test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll	2 lines
test/CodeGen/X86/sse2-intrinsics-fast-isel.ll	8 lines
test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll	4 lines
test/CodeGen/X86/sse2.ll	2 lines
test/CodeGen/X86/uint64-to-float.ll	4 lines
test/CodeGen/X86/uint_to_fp-2.ll	4 lines
test/CodeGen/X86/vec_extract-avx.ll	4 lines
test/CodeGen/X86/vec_extract-mmx.ll	8 lines
test/CodeGen/X86/vec_i64.ll	8 lines
test/CodeGen/X86/vec_insert-2.ll	2 lines
test/CodeGen/X86/vec_insert-3.ll	4 lines
test/CodeGen/X86/vec_insert-mmx.ll	4 lines
test/CodeGen/X86/vec_int_to_fp.ll	4 lines
test/CodeGen/X86/vec_set-2.ll	2 lines
test/CodeGen/X86/vec_set-C.ll	2 lines
test/CodeGen/X86/vec_set-D.ll	2 lines
test/CodeGen/X86/vec_set-F.ll	2 lines
test/CodeGen/X86/vector-shuffle-128-v2.ll	4 lines
test/CodeGen/X86/vector-shuffle-128-v4.ll	4 lines
test/CodeGen/X86/vector-shuffle-256-v4.ll	2 lines
test/CodeGen/X86/vector-shuffle-256-v8.ll	8 lines
test/CodeGen/X86/vector-shuffle-512-v16.ll	2 lines
test/CodeGen/X86/vector-shuffle-combining-xop.ll	2 lines
test/CodeGen/X86/vector-shuffle-combining.ll	8 lines
test/CodeGen/X86/vector-shuffle-mmx.ll	2 lines
test/CodeGen/X86/vector-shuffle-variable-256.ll	4 lines
test/CodeGen/X86/vector-zmov.ll	8 lines
test/CodeGen/X86/widen_load-2.ll	4 lines

Diff 81122

lib/Target/X86/X86InstrInfo.cpp

Show First 20 Lines • Show All 8,462 Lines • ▼ Show 20 Lines
static const uint16_t ReplaceableInstrs[][3] = {		static const uint16_t ReplaceableInstrs[][3] = {
//PackedSingle PackedDouble PackedInt		//PackedSingle PackedDouble PackedInt
{ X86::MOVAPSmr, X86::MOVAPDmr, X86::MOVDQAmr },		{ X86::MOVAPSmr, X86::MOVAPDmr, X86::MOVDQAmr },
{ X86::MOVAPSrm, X86::MOVAPDrm, X86::MOVDQArm },		{ X86::MOVAPSrm, X86::MOVAPDrm, X86::MOVDQArm },
{ X86::MOVAPSrr, X86::MOVAPDrr, X86::MOVDQArr },		{ X86::MOVAPSrr, X86::MOVAPDrr, X86::MOVDQArr },
{ X86::MOVUPSmr, X86::MOVUPDmr, X86::MOVDQUmr },		{ X86::MOVUPSmr, X86::MOVUPDmr, X86::MOVDQUmr },
{ X86::MOVUPSrm, X86::MOVUPDrm, X86::MOVDQUrm },		{ X86::MOVUPSrm, X86::MOVUPDrm, X86::MOVDQUrm },
{ X86::MOVLPSmr, X86::MOVLPDmr, X86::MOVPQI2QImr },		{ X86::MOVLPSmr, X86::MOVLPDmr, X86::MOVPQI2QImr },
		{ X86::MOVSDrm, X86::MOVSDrm, X86::MOVQI2PQIrm },
		{ X86::MOVSSrm, X86::MOVSSrm, X86::MOVDI2PDIrm },
{ X86::MOVNTPSmr, X86::MOVNTPDmr, X86::MOVNTDQmr },		{ X86::MOVNTPSmr, X86::MOVNTPDmr, X86::MOVNTDQmr },
{ X86::ANDNPSrm, X86::ANDNPDrm, X86::PANDNrm },		{ X86::ANDNPSrm, X86::ANDNPDrm, X86::PANDNrm },
{ X86::ANDNPSrr, X86::ANDNPDrr, X86::PANDNrr },		{ X86::ANDNPSrr, X86::ANDNPDrr, X86::PANDNrr },
{ X86::ANDPSrm, X86::ANDPDrm, X86::PANDrm },		{ X86::ANDPSrm, X86::ANDPDrm, X86::PANDrm },
{ X86::ANDPSrr, X86::ANDPDrr, X86::PANDrr },		{ X86::ANDPSrr, X86::ANDPDrr, X86::PANDrr },
{ X86::ORPSrm, X86::ORPDrm, X86::PORrm },		{ X86::ORPSrm, X86::ORPDrm, X86::PORrm },
{ X86::ORPSrr, X86::ORPDrr, X86::PORrr },		{ X86::ORPSrr, X86::ORPDrr, X86::PORrr },
{ X86::XORPSrm, X86::XORPDrm, X86::PXORrm },		{ X86::XORPSrm, X86::XORPDrm, X86::PXORrm },
{ X86::XORPSrr, X86::XORPDrr, X86::PXORrr },		{ X86::XORPSrr, X86::XORPDrr, X86::PXORrr },
// AVX 128-bit support		// AVX 128-bit support
{ X86::VMOVAPSmr, X86::VMOVAPDmr, X86::VMOVDQAmr },		{ X86::VMOVAPSmr, X86::VMOVAPDmr, X86::VMOVDQAmr },
{ X86::VMOVAPSrm, X86::VMOVAPDrm, X86::VMOVDQArm },		{ X86::VMOVAPSrm, X86::VMOVAPDrm, X86::VMOVDQArm },
{ X86::VMOVAPSrr, X86::VMOVAPDrr, X86::VMOVDQArr },		{ X86::VMOVAPSrr, X86::VMOVAPDrr, X86::VMOVDQArr },
{ X86::VMOVUPSmr, X86::VMOVUPDmr, X86::VMOVDQUmr },		{ X86::VMOVUPSmr, X86::VMOVUPDmr, X86::VMOVDQUmr },
{ X86::VMOVUPSrm, X86::VMOVUPDrm, X86::VMOVDQUrm },		{ X86::VMOVUPSrm, X86::VMOVUPDrm, X86::VMOVDQUrm },
{ X86::VMOVLPSmr, X86::VMOVLPDmr, X86::VMOVPQI2QImr },		{ X86::VMOVLPSmr, X86::VMOVLPDmr, X86::VMOVPQI2QImr },
		{ X86::VMOVSDrm, X86::VMOVSDrm, X86::VMOVQI2PQIrm },
		{ X86::VMOVSSrm, X86::VMOVSSrm, X86::VMOVDI2PDIrm },
{ X86::VMOVNTPSmr, X86::VMOVNTPDmr, X86::VMOVNTDQmr },		{ X86::VMOVNTPSmr, X86::VMOVNTPDmr, X86::VMOVNTDQmr },
{ X86::VANDNPSrm, X86::VANDNPDrm, X86::VPANDNrm },		{ X86::VANDNPSrm, X86::VANDNPDrm, X86::VPANDNrm },
{ X86::VANDNPSrr, X86::VANDNPDrr, X86::VPANDNrr },		{ X86::VANDNPSrr, X86::VANDNPDrr, X86::VPANDNrr },
{ X86::VANDPSrm, X86::VANDPDrm, X86::VPANDrm },		{ X86::VANDPSrm, X86::VANDPDrm, X86::VPANDrm },
{ X86::VANDPSrr, X86::VANDPDrr, X86::VPANDrr },		{ X86::VANDPSrr, X86::VANDPDrr, X86::VPANDrr },
{ X86::VORPSrm, X86::VORPDrm, X86::VPORrm },		{ X86::VORPSrm, X86::VORPDrm, X86::VPORrm },
{ X86::VORPSrr, X86::VORPDrr, X86::VPORrr },		{ X86::VORPSrr, X86::VORPDrr, X86::VPORrr },
{ X86::VXORPSrm, X86::VXORPDrm, X86::VPXORrm },		{ X86::VXORPSrm, X86::VXORPDrm, X86::VPXORrm },
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	static const uint16_t ReplaceableInstrsAVX512[][4] = {
{ X86::VMOVAPSZ256rr, X86::VMOVAPDZ256rr, X86::VMOVDQA64Z256rr, X86::VMOVDQA32Z256rr },		{ X86::VMOVAPSZ256rr, X86::VMOVAPDZ256rr, X86::VMOVDQA64Z256rr, X86::VMOVDQA32Z256rr },
{ X86::VMOVUPSZ256mr, X86::VMOVUPDZ256mr, X86::VMOVDQU64Z256mr, X86::VMOVDQU32Z256mr },		{ X86::VMOVUPSZ256mr, X86::VMOVUPDZ256mr, X86::VMOVDQU64Z256mr, X86::VMOVDQU32Z256mr },
{ X86::VMOVUPSZ256rm, X86::VMOVUPDZ256rm, X86::VMOVDQU64Z256rm, X86::VMOVDQU32Z256rm },		{ X86::VMOVUPSZ256rm, X86::VMOVUPDZ256rm, X86::VMOVDQU64Z256rm, X86::VMOVDQU32Z256rm },
{ X86::VMOVAPSZmr, X86::VMOVAPDZmr, X86::VMOVDQA64Zmr, X86::VMOVDQA32Zmr },		{ X86::VMOVAPSZmr, X86::VMOVAPDZmr, X86::VMOVDQA64Zmr, X86::VMOVDQA32Zmr },
{ X86::VMOVAPSZrm, X86::VMOVAPDZrm, X86::VMOVDQA64Zrm, X86::VMOVDQA32Zrm },		{ X86::VMOVAPSZrm, X86::VMOVAPDZrm, X86::VMOVDQA64Zrm, X86::VMOVDQA32Zrm },
{ X86::VMOVAPSZrr, X86::VMOVAPDZrr, X86::VMOVDQA64Zrr, X86::VMOVDQA32Zrr },		{ X86::VMOVAPSZrr, X86::VMOVAPDZrr, X86::VMOVDQA64Zrr, X86::VMOVDQA32Zrr },
{ X86::VMOVUPSZmr, X86::VMOVUPDZmr, X86::VMOVDQU64Zmr, X86::VMOVDQU32Zmr },		{ X86::VMOVUPSZmr, X86::VMOVUPDZmr, X86::VMOVDQU64Zmr, X86::VMOVDQU32Zmr },
{ X86::VMOVUPSZrm, X86::VMOVUPDZrm, X86::VMOVDQU64Zrm, X86::VMOVDQU32Zrm },		{ X86::VMOVUPSZrm, X86::VMOVUPDZrm, X86::VMOVDQU64Zrm, X86::VMOVDQU32Zrm },
		{ X86::VMOVSDZrm, X86::VMOVSDZrm, X86::VMOVQI2PQIZrm, X86::VMOVQI2PQIZrm, },
		{ X86::VMOVSSZrm, X86::VMOVSSZrm, X86::VMOVDI2PDIZrm, X86::VMOVDI2PDIZrm, },
};		};

static const uint16_t ReplaceableInstrsAVX512DQ[][4] = {		static const uint16_t ReplaceableInstrsAVX512DQ[][4] = {
// Two integer columns for 64-bit and 32-bit elements.		// Two integer columns for 64-bit and 32-bit elements.
//PackedSingle PackedDouble PackedInt PackedInt		//PackedSingle PackedDouble PackedInt PackedInt
{ X86::VANDNPSZ128rm, X86::VANDNPDZ128rm, X86::VPANDNQZ128rm, X86::VPANDNDZ128rm },		{ X86::VANDNPSZ128rm, X86::VANDNPDZ128rm, X86::VPANDNQZ128rm, X86::VPANDNDZ128rm },
{ X86::VANDNPSZ128rr, X86::VANDNPDZ128rr, X86::VPANDNQZ128rr, X86::VPANDNDZ128rr },		{ X86::VANDNPSZ128rr, X86::VANDNPDZ128rr, X86::VPANDNQZ128rr, X86::VPANDNDZ128rr },
{ X86::VANDPSZ128rm, X86::VANDPDZ128rm, X86::VPANDQZ128rm, X86::VPANDDZ128rm },		{ X86::VANDPSZ128rm, X86::VANDPDZ128rm, X86::VPANDQZ128rm, X86::VPANDDZ128rm },
▲ Show 20 Lines • Show All 1,131 Lines • Show Last 20 Lines

test/CodeGen/X86/2008-02-06-LoadFoldingBug.ll

	; RUN: llc < %s -march=x86 -mattr=+sse2 \| FileCheck %s			; RUN: llc < %s -march=x86 -mattr=+sse2 \| FileCheck %s

	; CHECK: xorpd {{.*}}{{LCPI0_0\|__xmm@}}			; CHECK: xorps {{.*}}{{LCPI0_0\|__xmm@}}
	define void @casin({ double, double }* sret %agg.result, double %z.0, double %z.1) nounwind {			define void @casin({ double, double }* sret %agg.result, double %z.0, double %z.1) nounwind {
	entry:			entry:
	%memtmp = alloca { double, double }, align 8 ; <{ double, double }*> [#uses=3]			%memtmp = alloca { double, double }, align 8 ; <{ double, double }*> [#uses=3]
	%tmp4 = fsub double -0.000000e+00, %z.1 ; <double> [#uses=1]			%tmp4 = fsub double -0.000000e+00, %z.1 ; <double> [#uses=1]
	call void @casinh( { double, double }* sret %memtmp, double %tmp4, double %z.0 ) nounwind			call void @casinh( { double, double }* sret %memtmp, double %tmp4, double %z.0 ) nounwind
	%tmp19 = getelementptr { double, double }, { double, double }* %memtmp, i32 0, i32 0 ; <double*> [#uses=1]			%tmp19 = getelementptr { double, double }, { double, double }* %memtmp, i32 0, i32 0 ; <double*> [#uses=1]
	%tmp20 = load double, double* %tmp19, align 8 ; <double> [#uses=1]			%tmp20 = load double, double* %tmp19, align 8 ; <double> [#uses=1]
	%tmp22 = getelementptr { double, double }, { double, double }* %memtmp, i32 0, i32 1 ; <double*> [#uses=1]			%tmp22 = getelementptr { double, double }, { double, double }* %memtmp, i32 0, i32 1 ; <double*> [#uses=1]
	Show All 10 Lines

test/CodeGen/X86/2012-1-10-buildvector.ll

Show All 12 Lines	; CHECK-NEXT: retl
%vecinit8.i = shufflevector <3 x i64> zeroinitializer, <3 x i64> %vext.i, <3 x i32> <i32 0, i32 3, i32 4>		%vecinit8.i = shufflevector <3 x i64> zeroinitializer, <3 x i64> %vext.i, <3 x i32> <i32 0, i32 3, i32 4>
store <3 x i64> %vecinit8.i, <3 x i64>* undef, align 32		store <3 x i64> %vecinit8.i, <3 x i64>* undef, align 32
ret void		ret void
}		}

define void @bad_insert(i32 %t) {		define void @bad_insert(i32 %t) {
; CHECK-LABEL: bad_insert:		; CHECK-LABEL: bad_insert:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; CHECK-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; CHECK-NEXT: vmovdqa %ymm0, (%eax)		; CHECK-NEXT: vmovaps %ymm0, (%eax)
; CHECK-NEXT: vzeroupper		; CHECK-NEXT: vzeroupper
; CHECK-NEXT: retl		; CHECK-NEXT: retl
%v2 = insertelement <8 x i32> zeroinitializer, i32 %t, i32 0		%v2 = insertelement <8 x i32> zeroinitializer, i32 %t, i32 0
store <8 x i32> %v2, <8 x i32> addrspace(1)* undef, align 32		store <8 x i32> %v2, <8 x i32> addrspace(1)* undef, align 32
ret void		ret void
}		}

test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

	Show First 20 Lines • Show All 397 Lines • ▼ Show 20 Lines
	declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind			declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind


	define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {			define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {
	; fadd operation forces the execution domain.			; fadd operation forces the execution domain.
	; CHECK-LABEL: test_x86_sse2_storeu_pd:			; CHECK-LABEL: test_x86_sse2_storeu_pd:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; CHECK-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; CHECK-NEXT: vpslldq {{.*#+}} xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]			; CHECK-NEXT: vpslldq {{.*#+}} xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
	; CHECK-NEXT: vaddpd %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vaddpd %xmm1, %xmm0, %xmm0
	; CHECK-NEXT: vmovupd %xmm0, (%eax)			; CHECK-NEXT: vmovupd %xmm0, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>			%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>
	call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)			call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)
	ret void			ret void
	}			}
	▲ Show 20 Lines • Show All 108 Lines • Show Last 20 Lines

test/CodeGen/X86/avx-shuffle-x86_32.ll

	Show All 10 Lines
	%b = shufflevector <4 x i64> %a, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			%b = shufflevector <4 x i64> %a, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	ret <4 x i64>%b			ret <4 x i64>%b
	}			}

	define <8 x i16> @test2(<4 x i16>* %v) nounwind {			define <8 x i16> @test2(<4 x i16>* %v) nounwind {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; CHECK-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero			; CHECK-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%v9 = load <4 x i16>, <4 x i16> * %v, align 8			%v9 = load <4 x i16>, <4 x i16> * %v, align 8
	%v10 = shufflevector <4 x i16> %v9, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%v10 = shufflevector <4 x i16> %v9, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	%v11 = shufflevector <8 x i16> <i16 undef, i16 undef, i16 undef, i16 undef, i16 0, i16 0, i16 0, i16 0>, <8 x i16> %v10, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>			%v11 = shufflevector <8 x i16> <i16 undef, i16 undef, i16 undef, i16 undef, i16 0, i16 0, i16 0, i16 0>, <8 x i16> %v10, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>
	ret <8 x i16> %v11			ret <8 x i16> %v11
	}			}

test/CodeGen/X86/avx2-vbroadcast.ll

Show First 20 Lines • Show All 273 Lines • ▼ Show 20 Lines	; X64-AVX512VL-NEXT: retq
%shuf = shufflevector <4 x i16> %load, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>		%shuf = shufflevector <4 x i16> %load, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
ret <8 x i16> %shuf		ret <8 x i16> %shuf
}		}

define <16 x i16> @broadcast_mem_v4i16_v16i16(<4 x i16>* %ptr) {		define <16 x i16> @broadcast_mem_v4i16_v16i16(<4 x i16>* %ptr) {
; X32-AVX2-LABEL: broadcast_mem_v4i16_v16i16:		; X32-AVX2-LABEL: broadcast_mem_v4i16_v16i16:
; X32-AVX2: ## BB#0:		; X32-AVX2: ## BB#0:
; X32-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX2-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; X32-AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; X32-AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6,7,4,5,6,7,6,7],zero,zero		; X32-AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6,7,4,5,6,7,6,7],zero,zero
; X32-AVX2-NEXT: vpbroadcastq %xmm0, %ymm0		; X32-AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
; X32-AVX2-NEXT: retl		; X32-AVX2-NEXT: retl
;		;
; X64-AVX2-LABEL: broadcast_mem_v4i16_v16i16:		; X64-AVX2-LABEL: broadcast_mem_v4i16_v16i16:
; X64-AVX2: ## BB#0:		; X64-AVX2: ## BB#0:
; X64-AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X64-AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; X64-AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6,7,4,5,6,7,6,7],zero,zero		; X64-AVX2-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6,7,4,5,6,7,6,7],zero,zero
▲ Show 20 Lines • Show All 1,505 Lines • Show Last 20 Lines

test/CodeGen/X86/avx512-mov.ll

	Show All 25 Lines
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%res = insertelement <2 x i64>undef, i64 %x, i32 0			%res = insertelement <2 x i64>undef, i64 %x, i32 0
	ret <2 x i64>%res			ret <2 x i64>%res
	}			}

	define <4 x i32> @test4(i32* %x) {			define <4 x i32> @test4(i32* %x) {
	; CHECK-LABEL: test4:			; CHECK-LABEL: test4:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: vmovd (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7d,0x08,0x6e,0x07]			; CHECK-NEXT: vmovss (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7e,0x08,0x10,0x07]
	; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero			; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%y = load i32, i32* %x			%y = load i32, i32* %x
	%res = insertelement <4 x i32>undef, i32 %y, i32 0			%res = insertelement <4 x i32>undef, i32 %y, i32 0
	ret <4 x i32>%res			ret <4 x i32>%res
	}			}

	define void @test5(float %x, float* %y) {			define void @test5(float %x, float* %y) {
	▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%res = extractelement <2 x i64> %x, i32 0			%res = extractelement <2 x i64> %x, i32 0
	ret i64 %res			ret i64 %res
	}			}

	define <4 x i32> @test10(i32* %x) {			define <4 x i32> @test10(i32* %x) {
	; CHECK-LABEL: test10:			; CHECK-LABEL: test10:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: vmovd (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7d,0x08,0x6e,0x07]			; CHECK-NEXT: vmovss (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7e,0x08,0x10,0x07]
	; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero			; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%y = load i32, i32* %x, align 4			%y = load i32, i32* %x, align 4
	%res = insertelement <4 x i32>zeroinitializer, i32 %y, i32 0			%res = insertelement <4 x i32>zeroinitializer, i32 %y, i32 0
	ret <4 x i32>%res			ret <4 x i32>%res
	}			}

	define <4 x float> @test11(float* %x) {			define <4 x float> @test11(float* %x) {
	Show All 34 Lines
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%res = insertelement <4 x i32>zeroinitializer, i32 %x, i32 0			%res = insertelement <4 x i32>zeroinitializer, i32 %x, i32 0
	ret <4 x i32>%res			ret <4 x i32>%res
	}			}

	define <4 x i32> @test15(i32* %x) {			define <4 x i32> @test15(i32* %x) {
	; CHECK-LABEL: test15:			; CHECK-LABEL: test15:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: vmovd (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7d,0x08,0x6e,0x07]			; CHECK-NEXT: vmovss (%rdi), %xmm0 ## encoding: [0x62,0xf1,0x7e,0x08,0x10,0x07]
	; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero			; CHECK-NEXT: ## xmm0 = mem[0],zero,zero,zero
	; CHECK-NEXT: retq ## encoding: [0xc3]			; CHECK-NEXT: retq ## encoding: [0xc3]
	%y = load i32, i32* %x, align 4			%y = load i32, i32* %x, align 4
	%res = insertelement <4 x i32>zeroinitializer, i32 %y, i32 0			%res = insertelement <4 x i32>zeroinitializer, i32 %y, i32 0
	ret <4 x i32>%res			ret <4 x i32>%res
	}			}

	define <16 x i32> @test16(i8 * %addr) {			define <16 x i32> @test16(i8 * %addr) {
	▲ Show 20 Lines • Show All 390 Lines • Show Last 20 Lines

test/CodeGen/X86/fp-logic.ll

Show First 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	;
%bc3 = bitcast i64 %and to double		%bc3 = bitcast i64 %and to double
ret double %bc3		ret double %bc3
}		}

define double @f7_double(double %x) {		define double @f7_double(double %x) {
; CHECK-LABEL: f7_double:		; CHECK-LABEL: f7_double:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; CHECK-NEXT: andpd %xmm1, %xmm0		; CHECK-NEXT: andps %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
;		;
%bc1 = bitcast double %x to i64		%bc1 = bitcast double %x to i64
%and = and i64 %bc1, 3		%and = and i64 %bc1, 3
%bc2 = bitcast i64 %and to double		%bc2 = bitcast i64 %and to double
ret double %bc2		ret double %bc2
}		}

▲ Show 20 Lines • Show All 65 Lines • Show Last 20 Lines

test/CodeGen/X86/fp128-cast.ll

	Show All 40 Lines
	; X32: fldl vf64			; X32: fldl vf64
	; X32: fstpl			; X32: fstpl
	; X32: calll __extenddftf2			; X32: calll __extenddftf2
	; X32: retl			; X32: retl
	;			;
	; X64-LABEL: TestFPExtF64_F128:			; X64-LABEL: TestFPExtF64_F128:
	; X64: movsd vf64(%rip), %xmm0			; X64: movsd vf64(%rip), %xmm0
	; X64-NEXT: callq __extenddftf2			; X64-NEXT: callq __extenddftf2
	; X64-NEXT: movapd %xmm0, vf128(%rip)			; X64-NEXT: movaps %xmm0, vf128(%rip)
	; X64: ret			; X64: ret
	}			}

	define void @TestFPToSIF128_I32() {			define void @TestFPToSIF128_I32() {
	entry:			entry:
	%0 = load fp128, fp128* @vf128, align 16			%0 = load fp128, fp128* @vf128, align 16
	%conv = fptosi fp128 %0 to i32			%conv = fptosi fp128 %0 to i32
	store i32 %conv, i32* @vi32, align 4			store i32 %conv, i32* @vi32, align 4
	▲ Show 20 Lines • Show All 307 Lines • Show Last 20 Lines

test/CodeGen/X86/i64-mem-copy.ll

	Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
	}			}

	; PR23476			; PR23476
	; Handle extraction from a non-simple / pre-legalization type.			; Handle extraction from a non-simple / pre-legalization type.

	define void @PR23476(<5 x i64> %in, i64* %out, i32 %index) {			define void @PR23476(<5 x i64> %in, i64* %out, i32 %index) {
	; X32-LABEL: PR23476:			; X32-LABEL: PR23476:
	; X32: movsd {{.*#+}} xmm0 = mem[0],zero			; X32: movsd {{.*#+}} xmm0 = mem[0],zero
				; X32: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: movsd %xmm0, (%eax)			; X32-NEXT: movsd %xmm0, (%eax)
	%ext = extractelement <5 x i64> %in, i32 %index			%ext = extractelement <5 x i64> %in, i32 %index
	store i64 %ext, i64* %out, align 8			store i64 %ext, i64* %out, align 8
	ret void			ret void
	}			}

test/CodeGen/X86/logical-load-fold.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2,sse-unaligned-mem \| FileCheck %s --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2,sse-unaligned-mem \| FileCheck %s --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx \| FileCheck %s --check-prefix=AVX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx \| FileCheck %s --check-prefix=AVX

	; Although we have the ability to fold an unaligned load with AVX			; Although we have the ability to fold an unaligned load with AVX
	; and under special conditions with some SSE implementations, we			; and under special conditions with some SSE implementations, we
	; can not fold the load under any circumstances in these test			; can not fold the load under any circumstances in these test
	; cases because they are not 16-byte loads. The load must be			; cases because they are not 16-byte loads. The load must be
	; executed as a scalar ('movs*') with a zero extension to			; executed as a scalar ('movs*') with a zero extension to
	; 128-bits and then used in the packed logical ('andp*') op.			; 128-bits and then used in the packed logical ('andp*') op.
	; PR22371 - http://llvm.org/bugs/show_bug.cgi?id=22371			; PR22371 - http://llvm.org/bugs/show_bug.cgi?id=22371

	define double @load_double_no_fold(double %x, double %y) {			define double @load_double_no_fold(double %x, double %y) {
	; SSE2-LABEL: load_double_no_fold:			; SSE2-LABEL: load_double_no_fold:
	; SSE2: BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: cmplesd %xmm0, %xmm1			; SSE2-NEXT: cmplesd %xmm0, %xmm1
	; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE2-NEXT: andpd %xmm1, %xmm0			; SSE2-NEXT: andps %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: load_double_no_fold:			; AVX-LABEL: load_double_no_fold:
	; AVX: BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vcmplesd %xmm0, %xmm1, %xmm0			; AVX-NEXT: vcmplesd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vandpd %xmm1, %xmm0, %xmm0			; AVX-NEXT: vandps %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq

	%cmp = fcmp oge double %x, %y			%cmp = fcmp oge double %x, %y
	%zext = zext i1 %cmp to i32			%zext = zext i1 %cmp to i32
	%conv = sitofp i32 %zext to double			%conv = sitofp i32 %zext to double
	ret double %conv			ret double %conv
	}			}

	define float @load_float_no_fold(float %x, float %y) {			define float @load_float_no_fold(float %x, float %y) {
	; SSE2-LABEL: load_float_no_fold:			; SSE2-LABEL: load_float_no_fold:
	; SSE2: BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: cmpless %xmm0, %xmm1			; SSE2-NEXT: cmpless %xmm0, %xmm1
	; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE2-NEXT: andps %xmm1, %xmm0			; SSE2-NEXT: andps %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: load_float_no_fold:			; AVX-LABEL: load_float_no_fold:
	; AVX: BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vcmpless %xmm0, %xmm1, %xmm0			; AVX-NEXT: vcmpless %xmm0, %xmm1, %xmm0
	; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; AVX-NEXT: vandps %xmm1, %xmm0, %xmm0			; AVX-NEXT: vandps %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq

	%cmp = fcmp oge float %x, %y			%cmp = fcmp oge float %x, %y
	%zext = zext i1 %cmp to i32			%zext = zext i1 %cmp to i32
	%conv = sitofp i32 %zext to float			%conv = sitofp i32 %zext to float
	ret float %conv			ret float %conv
	}			}

test/CodeGen/X86/merge-consecutive-loads-128.ll

Show First 20 Lines • Show All 410 Lines • ▼ Show 20 Lines	; X32-SSE41-NEXT: retl
%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1		%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1
%res3 = insertelement <4 x i32> %res1, i32 %val3, i32 3		%res3 = insertelement <4 x i32> %res1, i32 %val3, i32 3
ret <4 x i32> %res3		ret <4 x i32> %res3
}		}

define <4 x i32> @merge_4i32_i32_3zuu(i32* %ptr) nounwind uwtable noinline ssp {		define <4 x i32> @merge_4i32_i32_3zuu(i32* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_4i32_i32_3zuu:		; SSE-LABEL: merge_4i32_i32_3zuu:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_4i32_i32_3zuu:		; AVX-LABEL: merge_4i32_i32_3zuu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_4i32_i32_3zuu:		; X32-SSE1-LABEL: merge_4i32_i32_3zuu:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movl 12(%ecx), %ecx		; X32-SSE1-NEXT: movl 12(%ecx), %ecx
; X32-SSE1-NEXT: movl %ecx, (%eax)		; X32-SSE1-NEXT: movl %ecx, (%eax)
; X32-SSE1-NEXT: movl $0, 4(%eax)		; X32-SSE1-NEXT: movl $0, 4(%eax)
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_4i32_i32_3zuu:		; X32-SSE41-LABEL: merge_4i32_i32_3zuu:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 3		%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 3
%val0 = load i32, i32* %ptr0		%val0 = load i32, i32* %ptr0
%res0 = insertelement <4 x i32> undef, i32 %val0, i32 0		%res0 = insertelement <4 x i32> undef, i32 %val0, i32 0
%res1 = insertelement <4 x i32> %res0, i32 0, i32 1		%res1 = insertelement <4 x i32> %res0, i32 0, i32 1
ret <4 x i32> %res1		ret <4 x i32> %res1
}		}

define <4 x i32> @merge_4i32_i32_34uu(i32* %ptr) nounwind uwtable noinline ssp {		define <4 x i32> @merge_4i32_i32_34uu(i32* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_4i32_i32_34uu:		; SSE-LABEL: merge_4i32_i32_34uu:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_4i32_i32_34uu:		; AVX-LABEL: merge_4i32_i32_34uu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_4i32_i32_34uu:		; X32-SSE1-LABEL: merge_4i32_i32_34uu:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movl 12(%ecx), %edx		; X32-SSE1-NEXT: movl 12(%ecx), %edx
; X32-SSE1-NEXT: movl 16(%ecx), %ecx		; X32-SSE1-NEXT: movl 16(%ecx), %ecx
; X32-SSE1-NEXT: movl %ecx, 4(%eax)		; X32-SSE1-NEXT: movl %ecx, 4(%eax)
; X32-SSE1-NEXT: movl %edx, (%eax)		; X32-SSE1-NEXT: movl %edx, (%eax)
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_4i32_i32_34uu:		; X32-SSE41-LABEL: merge_4i32_i32_34uu:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 3		%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 3
%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 4		%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 4
%val0 = load i32, i32* %ptr0		%val0 = load i32, i32* %ptr0
%val1 = load i32, i32* %ptr1		%val1 = load i32, i32* %ptr1
%res0 = insertelement <4 x i32> undef, i32 %val0, i32 0		%res0 = insertelement <4 x i32> undef, i32 %val0, i32 0
%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1		%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1
ret <4 x i32> %res1		ret <4 x i32> %res1
}		}

define <4 x i32> @merge_4i32_i32_45zz(i32* %ptr) nounwind uwtable noinline ssp {		define <4 x i32> @merge_4i32_i32_45zz(i32* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_4i32_i32_45zz:		; SSE-LABEL: merge_4i32_i32_45zz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_4i32_i32_45zz:		; AVX-LABEL: merge_4i32_i32_45zz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_4i32_i32_45zz:		; X32-SSE1-LABEL: merge_4i32_i32_45zz:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movl 16(%ecx), %edx		; X32-SSE1-NEXT: movl 16(%ecx), %edx
; X32-SSE1-NEXT: movl 20(%ecx), %ecx		; X32-SSE1-NEXT: movl 20(%ecx), %ecx
; X32-SSE1-NEXT: movl %ecx, 4(%eax)		; X32-SSE1-NEXT: movl %ecx, 4(%eax)
; X32-SSE1-NEXT: movl %edx, (%eax)		; X32-SSE1-NEXT: movl %edx, (%eax)
; X32-SSE1-NEXT: movl $0, 12(%eax)		; X32-SSE1-NEXT: movl $0, 12(%eax)
; X32-SSE1-NEXT: movl $0, 8(%eax)		; X32-SSE1-NEXT: movl $0, 8(%eax)
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_4i32_i32_45zz:		; X32-SSE41-LABEL: merge_4i32_i32_45zz:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 4		%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 4
%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 5		%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 5
%val0 = load i32, i32* %ptr0		%val0 = load i32, i32* %ptr0
%val1 = load i32, i32* %ptr1		%val1 = load i32, i32* %ptr1
%res0 = insertelement <4 x i32> zeroinitializer, i32 %val0, i32 0		%res0 = insertelement <4 x i32> zeroinitializer, i32 %val0, i32 0
%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1		%res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1
ret <4 x i32> %res1		ret <4 x i32> %res1
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%res5 = insertelement <8 x i16> %res4, i16 %val5, i32 5		%res5 = insertelement <8 x i16> %res4, i16 %val5, i32 5
%res7 = insertelement <8 x i16> %res5, i16 %val7, i32 7		%res7 = insertelement <8 x i16> %res5, i16 %val7, i32 7
ret <8 x i16> %res7		ret <8 x i16> %res7
}		}

define <8 x i16> @merge_8i16_i16_34uuuuuu(i16* %ptr) nounwind uwtable noinline ssp {		define <8 x i16> @merge_8i16_i16_34uuuuuu(i16* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_8i16_i16_34uuuuuu:		; SSE-LABEL: merge_8i16_i16_34uuuuuu:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_8i16_i16_34uuuuuu:		; AVX-LABEL: merge_8i16_i16_34uuuuuu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_8i16_i16_34uuuuuu:		; X32-SSE1-LABEL: merge_8i16_i16_34uuuuuu:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movzwl 6(%ecx), %edx		; X32-SSE1-NEXT: movzwl 6(%ecx), %edx
; X32-SSE1-NEXT: movzwl 8(%ecx), %ecx		; X32-SSE1-NEXT: movzwl 8(%ecx), %ecx
; X32-SSE1-NEXT: movw %cx, 2(%eax)		; X32-SSE1-NEXT: movw %cx, 2(%eax)
; X32-SSE1-NEXT: movw %dx, (%eax)		; X32-SSE1-NEXT: movw %dx, (%eax)
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_8i16_i16_34uuuuuu:		; X32-SSE41-LABEL: merge_8i16_i16_34uuuuuu:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 3		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 3
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 4		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 4
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%res0 = insertelement <8 x i16> undef, i16 %val0, i32 0		%res0 = insertelement <8 x i16> undef, i16 %val0, i32 0
%res1 = insertelement <8 x i16> %res0, i16 %val1, i32 1		%res1 = insertelement <8 x i16> %res0, i16 %val1, i32 1
ret <8 x i16> %res1		ret <8 x i16> %res1
}		}

define <8 x i16> @merge_8i16_i16_45u7zzzz(i16* %ptr) nounwind uwtable noinline ssp {		define <8 x i16> @merge_8i16_i16_45u7zzzz(i16* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_8i16_i16_45u7zzzz:		; SSE-LABEL: merge_8i16_i16_45u7zzzz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_8i16_i16_45u7zzzz:		; AVX-LABEL: merge_8i16_i16_45u7zzzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_8i16_i16_45u7zzzz:		; X32-SSE1-LABEL: merge_8i16_i16_45u7zzzz:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: pushl %esi		; X32-SSE1-NEXT: pushl %esi
; X32-SSE1-NEXT: .Lcfi14:		; X32-SSE1-NEXT: .Lcfi14:
; X32-SSE1-NEXT: .cfi_def_cfa_offset 8		; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
; X32-SSE1-NEXT: .Lcfi15:		; X32-SSE1-NEXT: .Lcfi15:
; X32-SSE1-NEXT: movw $0, 10(%eax)		; X32-SSE1-NEXT: movw $0, 10(%eax)
; X32-SSE1-NEXT: movw $0, 8(%eax)		; X32-SSE1-NEXT: movw $0, 8(%eax)
; X32-SSE1-NEXT: popl %esi		; X32-SSE1-NEXT: popl %esi
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_8i16_i16_45u7zzzz:		; X32-SSE41-LABEL: merge_8i16_i16_45u7zzzz:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5
%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7		%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%val3 = load i16, i16* %ptr3		%val3 = load i16, i16* %ptr3
%res0 = insertelement <8 x i16> undef, i16 %val0, i32 0		%res0 = insertelement <8 x i16> undef, i16 %val0, i32 0
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%resD = insertelement <16 x i8> %resC, i8 %valD, i32 13		%resD = insertelement <16 x i8> %resC, i8 %valD, i32 13
%resF = insertelement <16 x i8> %resD, i8 %valF, i32 15		%resF = insertelement <16 x i8> %resD, i8 %valF, i32 15
ret <16 x i8> %resF		ret <16 x i8> %resF
}		}

define <16 x i8> @merge_16i8_i8_01u3uuzzuuuuuzzz(i8* %ptr) nounwind uwtable noinline ssp {		define <16 x i8> @merge_16i8_i8_01u3uuzzuuuuuzzz(i8* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:		; SSE-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:		; AVX-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:		; X32-SSE1-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movb (%ecx), %dl		; X32-SSE1-NEXT: movb (%ecx), %dl
; X32-SSE1-NEXT: movb 1(%ecx), %dh		; X32-SSE1-NEXT: movb 1(%ecx), %dh
; X32-SSE1-NEXT: movb 3(%ecx), %cl		; X32-SSE1-NEXT: movb 3(%ecx), %cl
; X32-SSE1-NEXT: movb %dh, 1(%eax)		; X32-SSE1-NEXT: movb %dh, 1(%eax)
; X32-SSE1-NEXT: movb %dl, (%eax)		; X32-SSE1-NEXT: movb %dl, (%eax)
; X32-SSE1-NEXT: movb %cl, 3(%eax)		; X32-SSE1-NEXT: movb %cl, 3(%eax)
; X32-SSE1-NEXT: movb $0, 15(%eax)		; X32-SSE1-NEXT: movb $0, 15(%eax)
; X32-SSE1-NEXT: movb $0, 14(%eax)		; X32-SSE1-NEXT: movb $0, 14(%eax)
; X32-SSE1-NEXT: movb $0, 13(%eax)		; X32-SSE1-NEXT: movb $0, 13(%eax)
; X32-SSE1-NEXT: movb $0, 7(%eax)		; X32-SSE1-NEXT: movb $0, 7(%eax)
; X32-SSE1-NEXT: movb $0, 6(%eax)		; X32-SSE1-NEXT: movb $0, 6(%eax)
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:		; X32-SSE41-LABEL: merge_16i8_i8_01u3uuzzuuuuuzzz:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 3		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 3
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
%val1 = load i8, i8* %ptr1		%val1 = load i8, i8* %ptr1
%val3 = load i8, i8* %ptr3		%val3 = load i8, i8* %ptr3
%res0 = insertelement <16 x i8> undef, i8 %val0, i32 0		%res0 = insertelement <16 x i8> undef, i8 %val0, i32 0
%res1 = insertelement <16 x i8> %res0, i8 %val1, i32 1		%res1 = insertelement <16 x i8> %res0, i8 %val1, i32 1
%res3 = insertelement <16 x i8> %res1, i8 %val3, i32 3		%res3 = insertelement <16 x i8> %res1, i8 %val3, i32 3
%res6 = insertelement <16 x i8> %res3, i8 0, i32 6		%res6 = insertelement <16 x i8> %res3, i8 0, i32 6
%res7 = insertelement <16 x i8> %res6, i8 0, i32 7		%res7 = insertelement <16 x i8> %res6, i8 0, i32 7
%resD = insertelement <16 x i8> %res7, i8 0, i32 13		%resD = insertelement <16 x i8> %res7, i8 0, i32 13
%resE = insertelement <16 x i8> %resD, i8 0, i32 14		%resE = insertelement <16 x i8> %resD, i8 0, i32 14
%resF = insertelement <16 x i8> %resE, i8 0, i32 15		%resF = insertelement <16 x i8> %resE, i8 0, i32 15
ret <16 x i8> %resF		ret <16 x i8> %resF
}		}

define <16 x i8> @merge_16i8_i8_0123uu67uuuuuzzz(i8* %ptr) nounwind uwtable noinline ssp {		define <16 x i8> @merge_16i8_i8_0123uu67uuuuuzzz(i8* %ptr) nounwind uwtable noinline ssp {
; SSE-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:		; SSE-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:		; AVX-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:		; X32-SSE1-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: pushl %ebx		; X32-SSE1-NEXT: pushl %ebx
; X32-SSE1-NEXT: .Lcfi19:		; X32-SSE1-NEXT: .Lcfi19:
; X32-SSE1-NEXT: .cfi_def_cfa_offset 8		; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
; X32-SSE1-NEXT: pushl %eax		; X32-SSE1-NEXT: pushl %eax
; X32-SSE1-NEXT: movb $0, 13(%eax)		; X32-SSE1-NEXT: movb $0, 13(%eax)
; X32-SSE1-NEXT: addl $4, %esp		; X32-SSE1-NEXT: addl $4, %esp
; X32-SSE1-NEXT: popl %ebx		; X32-SSE1-NEXT: popl %ebx
; X32-SSE1-NEXT: retl $4		; X32-SSE1-NEXT: retl $4
;		;
; X32-SSE41-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:		; X32-SSE41-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1
%ptr2 = getelementptr inbounds i8, i8* %ptr, i64 2		%ptr2 = getelementptr inbounds i8, i8* %ptr, i64 2
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 3		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 3
%ptr6 = getelementptr inbounds i8, i8* %ptr, i64 6		%ptr6 = getelementptr inbounds i8, i8* %ptr, i64 6
%ptr7 = getelementptr inbounds i8, i8* %ptr, i64 7		%ptr7 = getelementptr inbounds i8, i8* %ptr, i64 7
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%resE = insertelement <16 x i8> %resD, i8 0, i32 14		%resE = insertelement <16 x i8> %resD, i8 0, i32 14
%resF = insertelement <16 x i8> %resE, i8 0, i32 15		%resF = insertelement <16 x i8> %resE, i8 0, i32 15
ret <16 x i8> %resF		ret <16 x i8> %resF
}		}

define void @merge_4i32_i32_combine(<4 x i32>* %dst, i32* %src) {		define void @merge_4i32_i32_combine(<4 x i32>* %dst, i32* %src) {
; SSE-LABEL: merge_4i32_i32_combine:		; SSE-LABEL: merge_4i32_i32_combine:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: movdqa %xmm0, (%rdi)		; SSE-NEXT: movaps %xmm0, (%rdi)
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: merge_4i32_i32_combine:		; AVX-LABEL: merge_4i32_i32_combine:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: vmovdqa %xmm0, (%rdi)		; AVX-NEXT: vmovaps %xmm0, (%rdi)
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-SSE1-LABEL: merge_4i32_i32_combine:		; X32-SSE1-LABEL: merge_4i32_i32_combine:
; X32-SSE1: # BB#0:		; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE1-NEXT: movl (%ecx), %ecx		; X32-SSE1-NEXT: movl (%ecx), %ecx
; X32-SSE1-NEXT: movl %ecx, (%eax)		; X32-SSE1-NEXT: movl %ecx, (%eax)
; X32-SSE1-NEXT: movl $0, 12(%eax)		; X32-SSE1-NEXT: movl $0, 12(%eax)
; X32-SSE1-NEXT: movl $0, 8(%eax)		; X32-SSE1-NEXT: movl $0, 8(%eax)
; X32-SSE1-NEXT: movl $0, 4(%eax)		; X32-SSE1-NEXT: movl $0, 4(%eax)
; X32-SSE1-NEXT: retl		; X32-SSE1-NEXT: retl
;		;
; X32-SSE41-LABEL: merge_4i32_i32_combine:		; X32-SSE41-LABEL: merge_4i32_i32_combine:
; X32-SSE41: # BB#0:		; X32-SSE41: # BB#0:
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-SSE41-NEXT: movdqa %xmm0, (%eax)		; X32-SSE41-NEXT: movaps %xmm0, (%eax)
; X32-SSE41-NEXT: retl		; X32-SSE41-NEXT: retl
%1 = getelementptr i32, i32* %src, i32 0		%1 = getelementptr i32, i32* %src, i32 0
%2 = load i32, i32* %1		%2 = load i32, i32* %1
%3 = insertelement <4 x i32> undef, i32 %2, i32 0		%3 = insertelement <4 x i32> undef, i32 %2, i32 0
%4 = shufflevector <4 x i32> %3, <4 x i32> undef, <4 x i32> zeroinitializer		%4 = shufflevector <4 x i32> %3, <4 x i32> undef, <4 x i32> zeroinitializer
%5 = lshr <4 x i32> %4, <i32 0, i32 undef, i32 undef, i32 undef>		%5 = lshr <4 x i32> %4, <i32 0, i32 undef, i32 undef, i32 undef>
%6 = and <4 x i32> %5, <i32 -1, i32 0, i32 0, i32 0>		%6 = and <4 x i32> %5, <i32 -1, i32 0, i32 0, i32 0>
store <4 x i32> %6, <4 x i32>* %dst		store <4 x i32> %6, <4 x i32>* %dst

test/CodeGen/X86/merge-consecutive-loads-256.ll

; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%res2 = insertelement <4 x i64> %res1, i64 %val2, i32 2		%res2 = insertelement <4 x i64> %res1, i64 %val2, i32 2
%res3 = insertelement <4 x i64> %res2, i64 %val3, i32 3		%res3 = insertelement <4 x i64> %res2, i64 %val3, i32 3
ret <4 x i64> %res3		ret <4 x i64> %res3
}		}

define <4 x i64> @merge_4i64_i64_1zzu(i64* %ptr) nounwind uwtable noinline ssp {		define <4 x i64> @merge_4i64_i64_1zzu(i64* %ptr) nounwind uwtable noinline ssp {
; AVX-LABEL: merge_4i64_i64_1zzu:		; AVX-LABEL: merge_4i64_i64_1zzu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-AVX-LABEL: merge_4i64_i64_1zzu:		; X32-AVX-LABEL: merge_4i64_i64_1zzu:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i64, i64* %ptr, i64 1		%ptr0 = getelementptr inbounds i64, i64* %ptr, i64 1
%val0 = load i64, i64* %ptr0		%val0 = load i64, i64* %ptr0
%res0 = insertelement <4 x i64> undef, i64 %val0, i32 0		%res0 = insertelement <4 x i64> undef, i64 %val0, i32 0
%res1 = insertelement <4 x i64> %res0, i64 0, i32 1		%res1 = insertelement <4 x i64> %res0, i64 0, i32 1
%res2 = insertelement <4 x i64> %res1, i64 0, i32 2		%res2 = insertelement <4 x i64> %res1, i64 0, i32 2
ret <4 x i64> %res2		ret <4 x i64> %res2
}		}
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr1 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 3		%ptr1 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 3
%val1 = load <4 x i32>, <4 x i32>* %ptr1		%val1 = load <4 x i32>, <4 x i32>* %ptr1
%res = shufflevector <4 x i32> zeroinitializer, <4 x i32> %val1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%res = shufflevector <4 x i32> zeroinitializer, <4 x i32> %val1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i32> %res		ret <8 x i32> %res
}		}

define <8 x i32> @merge_8i32_i32_56zz9uzz(i32* %ptr) nounwind uwtable noinline ssp {		define <8 x i32> @merge_8i32_i32_56zz9uzz(i32* %ptr) nounwind uwtable noinline ssp {
; AVX1-LABEL: merge_8i32_i32_56zz9uzz:		; AVX-LABEL: merge_8i32_i32_56zz9uzz:
; AVX1: # BB#0:		; AVX: # BB#0:
; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX1-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX-NEXT: retq
;
; AVX2-LABEL: merge_8i32_i32_56zz9uzz:
; AVX2: # BB#0:
; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX2-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX2-NEXT: retq
;
; AVX512F-LABEL: merge_8i32_i32_56zz9uzz:
; AVX512F: # BB#0:
; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX512F-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX512F-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX512F-NEXT: retq
;		;
; X32-AVX-LABEL: merge_8i32_i32_56zz9uzz:		; X32-AVX-LABEL: merge_8i32_i32_56zz9uzz:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero		; X32-AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; X32-AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; X32-AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 5		%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 5
%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 6		%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 6
%ptr4 = getelementptr inbounds i32, i32* %ptr, i64 9		%ptr4 = getelementptr inbounds i32, i32* %ptr, i64 9
%val0 = load i32, i32* %ptr0		%val0 = load i32, i32* %ptr0
%val1 = load i32, i32* %ptr1		%val1 = load i32, i32* %ptr1
%val4 = load i32, i32* %ptr4		%val4 = load i32, i32* %ptr4
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%res5 = insertelement <8 x i32> %res4, i32 0, i32 5		%res5 = insertelement <8 x i32> %res4, i32 0, i32 5
%res7 = insertelement <8 x i32> %res5, i32 %val7, i32 7		%res7 = insertelement <8 x i32> %res5, i32 %val7, i32 7
ret <8 x i32> %res7		ret <8 x i32> %res7
}		}

define <16 x i16> @merge_16i16_i16_89zzzuuuuuuuuuuuz(i16* %ptr) nounwind uwtable noinline ssp {		define <16 x i16> @merge_16i16_i16_89zzzuuuuuuuuuuuz(i16* %ptr) nounwind uwtable noinline ssp {
; AVX-LABEL: merge_16i16_i16_89zzzuuuuuuuuuuuz:		; AVX-LABEL: merge_16i16_i16_89zzzuuuuuuuuuuuz:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-AVX-LABEL: merge_16i16_i16_89zzzuuuuuuuuuuuz:		; X32-AVX-LABEL: merge_16i16_i16_89zzzuuuuuuuuuuuz:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 8		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 8
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 9		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 9
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%res0 = insertelement <16 x i16> undef, i16 %val0, i16 0		%res0 = insertelement <16 x i16> undef, i16 %val0, i16 0
%res1 = insertelement <16 x i16> %res0, i16 %val1, i16 1		%res1 = insertelement <16 x i16> %res0, i16 %val1, i16 1
%res2 = insertelement <16 x i16> %res1, i16 0, i16 2		%res2 = insertelement <16 x i16> %res1, i16 0, i16 2
%res3 = insertelement <16 x i16> %res2, i16 0, i16 3		%res3 = insertelement <16 x i16> %res2, i16 0, i16 3
%res4 = insertelement <16 x i16> %res3, i16 0, i16 4		%res4 = insertelement <16 x i16> %res3, i16 0, i16 4
%resF = insertelement <16 x i16> %res4, i16 0, i16 15		%resF = insertelement <16 x i16> %res4, i16 0, i16 15
ret <16 x i16> %resF		ret <16 x i16> %resF
}		}

define <16 x i16> @merge_16i16_i16_45u7uuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {		define <16 x i16> @merge_16i16_i16_45u7uuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {
; AVX-LABEL: merge_16i16_i16_45u7uuuuuuuuuuuu:		; AVX-LABEL: merge_16i16_i16_45u7uuuuuuuuuuuu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-AVX-LABEL: merge_16i16_i16_45u7uuuuuuuuuuuu:		; X32-AVX-LABEL: merge_16i16_i16_45u7uuuuuuuuuuuu:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5
%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7		%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%val3 = load i16, i16* %ptr3		%val3 = load i16, i16* %ptr3
%res0 = insertelement <16 x i16> undef, i16 %val0, i16 0		%res0 = insertelement <16 x i16> undef, i16 %val0, i16 0
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%resE = insertelement <16 x i16> %resD, i16 %valE, i16 14		%resE = insertelement <16 x i16> %resD, i16 %valE, i16 14
%resF = insertelement <16 x i16> %resE, i16 %valF, i16 15		%resF = insertelement <16 x i16> %resE, i16 %valF, i16 15
ret <16 x i16> %resF		ret <16 x i16> %resF
}		}

define <32 x i8> @merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu(i8* %ptr) nounwind uwtable noinline ssp {		define <32 x i8> @merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu(i8* %ptr) nounwind uwtable noinline ssp {
; AVX-LABEL: merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:		; AVX-LABEL: merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-AVX-LABEL: merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:		; X32-AVX-LABEL: merge_32i8_i8_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 4		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 4
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 5		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 5
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 7		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 7
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
%val1 = load i8, i8* %ptr1		%val1 = load i8, i8* %ptr1
%val3 = load i8, i8* %ptr3		%val3 = load i8, i8* %ptr3
%res0 = insertelement <32 x i8> undef, i8 %val0, i8 0		%res0 = insertelement <32 x i8> undef, i8 %val0, i8 0
%res1 = insertelement <32 x i8> %res0, i8 %val1, i8 1		%res1 = insertelement <32 x i8> %res0, i8 %val1, i8 1
%res3 = insertelement <32 x i8> %res1, i8 %val3, i8 3		%res3 = insertelement <32 x i8> %res1, i8 %val3, i8 3
ret <32 x i8> %res3		ret <32 x i8> %res3
}		}

define <32 x i8> @merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu(i8* %ptr) nounwind uwtable noinline ssp {		define <32 x i8> @merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu(i8* %ptr) nounwind uwtable noinline ssp {
; AVX-LABEL: merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu:		; AVX-LABEL: merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X32-AVX-LABEL: merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu:		; X32-AVX-LABEL: merge_32i8_i8_23u5uuuuuuuuuuzzzzuuuuuuuuuuuuuu:
; X32-AVX: # BB#0:		; X32-AVX: # BB#0:
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-AVX-NEXT: retl		; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 2		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 2
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 3		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 3
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 5		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 5
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
%val1 = load i8, i8* %ptr1		%val1 = load i8, i8* %ptr1
%val3 = load i8, i8* %ptr3		%val3 = load i8, i8* %ptr3
%res0 = insertelement <32 x i8> undef, i8 %val0, i8 0		%res0 = insertelement <32 x i8> undef, i8 %val0, i8 0

test/CodeGen/X86/merge-consecutive-loads-512.ll

; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%resE = insertelement <16 x float> %resD, float %valE, i32 14		%resE = insertelement <16 x float> %resD, float %valE, i32 14
%resF = insertelement <16 x float> %resE, float %valF, i32 15		%resF = insertelement <16 x float> %resE, float %valF, i32 15
ret <16 x float> %resF		ret <16 x float> %resF
}		}

define <16 x i32> @merge_16i32_i32_12zzzuuuuuuuuuuuz(i32* %ptr) nounwind uwtable noinline ssp {		define <16 x i32> @merge_16i32_i32_12zzzuuuuuuuuuuuz(i32* %ptr) nounwind uwtable noinline ssp {
; ALL-LABEL: merge_16i32_i32_12zzzuuuuuuuuuuuz:		; ALL-LABEL: merge_16i32_i32_12zzzuuuuuuuuuuuz:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; ALL-NEXT: retq		; ALL-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_16i32_i32_12zzzuuuuuuuuuuuz:		; X32-AVX512F-LABEL: merge_16i32_i32_12zzzuuuuuuuuuuuz:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 1		%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 1
%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 2		%ptr1 = getelementptr inbounds i32, i32* %ptr, i64 2
%val0 = load i32, i32* %ptr0		%val0 = load i32, i32* %ptr0
%val1 = load i32, i32* %ptr1		%val1 = load i32, i32* %ptr1
%res0 = insertelement <16 x i32> undef, i32 %val0, i32 0		%res0 = insertelement <16 x i32> undef, i32 %val0, i32 0
%res1 = insertelement <16 x i32> %res0, i32 %val1, i32 1		%res1 = insertelement <16 x i32> %res0, i32 %val1, i32 1
%res2 = insertelement <16 x i32> %res1, i32 0, i32 2		%res2 = insertelement <16 x i32> %res1, i32 0, i32 2
%resE = insertelement <16 x i32> %resD, i32 %valE, i32 14		%resE = insertelement <16 x i32> %resD, i32 %valE, i32 14
%resF = insertelement <16 x i32> %resE, i32 %valF, i32 15		%resF = insertelement <16 x i32> %resE, i32 %valF, i32 15
ret <16 x i32> %resF		ret <16 x i32> %resF
}		}

define <32 x i16> @merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz(i16* %ptr) nounwind uwtable noinline ssp {		define <32 x i16> @merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz(i16* %ptr) nounwind uwtable noinline ssp {
; AVX512F-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:		; AVX512F-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:		; AVX512BW-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:
; AVX512BW: # BB#0:		; AVX512BW: # BB#0:
; AVX512BW-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX512BW-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:		; X32-AVX512F-LABEL: merge_32i16_i16_12u4uuuuuuuuuuuuuuuuuuuuuuuuuuzz:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 1		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 1
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 2		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 2
%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 4		%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 4
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%val3 = load i16, i16* %ptr3		%val3 = load i16, i16* %ptr3
%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0		%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0
%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1		%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1
%res3 = insertelement <32 x i16> %res1, i16 %val3, i16 3		%res3 = insertelement <32 x i16> %res1, i16 %val3, i16 3
%res30 = insertelement <32 x i16> %res3, i16 0, i16 30		%res30 = insertelement <32 x i16> %res3, i16 0, i16 30
%res31 = insertelement <32 x i16> %res30, i16 0, i16 31		%res31 = insertelement <32 x i16> %res30, i16 0, i16 31
ret <32 x i16> %res31		ret <32 x i16> %res31
}		}

define <32 x i16> @merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {		define <32 x i16> @merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {
; ALL-LABEL: merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:		; ALL-LABEL: merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; ALL-NEXT: retq		; ALL-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:		; X32-AVX512F-LABEL: merge_32i16_i16_45u7uuuuuuuuuuuuuuuuuuuuuuuuuuuu:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 5
%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7		%ptr3 = getelementptr inbounds i16, i16* %ptr, i64 7
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%val3 = load i16, i16* %ptr3		%val3 = load i16, i16* %ptr3
%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0		%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0
%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1		%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1
%res3 = insertelement <32 x i16> %res1, i16 %val3, i16 3		%res3 = insertelement <32 x i16> %res1, i16 %val3, i16 3
ret <32 x i16> %res3		ret <32 x i16> %res3
}		}

define <32 x i16> @merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {		define <32 x i16> @merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu(i16* %ptr) nounwind uwtable noinline ssp {
; AVX512F-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:		; AVX512F-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX512F-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:		; AVX512BW-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:
; AVX512BW: # BB#0:		; AVX512BW: # BB#0:
; AVX512BW-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX512BW-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:		; X32-AVX512F-LABEL: merge_32i16_i16_23uzuuuuuuuuuuzzzzuuuuuuuuuuuuuu:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-AVX512F-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 2		%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 2
%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 3		%ptr1 = getelementptr inbounds i16, i16* %ptr, i64 3
%val0 = load i16, i16* %ptr0		%val0 = load i16, i16* %ptr0
%val1 = load i16, i16* %ptr1		%val1 = load i16, i16* %ptr1
%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0		%res0 = insertelement <32 x i16> undef, i16 %val0, i16 0
%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1		%res1 = insertelement <32 x i16> %res0, i16 %val1, i16 1
%res3 = insertelement <32 x i16> %res1, i16 0, i16 3		%res3 = insertelement <32 x i16> %res1, i16 0, i16 3
%resE = insertelement <32 x i16> %res3, i16 0, i16 14		%resE = insertelement <32 x i16> %res3, i16 0, i16 14
%resF = insertelement <32 x i16> %resE, i16 0, i16 15		%resF = insertelement <32 x i16> %resE, i16 0, i16 15
%resG = insertelement <32 x i16> %resF, i16 0, i16 16		%resG = insertelement <32 x i16> %resF, i16 0, i16 16
%resH = insertelement <32 x i16> %resG, i16 0, i16 17		%resH = insertelement <32 x i16> %resG, i16 0, i16 17
ret <32 x i16> %resH		ret <32 x i16> %resH
}		}

define <64 x i8> @merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz(i8* %ptr) nounwind uwtable noinline ssp {		define <64 x i8> @merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz(i8* %ptr) nounwind uwtable noinline ssp {
; AVX512F-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; AVX512F-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; AVX512BW-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; AVX512BW: # BB#0:		; AVX512BW: # BB#0:
; AVX512BW-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX512BW-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; X32-AVX512F-LABEL: merge_64i8_i8_12u4uuu8uuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X32-AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 1		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 1
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 2		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 2
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 4		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 4
%ptr7 = getelementptr inbounds i8, i8* %ptr, i64 8		%ptr7 = getelementptr inbounds i8, i8* %ptr, i64 8
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
%val1 = load i8, i8* %ptr1		%val1 = load i8, i8* %ptr1
%res17 = insertelement <64 x i8> %res16, i8 0, i8 17		%res17 = insertelement <64 x i8> %res16, i8 0, i8 17
%res63 = insertelement <64 x i8> %res17, i8 0, i8 63		%res63 = insertelement <64 x i8> %res17, i8 0, i8 63
ret <64 x i8> %res63		ret <64 x i8> %res63
}		}

define <64 x i8> @merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz(i8* %ptr) nounwind uwtable noinline ssp {		define <64 x i8> @merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz(i8* %ptr) nounwind uwtable noinline ssp {
; AVX512F-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; AVX512F-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; AVX512F: # BB#0:		; AVX512F: # BB#0:
; AVX512F-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX512F-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; AVX512BW-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; AVX512BW: # BB#0:		; AVX512BW: # BB#0:
; AVX512BW-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX512BW-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; X32-AVX512F-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:		; X32-AVX512F-LABEL: merge_64i8_i8_12u4uuuuuuuuuuzzzzuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuz:
; X32-AVX512F: # BB#0:		; X32-AVX512F: # BB#0:
; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-AVX512F-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX512F-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-AVX512F-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1		; X32-AVX512F-NEXT: vxorps %ymm1, %ymm1, %ymm1
; X32-AVX512F-NEXT: retl		; X32-AVX512F-NEXT: retl
%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 1		%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 1
%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 2		%ptr1 = getelementptr inbounds i8, i8* %ptr, i64 2
%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 4		%ptr3 = getelementptr inbounds i8, i8* %ptr, i64 4
%val0 = load i8, i8* %ptr0		%val0 = load i8, i8* %ptr0
%val1 = load i8, i8* %ptr1		%val1 = load i8, i8* %ptr1
%val3 = load i8, i8* %ptr3		%val3 = load i8, i8* %ptr3

test/CodeGen/X86/mmx-arg-passing-x86-64.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+mmx,+sse2 | FileCheck %s --check-prefix=X86-64			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+mmx,+sse2 | FileCheck %s --check-prefix=X86-64
	;			;
	; On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].			; On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].
	; On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.			; On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.

	@g_v8qi = external global <8 x i8>			@g_v8qi = external global <8 x i8>

	define void @t3() nounwind {			define void @t3() nounwind {
	; X86-64-LABEL: t3:			; X86-64-LABEL: t3:
	; X86-64: ## BB#0:			; X86-64: ## BB#0:
	; X86-64-NEXT: movq _g_v8qi@{{.*}}(%rip), %rax			; X86-64-NEXT: movq _g_v8qi@{{.*}}(%rip), %rax
	; X86-64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X86-64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-64-NEXT: movb $1, %al			; X86-64-NEXT: movb $1, %al
	; X86-64-NEXT: jmp _pass_v8qi ## TAILCALL			; X86-64-NEXT: jmp _pass_v8qi ## TAILCALL
	%tmp3 = load <8 x i8>, <8 x i8>* @g_v8qi, align 8			%tmp3 = load <8 x i8>, <8 x i8>* @g_v8qi, align 8
	%tmp3a = bitcast <8 x i8> %tmp3 to x86_mmx			%tmp3a = bitcast <8 x i8> %tmp3 to x86_mmx
	%tmp4 = tail call i32 (...) @pass_v8qi( x86_mmx %tmp3a ) nounwind			%tmp4 = tail call i32 (...) @pass_v8qi( x86_mmx %tmp3a ) nounwind
	ret void			ret void
	}			}

test/CodeGen/X86/pr11334.ll

	entry:			entry:
	%f1 = fpext <8 x float> %v1 to <8 x double>			%f1 = fpext <8 x float> %v1 to <8 x double>
	ret <8 x double> %f1			ret <8 x double> %f1
	}			}

	define void @test_vector_creation() nounwind {			define void @test_vector_creation() nounwind {
	; SSE-LABEL: test_vector_creation:			; SSE-LABEL: test_vector_creation:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]			; SSE-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
	; SSE-NEXT: movdqa %xmm0, (%rax)			; SSE-NEXT: movdqa %xmm0, (%rax)
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: test_vector_creation:			; AVX-LABEL: test_vector_creation:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]			; AVX-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
	; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovaps %ymm0, (%rax)			; AVX-NEXT: vmovaps %ymm0, (%rax)
	; AVX-NEXT: vzeroupper			; AVX-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%1 = insertelement <4 x double> undef, double 0.000000e+00, i32 2			%1 = insertelement <4 x double> undef, double 0.000000e+00, i32 2
	%2 = load double, double addrspace(1)* null			%2 = load double, double addrspace(1)* null
	%3 = insertelement <4 x double> %1, double %2, i32 3			%3 = insertelement <4 x double> %1, double %2, i32 3
	store <4 x double> %3, <4 x double>* undef			store <4 x double> %3, <4 x double>* undef
	ret void			ret void
	}			}

test/CodeGen/X86/pr2656.ll

	; We can not fold the load from the stack into the 'and' instruction because			; We can not fold the load from the stack into the 'and' instruction because
	; that changes an 8-byte load into a 16-byte load (illegal memory access).			; that changes an 8-byte load into a 16-byte load (illegal memory access).
	; We can fold the load of the constant because it is a 16-byte vector constant.			; We can fold the load of the constant because it is a 16-byte vector constant.

	define double @PR22371(double %x) {			define double @PR22371(double %x) {
	; CHECK-LABEL: PR22371:			; CHECK-LABEL: PR22371:
	; CHECK: movsd 16(%esp), %xmm0			; CHECK: movsd 16(%esp), %xmm0
	; CHECK-NEXT: andpd LCPI1_0, %xmm0			; CHECK-NEXT: andps LCPI1_0, %xmm0
	; CHECK-NEXT: movlpd %xmm0, (%esp)			; CHECK-NEXT: movlps %xmm0, (%esp)
	%call = tail call double @fabs(double %x) #0			%call = tail call double @fabs(double %x) #0
	ret double %call			ret double %call
	}			}

	declare double @fabs(double) #0			declare double @fabs(double) #0
	attributes #0 = { readnone }			attributes #0 = { readnone }

test/CodeGen/X86/scalar-int-to-fp.ll

	; CHECK-LABEL: s32_to_x			; CHECK-LABEL: s32_to_x
	; CHECK: fildl			; CHECK: fildl
	define x86_fp80 @s32_to_x(i32 %a) nounwind {			define x86_fp80 @s32_to_x(i32 %a) nounwind {
	%r = sitofp i32 %a to x86_fp80			%r = sitofp i32 %a to x86_fp80
	ret x86_fp80 %r			ret x86_fp80 %r
	}			}

	; CHECK-LABEL: u64_to_f			; CHECK-LABEL: u64_to_f
	; AVX512_32: vmovq {{.*#+}} xmm0 = mem[0],zero			; AVX512_32: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX512_32: vmovq %xmm0, {{[0-9]+}}(%esp)			; AVX512_32: vmovlps %xmm0, {{[0-9]+}}(%esp)
	; AVX512_32: fildll			; AVX512_32: fildll

	; AVX512_64: vcvtusi2ssq			; AVX512_64: vcvtusi2ssq

	; SSE2_32: movq {{.*#+}} xmm0 = mem[0],zero			; SSE2_32: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE2_32: movq %xmm0, {{[0-9]+}}(%esp)			; SSE2_32: movlps %xmm0, {{[0-9]+}}(%esp)
	; SSE2_32: fildll			; SSE2_32: fildll

	; SSE2_64: cvtsi2ssq			; SSE2_64: cvtsi2ssq
	; X87: fildll			; X87: fildll
	define float @u64_to_f(i64 %a) nounwind {			define float @u64_to_f(i64 %a) nounwind {
	%r = uitofp i64 %a to float			%r = uitofp i64 %a to float
	ret float %r			ret float %r
	}			}

test/CodeGen/X86/sse-fcopysign.ll

	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: pushl %ebp			; X32-NEXT: pushl %ebp
	; X32-NEXT: movl %esp, %ebp			; X32-NEXT: movl %esp, %ebp
	; X32-NEXT: andl $-8, %esp			; X32-NEXT: andl $-8, %esp
	; X32-NEXT: subl $8, %esp			; X32-NEXT: subl $8, %esp
	; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X32-NEXT: addss 20(%ebp), %xmm0			; X32-NEXT: addss 20(%ebp), %xmm0
	; X32-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; X32-NEXT: andpd {{\.LCPI.*}}, %xmm1			; X32-NEXT: andps {{\.LCPI.*}}, %xmm1
	; X32-NEXT: cvtss2sd %xmm0, %xmm0			; X32-NEXT: cvtss2sd %xmm0, %xmm0
	; X32-NEXT: andpd {{\.LCPI.*}}, %xmm0			; X32-NEXT: andps {{\.LCPI.*}}, %xmm0
	; X32-NEXT: orpd %xmm1, %xmm0			; X32-NEXT: orps %xmm1, %xmm0
	; X32-NEXT: movlpd %xmm0, (%esp)			; X32-NEXT: movlps %xmm0, (%esp)
	; X32-NEXT: fldl (%esp)			; X32-NEXT: fldl (%esp)
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: int2:			; X64-LABEL: int2:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: addss %xmm2, %xmm1			; X64-NEXT: addss %xmm2, %xmm1

test/CodeGen/X86/sse-minmax.ll

%d = select i1 %c, double -0.000000e+00, double %x		%d = select i1 %c, double -0.000000e+00, double %x
ret double %d		ret double %d
}		}

define double @oge_y(double %x) {		define double @oge_y(double %x) {
; STRICT-LABEL: oge_y:		; STRICT-LABEL: oge_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; STRICT-NEXT: movapd %xmm1, %xmm2		; STRICT-NEXT: movaps %xmm1, %xmm2
; STRICT-NEXT: cmplesd %xmm0, %xmm2		; STRICT-NEXT: cmplesd %xmm0, %xmm2
; STRICT-NEXT: andpd %xmm2, %xmm0		; STRICT-NEXT: andps %xmm2, %xmm0
; STRICT-NEXT: andnpd %xmm1, %xmm2		; STRICT-NEXT: andnps %xmm1, %xmm2
; STRICT-NEXT: orpd %xmm2, %xmm0		; STRICT-NEXT: orps %xmm2, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; RELAX-LABEL: oge_y:		; RELAX-LABEL: oge_y:
; RELAX: # BB#0:		; RELAX: # BB#0:
; RELAX-NEXT: maxsd {{.*}}(%rip), %xmm0		; RELAX-NEXT: maxsd {{.*}}(%rip), %xmm0
; RELAX-NEXT: retq		; RELAX-NEXT: retq
;		;
%c = fcmp oge double %x, -0.000000e+00		%c = fcmp oge double %x, -0.000000e+00
%d = select i1 %c, double %x, double -0.000000e+00		%d = select i1 %c, double %x, double -0.000000e+00
ret double %d		ret double %d
}		}

define double @ole_y(double %x) {		define double @ole_y(double %x) {
; STRICT-LABEL: ole_y:		; STRICT-LABEL: ole_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm0, %xmm1		; STRICT-NEXT: movaps %xmm0, %xmm1
; STRICT-NEXT: cmplesd %xmm2, %xmm1		; STRICT-NEXT: cmplesd %xmm2, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm0		; STRICT-NEXT: andps %xmm1, %xmm0
; STRICT-NEXT: andnpd %xmm2, %xmm1		; STRICT-NEXT: andnps %xmm2, %xmm1
; STRICT-NEXT: orpd %xmm0, %xmm1		; STRICT-NEXT: orps %xmm0, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; RELAX-LABEL: ole_y:		; RELAX-LABEL: ole_y:
; RELAX: # BB#0:		; RELAX: # BB#0:
; RELAX-NEXT: minsd {{.*}}(%rip), %xmm0		; RELAX-NEXT: minsd {{.*}}(%rip), %xmm0
; RELAX-NEXT: retq		; RELAX-NEXT: retq
;		;
%c = fcmp ole double %x, -0.000000e+00		%c = fcmp ole double %x, -0.000000e+00
%d = select i1 %c, double %x, double -0.000000e+00		%d = select i1 %c, double %x, double -0.000000e+00
ret double %d		ret double %d
}		}

define double @oge_inverse_y(double %x) {		define double @oge_inverse_y(double %x) {
; STRICT-LABEL: oge_inverse_y:		; STRICT-LABEL: oge_inverse_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm2, %xmm1		; STRICT-NEXT: movaps %xmm2, %xmm1
; STRICT-NEXT: cmplesd %xmm0, %xmm1		; STRICT-NEXT: cmplesd %xmm0, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm2		; STRICT-NEXT: andps %xmm1, %xmm2
; STRICT-NEXT: andnpd %xmm0, %xmm1		; STRICT-NEXT: andnps %xmm0, %xmm1
; STRICT-NEXT: orpd %xmm2, %xmm1		; STRICT-NEXT: orps %xmm2, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; UNSAFE-LABEL: oge_inverse_y:		; UNSAFE-LABEL: oge_inverse_y:
; UNSAFE: # BB#0:		; UNSAFE: # BB#0:
; UNSAFE-NEXT: minsd {{.*}}(%rip), %xmm0		; UNSAFE-NEXT: minsd {{.*}}(%rip), %xmm0
; UNSAFE-NEXT: retq		; UNSAFE-NEXT: retq
;		;
; FINITE-LABEL: oge_inverse_y:		; FINITE-LABEL: oge_inverse_y:
; FINITE: # BB#0:		; FINITE: # BB#0:
; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; FINITE-NEXT: minsd %xmm0, %xmm1		; FINITE-NEXT: minsd %xmm0, %xmm1
; FINITE-NEXT: movapd %xmm1, %xmm0		; FINITE-NEXT: movapd %xmm1, %xmm0
; FINITE-NEXT: retq		; FINITE-NEXT: retq
;		;
%c = fcmp oge double %x, -0.000000e+00		%c = fcmp oge double %x, -0.000000e+00
%d = select i1 %c, double -0.000000e+00, double %x		%d = select i1 %c, double -0.000000e+00, double %x
ret double %d		ret double %d
}		}

define double @ole_inverse_y(double %x) {		define double @ole_inverse_y(double %x) {
; STRICT-LABEL: ole_inverse_y:		; STRICT-LABEL: ole_inverse_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm0, %xmm1		; STRICT-NEXT: movaps %xmm0, %xmm1
; STRICT-NEXT: cmplesd %xmm2, %xmm1		; STRICT-NEXT: cmplesd %xmm2, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm2		; STRICT-NEXT: andps %xmm1, %xmm2
; STRICT-NEXT: andnpd %xmm0, %xmm1		; STRICT-NEXT: andnps %xmm0, %xmm1
; STRICT-NEXT: orpd %xmm2, %xmm1		; STRICT-NEXT: orps %xmm2, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; UNSAFE-LABEL: ole_inverse_y:		; UNSAFE-LABEL: ole_inverse_y:
; UNSAFE: # BB#0:		; UNSAFE: # BB#0:
; UNSAFE-NEXT: maxsd {{.*}}(%rip), %xmm0		; UNSAFE-NEXT: maxsd {{.*}}(%rip), %xmm0
; UNSAFE-NEXT: retq		; UNSAFE-NEXT: retq
;		;
; FINITE-LABEL: ole_inverse_y:		; FINITE-LABEL: ole_inverse_y:
; FINITE: # BB#0:		; FINITE: # BB#0:
; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; FINITE-NEXT: maxsd %xmm0, %xmm1		; FINITE-NEXT: maxsd %xmm0, %xmm1
; FINITE-NEXT: movapd %xmm1, %xmm0		; FINITE-NEXT: movapd %xmm1, %xmm0
; FINITE-NEXT: retq		; FINITE-NEXT: retq
;		;
%c = fcmp ole double %x, -0.000000e+00		%c = fcmp ole double %x, -0.000000e+00
%d = select i1 %c, double -0.000000e+00, double %x		%d = select i1 %c, double -0.000000e+00, double %x
ret double %d		ret double %d
}		}

define double @ugt_y(double %x) {		define double @ugt_y(double %x) {
; STRICT-LABEL: ugt_y:		; STRICT-LABEL: ugt_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm0, %xmm1		; STRICT-NEXT: movaps %xmm0, %xmm1
; STRICT-NEXT: cmpnlesd %xmm2, %xmm1		; STRICT-NEXT: cmpnlesd %xmm2, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm0		; STRICT-NEXT: andps %xmm1, %xmm0
; STRICT-NEXT: andnpd %xmm2, %xmm1		; STRICT-NEXT: andnps %xmm2, %xmm1
; STRICT-NEXT: orpd %xmm0, %xmm1		; STRICT-NEXT: orps %xmm0, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; RELAX-LABEL: ugt_y:		; RELAX-LABEL: ugt_y:
; RELAX: # BB#0:		; RELAX: # BB#0:
; RELAX-NEXT: maxsd {{.*}}(%rip), %xmm0		; RELAX-NEXT: maxsd {{.*}}(%rip), %xmm0
; RELAX-NEXT: retq		; RELAX-NEXT: retq
;		;
%c = fcmp ugt double %x, -0.000000e+00		%c = fcmp ugt double %x, -0.000000e+00
%d = select i1 %c, double %x, double -0.000000e+00		%d = select i1 %c, double %x, double -0.000000e+00
ret double %d		ret double %d
}		}

define double @ult_y(double %x) {		define double @ult_y(double %x) {
; STRICT-LABEL: ult_y:		; STRICT-LABEL: ult_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; STRICT-NEXT: movapd %xmm1, %xmm2		; STRICT-NEXT: movaps %xmm1, %xmm2
; STRICT-NEXT: cmpnlesd %xmm0, %xmm2		; STRICT-NEXT: cmpnlesd %xmm0, %xmm2
; STRICT-NEXT: andpd %xmm2, %xmm0		; STRICT-NEXT: andps %xmm2, %xmm0
; STRICT-NEXT: andnpd %xmm1, %xmm2		; STRICT-NEXT: andnps %xmm1, %xmm2
; STRICT-NEXT: orpd %xmm2, %xmm0		; STRICT-NEXT: orps %xmm2, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; RELAX-LABEL: ult_y:		; RELAX-LABEL: ult_y:
; RELAX: # BB#0:		; RELAX: # BB#0:
; RELAX-NEXT: minsd {{.*}}(%rip), %xmm0		; RELAX-NEXT: minsd {{.*}}(%rip), %xmm0
; RELAX-NEXT: retq		; RELAX-NEXT: retq
;		;
%c = fcmp ult double %x, -0.000000e+00		%c = fcmp ult double %x, -0.000000e+00
%d = select i1 %c, double %x, double -0.000000e+00		%d = select i1 %c, double %x, double -0.000000e+00
ret double %d		ret double %d
}		}

define double @ugt_inverse_y(double %x) {		define double @ugt_inverse_y(double %x) {
; STRICT-LABEL: ugt_inverse_y:		; STRICT-LABEL: ugt_inverse_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm0, %xmm1		; STRICT-NEXT: movaps %xmm0, %xmm1
; STRICT-NEXT: cmpnlesd %xmm2, %xmm1		; STRICT-NEXT: cmpnlesd %xmm2, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm2		; STRICT-NEXT: andps %xmm1, %xmm2
; STRICT-NEXT: andnpd %xmm0, %xmm1		; STRICT-NEXT: andnps %xmm0, %xmm1
; STRICT-NEXT: orpd %xmm2, %xmm1		; STRICT-NEXT: orps %xmm2, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; UNSAFE-LABEL: ugt_inverse_y:		; UNSAFE-LABEL: ugt_inverse_y:
; UNSAFE: # BB#0:		; UNSAFE: # BB#0:
; UNSAFE-NEXT: minsd {{.*}}(%rip), %xmm0		; UNSAFE-NEXT: minsd {{.*}}(%rip), %xmm0
; UNSAFE-NEXT: retq		; UNSAFE-NEXT: retq
;		;
; FINITE-LABEL: ugt_inverse_y:		; FINITE-LABEL: ugt_inverse_y:
; FINITE: # BB#0:		; FINITE: # BB#0:
; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; FINITE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; FINITE-NEXT: minsd %xmm0, %xmm1		; FINITE-NEXT: minsd %xmm0, %xmm1
; FINITE-NEXT: movapd %xmm1, %xmm0		; FINITE-NEXT: movapd %xmm1, %xmm0
; FINITE-NEXT: retq		; FINITE-NEXT: retq
;		;
%c = fcmp ugt double %x, -0.000000e+00		%c = fcmp ugt double %x, -0.000000e+00
%d = select i1 %c, double -0.000000e+00, double %x		%d = select i1 %c, double -0.000000e+00, double %x
ret double %d		ret double %d
}		}

define double @ult_inverse_y(double %x) {		define double @ult_inverse_y(double %x) {
; STRICT-LABEL: ult_inverse_y:		; STRICT-LABEL: ult_inverse_y:
; STRICT: # BB#0:		; STRICT: # BB#0:
; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; STRICT-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; STRICT-NEXT: movapd %xmm2, %xmm1		; STRICT-NEXT: movaps %xmm2, %xmm1
; STRICT-NEXT: cmpnlesd %xmm0, %xmm1		; STRICT-NEXT: cmpnlesd %xmm0, %xmm1
; STRICT-NEXT: andpd %xmm1, %xmm2		; STRICT-NEXT: andps %xmm1, %xmm2
; STRICT-NEXT: andnpd %xmm0, %xmm1		; STRICT-NEXT: andnps %xmm0, %xmm1
; STRICT-NEXT: orpd %xmm2, %xmm1		; STRICT-NEXT: orps %xmm2, %xmm1
; STRICT-NEXT: movapd %xmm1, %xmm0		; STRICT-NEXT: movaps %xmm1, %xmm0
; STRICT-NEXT: retq		; STRICT-NEXT: retq
;		;
; UNSAFE-LABEL: ult_inverse_y:		; UNSAFE-LABEL: ult_inverse_y:
; UNSAFE: # BB#0:		; UNSAFE: # BB#0:
; UNSAFE-NEXT: maxsd {{.*}}(%rip), %xmm0		; UNSAFE-NEXT: maxsd {{.*}}(%rip), %xmm0
; UNSAFE-NEXT: retq		; UNSAFE-NEXT: retq
;		;
; FINITE-LABEL: ult_inverse_y:		; FINITE-LABEL: ult_inverse_y:

test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll

%res = call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %a0)		%res = call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %a0)
ret i64 %res		ret i64 %res
}		}
declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>) nounwind readnone		declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>) nounwind readnone

define <2 x i64> @test_mm_loadu_si64(i64* %a0) nounwind {		define <2 x i64> @test_mm_loadu_si64(i64* %a0) nounwind {
; X64-LABEL: test_mm_loadu_si64:		; X64-LABEL: test_mm_loadu_si64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X64-NEXT: retq		; X64-NEXT: retq
%ld = load i64, i64* %a0, align 1		%ld = load i64, i64* %a0, align 1
%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0		%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0
%res1 = insertelement <2 x i64> %res0, i64 0, i32 1		%res1 = insertelement <2 x i64> %res0, i64 0, i32 1
ret <2 x i64> %res1		ret <2 x i64> %res1
}		}

define void @test_mm_stream_si64(i64 *%a0, i64 %a1) {		define void @test_mm_stream_si64(i64 *%a0, i64 %a1) {

test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

; X64-NEXT: retq
%cvt = sitofp i32 %a1 to double		%cvt = sitofp i32 %a1 to double
%res = insertelement <2 x double> %a0, double %cvt, i32 0		%res = insertelement <2 x double> %a0, double %cvt, i32 0
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x i64> @test_mm_cvtsi32_si128(i32 %a0) nounwind {		define <2 x i64> @test_mm_cvtsi32_si128(i32 %a0) nounwind {
; X32-LABEL: test_mm_cvtsi32_si128:		; X32-LABEL: test_mm_cvtsi32_si128:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_cvtsi32_si128:		; X64-LABEL: test_mm_cvtsi32_si128:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: movd %edi, %xmm0		; X64-NEXT: movd %edi, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%res0 = insertelement <4 x i32> undef, i32 %a0, i32 0		%res0 = insertelement <4 x i32> undef, i32 %a0, i32 0
%res1 = insertelement <4 x i32> %res0, i32 0, i32 1		%res1 = insertelement <4 x i32> %res0, i32 0, i32 1
; X64-NEXT: retq
%res = insertelement <2 x double> %a0, double %ld, i32 1		%res = insertelement <2 x double> %a0, double %ld, i32 1
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x i64> @test_mm_loadl_epi64(<2 x i64> %a0, <2 x i64>* %a1) nounwind {		define <2 x i64> @test_mm_loadl_epi64(<2 x i64> %a0, <2 x i64>* %a1) nounwind {
; X32-LABEL: test_mm_loadl_epi64:		; X32-LABEL: test_mm_loadl_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_loadl_epi64:		; X64-LABEL: test_mm_loadl_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; X64-NEXT: retq		; X64-NEXT: retq
%bc = bitcast <2 x i64>* %a1 to i64*		%bc = bitcast <2 x i64>* %a1 to i64*
%ld = load i64, i64* %bc, align 1		%ld = load i64, i64* %bc, align 1
%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0		%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0
%res1 = insertelement <2 x i64> %res0, i64 0, i32 1		%res1 = insertelement <2 x i64> %res0, i64 0, i32 1
ret <2 x i64> %res1		ret <2 x i64> %res1
}		}

; X64-NEXT: retq
%res0 = insertelement <2 x double> undef, double %a1, i32 0		%res0 = insertelement <2 x double> undef, double %a1, i32 0
%res1 = insertelement <2 x double> %res0, double %a0, i32 1		%res1 = insertelement <2 x double> %res0, double %a0, i32 1
ret <2 x double> %res1		ret <2 x double> %res1
}		}

define <2 x double> @test_mm_set_sd(double %a0) nounwind {		define <2 x double> @test_mm_set_sd(double %a0) nounwind {
; X32-LABEL: test_mm_set_sd:		; X32-LABEL: test_mm_set_sd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
; X32-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero		; X32-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_set_sd:		; X64-LABEL: test_mm_set_sd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero		; X64-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero
; X64-NEXT: retq		; X64-NEXT: retq
%res0 = insertelement <2 x double> undef, double %a0, i32 0		%res0 = insertelement <2 x double> undef, double %a0, i32 0

test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s			; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse2 | FileCheck %s

	define <2 x i64> @test_x86_sse2_psll_dq_bs(<2 x i64> %a0) {			define <2 x i64> @test_x86_sse2_psll_dq_bs(<2 x i64> %a0) {
	; CHECK-LABEL: test_x86_sse2_psll_dq_bs:			; CHECK-LABEL: test_x86_sse2_psll_dq_bs:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8]			; CHECK-NEXT: pslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8]
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%res = call <2 x i64> @llvm.x86.sse2.psll.dq.bs(<2 x i64> %a0, i32 7) ; <<2 x i64>> [#uses=1]			%res = call <2 x i64> @llvm.x86.sse2.psll.dq.bs(<2 x i64> %a0, i32 7) ; <<2 x i64>> [#uses=1]
	declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind			declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind


	define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {			define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {
	; fadd operation forces the execution domain.			; fadd operation forces the execution domain.
	; CHECK-LABEL: test_x86_sse2_storeu_pd:			; CHECK-LABEL: test_x86_sse2_storeu_pd:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; CHECK-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
	; CHECK-NEXT: pslldq {{.*#+}} xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]			; CHECK-NEXT: pslldq {{.*#+}} xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
	; CHECK-NEXT: addpd %xmm0, %xmm1			; CHECK-NEXT: addpd %xmm0, %xmm1
	; CHECK-NEXT: movupd %xmm1, (%eax)			; CHECK-NEXT: movupd %xmm1, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>			%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>
	call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)			call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)
	ret void			ret void
	}			}

test/CodeGen/X86/sse2.ll

; CHECK-NEXT: retl
ret void		ret void
}		}

define <4 x i32> @test5(i8** %ptr) nounwind {		define <4 x i32> @test5(i8** %ptr) nounwind {
; CHECK-LABEL: test5:		; CHECK-LABEL: test5:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax		; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl (%eax), %eax		; CHECK-NEXT: movl (%eax), %eax
; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; CHECK-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-NEXT: pxor %xmm0, %xmm0		; CHECK-NEXT: pxor %xmm0, %xmm0
; CHECK-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; CHECK-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; CHECK-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; CHECK-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; CHECK-NEXT: retl		; CHECK-NEXT: retl
%tmp = load i8*, i8** %ptr ; <i8*> [#uses=1]		%tmp = load i8*, i8** %ptr ; <i8*> [#uses=1]
%tmp.upgrd.1 = bitcast i8* %tmp to float* ; <float*> [#uses=1]		%tmp.upgrd.1 = bitcast i8* %tmp to float* ; <float*> [#uses=1]
%tmp.upgrd.2 = load float, float* %tmp.upgrd.1 ; <float> [#uses=1]		%tmp.upgrd.2 = load float, float* %tmp.upgrd.1 ; <float> [#uses=1]
%tmp.upgrd.3 = insertelement <4 x float> undef, float %tmp.upgrd.2, i32 0 ; <<4 x float>> [#uses=1]		%tmp.upgrd.3 = insertelement <4 x float> undef, float %tmp.upgrd.2, i32 0 ; <<4 x float>> [#uses=1]

test/CodeGen/X86/uint64-to-float.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64

	; Verify that we are using the efficient uitofp --> sitofp lowering illustrated			; Verify that we are using the efficient uitofp --> sitofp lowering illustrated
	; by the compiler_rt implementation of __floatundisf.			; by the compiler_rt implementation of __floatundisf.
	; <rdar://problem/8493982>			; <rdar://problem/8493982>

	define float @test(i64 %a) nounwind {			define float @test(i64 %a) nounwind {
	; X86-LABEL: test:			; X86-LABEL: test:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %ebp			; X86-NEXT: pushl %ebp
	; X86-NEXT: movl %esp, %ebp			; X86-NEXT: movl %esp, %ebp
	; X86-NEXT: andl $-8, %esp			; X86-NEXT: andl $-8, %esp
	; X86-NEXT: subl $16, %esp			; X86-NEXT: subl $16, %esp
	; X86-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-NEXT: movq %xmm0, {{[0-9]+}}(%esp)			; X86-NEXT: movlps %xmm0, {{[0-9]+}}(%esp)
	; X86-NEXT: xorl %eax, %eax			; X86-NEXT: xorl %eax, %eax
	; X86-NEXT: cmpl $0, 12(%ebp)			; X86-NEXT: cmpl $0, 12(%ebp)
	; X86-NEXT: setns %al			; X86-NEXT: setns %al
	; X86-NEXT: fildll {{[0-9]+}}(%esp)			; X86-NEXT: fildll {{[0-9]+}}(%esp)
	; X86-NEXT: fadds {{\.LCPI.*}}(,%eax,4)			; X86-NEXT: fadds {{\.LCPI.*}}(,%eax,4)
	; X86-NEXT: fstps {{[0-9]+}}(%esp)			; X86-NEXT: fstps {{[0-9]+}}(%esp)
	; X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-NEXT: movss %xmm0, (%esp)			; X86-NEXT: movss %xmm0, (%esp)

test/CodeGen/X86/uint_to_fp-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-unknown-unknown -march=x86 -mattr=+sse2 | FileCheck %s			; RUN: llc < %s -mtriple=i386-unknown-unknown -march=x86 -mattr=+sse2 | FileCheck %s

	; rdar://6504833			; rdar://6504833
	define float @test1(i32 %x) nounwind readnone {			define float @test1(i32 %x) nounwind readnone {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: pushl %eax			; CHECK-NEXT: pushl %eax
	; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; CHECK-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero			; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; CHECK-NEXT: por %xmm0, %xmm1			; CHECK-NEXT: orpd %xmm0, %xmm1
	; CHECK-NEXT: subsd %xmm0, %xmm1			; CHECK-NEXT: subsd %xmm0, %xmm1
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: cvtsd2ss %xmm1, %xmm0			; CHECK-NEXT: cvtsd2ss %xmm1, %xmm0
	; CHECK-NEXT: movss %xmm0, (%esp)			; CHECK-NEXT: movss %xmm0, (%esp)
	; CHECK-NEXT: flds (%esp)			; CHECK-NEXT: flds (%esp)
	; CHECK-NEXT: popl %eax			; CHECK-NEXT: popl %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:

test/CodeGen/X86/vec_extract-avx.ll

	; X32-NEXT: vxorps %ymm1, %ymm1, %ymm1			; X32-NEXT: vxorps %ymm1, %ymm1, %ymm1
	; X32-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]			; X32-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
	; X32-NEXT: vmovaps %ymm0, (%eax)			; X32-NEXT: vmovaps %ymm0, (%eax)
	; X32-NEXT: vzeroupper			; X32-NEXT: vzeroupper
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: legal_vzmovl_2i32_8i32:			; X64-LABEL: legal_vzmovl_2i32_8i32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: vxorps %ymm1, %ymm1, %ymm1			; X64-NEXT: vxorps %ymm1, %ymm1, %ymm1
	; X64-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]			; X64-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
	; X64-NEXT: vmovaps %ymm0, (%rsi)			; X64-NEXT: vmovaps %ymm0, (%rsi)
	; X64-NEXT: vzeroupper			; X64-NEXT: vzeroupper
	; X64-NEXT: retq			; X64-NEXT: retq
	%ld = load <2 x i32>, <2 x i32>* %in, align 8			%ld = load <2 x i32>, <2 x i32>* %in, align 8
	%ext = extractelement <2 x i32> %ld, i64 0			%ext = extractelement <2 x i32> %ld, i64 0
	%ins = insertelement <8 x i32> <i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>, i32 %ext, i64 0			%ins = insertelement <8 x i32> <i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>, i32 %ext, i64 0
	; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X32-NEXT: vmovaps %ymm0, (%eax)			; X32-NEXT: vmovaps %ymm0, (%eax)
	; X32-NEXT: vzeroupper			; X32-NEXT: vzeroupper
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: legal_vzmovl_2f32_8f32:			; X64-LABEL: legal_vzmovl_2f32_8f32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: vxorps %ymm1, %ymm1, %ymm1			; X64-NEXT: vxorps %ymm1, %ymm1, %ymm1
	; X64-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]			; X64-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
	; X64-NEXT: vmovaps %ymm0, (%rsi)			; X64-NEXT: vmovaps %ymm0, (%rsi)
	; X64-NEXT: vzeroupper			; X64-NEXT: vzeroupper
	; X64-NEXT: retq			; X64-NEXT: retq
	%ld = load <2 x float>, <2 x float>* %in, align 8			%ld = load <2 x float>, <2 x float>* %in, align 8
	%ext = extractelement <2 x float> %ld, i64 0			%ext = extractelement <2 x float> %ld, i64 0
	%ins = insertelement <8 x float> <float undef, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>, float %ext, i64 0			%ins = insertelement <8 x float> <float undef, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>, float %ext, i64 0

test/CodeGen/X86/vec_extract-mmx.ll

	; X32-NEXT: subl $24, %esp			; X32-NEXT: subl $24, %esp
	; X32-NEXT: movl 8(%ebp), %eax			; X32-NEXT: movl 8(%ebp), %eax
	; X32-NEXT: movl (%eax), %ecx			; X32-NEXT: movl (%eax), %ecx
	; X32-NEXT: movl 4(%eax), %eax			; X32-NEXT: movl 4(%eax), %eax
	; X32-NEXT: movl %eax, {{[0-9]+}}(%esp)			; X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; X32-NEXT: movl %ecx, (%esp)			; X32-NEXT: movl %ecx, (%esp)
	; X32-NEXT: pshufw $238, (%esp), %mm0 # mm0 = mem[2,3,2,3]			; X32-NEXT: pshufw $238, (%esp), %mm0 # mm0 = mem[2,3,2,3]
	; X32-NEXT: movq %mm0, {{[0-9]+}}(%esp)			; X32-NEXT: movq %mm0, {{[0-9]+}}(%esp)
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; X32-NEXT: movd %xmm0, %eax			; X32-NEXT: movd %xmm0, %eax
	; X32-NEXT: addl $32, %eax			; X32-NEXT: addl $32, %eax
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test0:			; X64-LABEL: test0:
	; X32-NEXT: pushl %ebp			; X32-NEXT: pushl %ebp
	; X32-NEXT: movl %esp, %ebp			; X32-NEXT: movl %esp, %ebp
	; X32-NEXT: andl $-8, %esp			; X32-NEXT: andl $-8, %esp
	; X32-NEXT: subl $16, %esp			; X32-NEXT: subl $16, %esp
	; X32-NEXT: movl 8(%ebp), %eax			; X32-NEXT: movl 8(%ebp), %eax
	; X32-NEXT: movd (%eax), %mm0			; X32-NEXT: movd (%eax), %mm0
	; X32-NEXT: pshufw $232, %mm0, %mm0 # mm0 = mm0[0,2,2,3]			; X32-NEXT: pshufw $232, %mm0, %mm0 # mm0 = mm0[0,2,2,3]
	; X32-NEXT: movq %mm0, (%esp)			; X32-NEXT: movq %mm0, (%esp)
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; X32-NEXT: movd %xmm0, %eax			; X32-NEXT: movd %xmm0, %eax
	; X32-NEXT: emms			; X32-NEXT: emms
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test1:			; X64-LABEL: test1:
	; X32: # BB#0: # %entry			; X32: # BB#0: # %entry
	; X32-NEXT: pushl %ebp			; X32-NEXT: pushl %ebp
	; X32-NEXT: movl %esp, %ebp			; X32-NEXT: movl %esp, %ebp
	; X32-NEXT: andl $-8, %esp			; X32-NEXT: andl $-8, %esp
	; X32-NEXT: subl $16, %esp			; X32-NEXT: subl $16, %esp
	; X32-NEXT: movl 8(%ebp), %eax			; X32-NEXT: movl 8(%ebp), %eax
	; X32-NEXT: pshufw $232, (%eax), %mm0 # mm0 = mem[0,2,2,3]			; X32-NEXT: pshufw $232, (%eax), %mm0 # mm0 = mem[0,2,2,3]
	; X32-NEXT: movq %mm0, (%esp)			; X32-NEXT: movq %mm0, (%esp)
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; X32-NEXT: movd %xmm0, %eax			; X32-NEXT: movd %xmm0, %eax
	; X32-NEXT: emms			; X32-NEXT: emms
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test2:			; X64-LABEL: test2:
	define i32 @test4(x86_mmx %a) nounwind {			define i32 @test4(x86_mmx %a) nounwind {
	; X32-LABEL: test4:			; X32-LABEL: test4:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: pushl %ebp			; X32-NEXT: pushl %ebp
	; X32-NEXT: movl %esp, %ebp			; X32-NEXT: movl %esp, %ebp
	; X32-NEXT: andl $-8, %esp			; X32-NEXT: andl $-8, %esp
	; X32-NEXT: subl $8, %esp			; X32-NEXT: subl $8, %esp
	; X32-NEXT: movq %mm0, (%esp)			; X32-NEXT: movq %mm0, (%esp)
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,0,1]			; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,0,1]
	; X32-NEXT: movd %xmm0, %eax			; X32-NEXT: movd %xmm0, %eax
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test4:			; X64-LABEL: test4:
	; X64: # BB#0:			; X64: # BB#0:

test/CodeGen/X86/vec_i64.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X32			; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X32
	; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64

	; Used movq to load i64 into a v2i64 when the top i64 is 0.			; Used movq to load i64 into a v2i64 when the top i64 is 0.

	define <2 x i64> @foo1(i64* %y) nounwind {			define <2 x i64> @foo1(i64* %y) nounwind {
	; X32-LABEL: foo1:			; X32-LABEL: foo1:
	; X32: # BB#0: # %entry			; X32: # BB#0: # %entry
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: foo1:			; X64-LABEL: foo1:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%tmp1 = load i64, i64* %y, align 8			%tmp1 = load i64, i64* %y, align 8
	%s2v = insertelement <2 x i64> undef, i64 %tmp1, i32 0			%s2v = insertelement <2 x i64> undef, i64 %tmp1, i32 0
	%loadl = shufflevector <2 x i64> zeroinitializer, <2 x i64> %s2v, <2 x i32> <i32 2, i32 1>			%loadl = shufflevector <2 x i64> zeroinitializer, <2 x i64> %s2v, <2 x i32> <i32 2, i32 1>
	ret <2 x i64> %loadl			ret <2 x i64> %loadl
	}			}


	define <4 x float> @foo2(i64* %p) nounwind {			define <4 x float> @foo2(i64* %p) nounwind {
	; X32-LABEL: foo2:			; X32-LABEL: foo2:
	; X32: # BB#0: # %entry			; X32: # BB#0: # %entry
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: foo2:			; X64-LABEL: foo2:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%load = load i64, i64* %p			%load = load i64, i64* %p
	%s2v = insertelement <2 x i64> undef, i64 %load, i32 0			%s2v = insertelement <2 x i64> undef, i64 %load, i32 0
	%loadl = shufflevector <2 x i64> zeroinitializer, <2 x i64> %s2v, <2 x i32> <i32 2, i32 1>			%loadl = shufflevector <2 x i64> zeroinitializer, <2 x i64> %s2v, <2 x i32> <i32 2, i32 1>
	%0 = bitcast <2 x i64> %loadl to <4 x float>			%0 = bitcast <2 x i64> %loadl to <4 x float>
	ret <4 x float> %0			ret <4 x float> %0
	}			}

test/CodeGen/X86/vec_insert-2.ll

	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp1 = insertelement <4 x float> %tmp, float %s, i32 3			%tmp1 = insertelement <4 x float> %tmp, float %s, i32 3
	ret <4 x float> %tmp1			ret <4 x float> %tmp1
	}			}

	define <4 x i32> @t2(i32 %s, <4 x i32> %tmp) nounwind {			define <4 x i32> @t2(i32 %s, <4 x i32> %tmp) nounwind {
	; X32-LABEL: t2:			; X32-LABEL: t2:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X32-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]			; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]
	; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,0]			; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,0]
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: t2:			; X64-LABEL: t2:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movd %edi, %xmm1			; X64-NEXT: movd %edi, %xmm1
	; X64-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]			; X64-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]

test/CodeGen/X86/vec_insert-3.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2,-sse4.1 | FileCheck %s --check-prefix=X32			; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2,-sse4.1 | FileCheck %s --check-prefix=X32
	; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2,-sse4.1 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2,-sse4.1 | FileCheck %s --check-prefix=X64

	define <2 x i64> @t1(i64 %s, <2 x i64> %tmp) nounwind {			define <2 x i64> @t1(i64 %s, <2 x i64> %tmp) nounwind {
	; X32-LABEL: t1:			; X32-LABEL: t1:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X32-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[3,0]			; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[3,0]
	; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]			; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
	; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X32-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]			; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,0]
	; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,0]			; X32-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,0]
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: t1:			; X64-LABEL: t1:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movd %rdi, %xmm1			; X64-NEXT: movd %rdi, %xmm1
	; X64-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; X64-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp1 = insertelement <2 x i64> %tmp, i64 %s, i32 1			%tmp1 = insertelement <2 x i64> %tmp, i64 %s, i32 1
	ret <2 x i64> %tmp1			ret <2 x i64> %tmp1
	}			}

test/CodeGen/X86/vec_insert-mmx.ll

; X64-NEXT: retq
%tmp3 = insertelement <2 x i32> < i32 0, i32 undef >, i32 %A, i32 1		%tmp3 = insertelement <2 x i32> < i32 0, i32 undef >, i32 %A, i32 1
%tmp4 = bitcast <2 x i32> %tmp3 to x86_mmx		%tmp4 = bitcast <2 x i32> %tmp3 to x86_mmx
ret x86_mmx %tmp4		ret x86_mmx %tmp4
}		}

define <8 x i8> @t1(i8 zeroext %x) nounwind {		define <8 x i8> @t1(i8 zeroext %x) nounwind {
; X32-LABEL: t1:		; X32-LABEL: t1:
; X32: ## BB#0:		; X32: ## BB#0:
; X32-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: t1:		; X64-LABEL: t1:
; X64: ## BB#0:		; X64: ## BB#0:
; X64-NEXT: movd %edi, %xmm0		; X64-NEXT: movd %edi, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%r = insertelement <8 x i8> undef, i8 %x, i32 0		%r = insertelement <8 x i8> undef, i8 %x, i32 0
ret <8 x i8> %r		ret <8 x i8> %r
@g1 = external global <4 x i16>		@g1 = external global <4 x i16>

; PR2562		; PR2562
define void @t3() {		define void @t3() {
; X32-LABEL: t3:		; X32-LABEL: t3:
; X32: ## BB#0:		; X32: ## BB#0:
; X32-NEXT: movl L_g0$non_lazy_ptr, %eax		; X32-NEXT: movl L_g0$non_lazy_ptr, %eax
; X32-NEXT: movl L_g1$non_lazy_ptr, %ecx		; X32-NEXT: movl L_g1$non_lazy_ptr, %ecx
; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
; X32-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; X32-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; X32-NEXT: movzwl (%eax), %eax		; X32-NEXT: movzwl (%eax), %eax
; X32-NEXT: movd %eax, %xmm1		; X32-NEXT: movd %eax, %xmm1
; X32-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]		; X32-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; X32-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]		; X32-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
; X32-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,6,6,7]		; X32-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,6,6,7]
; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]		; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; X32-NEXT: movq %xmm0, (%ecx)		; X32-NEXT: movq %xmm0, (%ecx)

test/CodeGen/X86/vec_int_to_fp.ll

	; VEX-NEXT: vpsrld $16, %xmm0, %xmm0			; VEX-NEXT: vpsrld $16, %xmm0, %xmm0
	; VEX-NEXT: vcvtdq2pd %xmm0, %xmm0			; VEX-NEXT: vcvtdq2pd %xmm0, %xmm0
	; VEX-NEXT: vmulpd {{.*}}(%rip), %xmm0, %xmm0			; VEX-NEXT: vmulpd {{.*}}(%rip), %xmm0, %xmm0
	; VEX-NEXT: vaddpd %xmm1, %xmm0, %xmm0			; VEX-NEXT: vaddpd %xmm1, %xmm0, %xmm0
	; VEX-NEXT: retq			; VEX-NEXT: retq
	;			;
	; AVX512F-LABEL: uitofp_load_2i32_to_2f64:			; AVX512F-LABEL: uitofp_load_2i32_to_2f64:
	; AVX512F: # BB#0:			; AVX512F: # BB#0:
	; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX512F-NEXT: vcvtudq2pd %ymm0, %zmm0			; AVX512F-NEXT: vcvtudq2pd %ymm0, %zmm0
	; AVX512F-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>			; AVX512F-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: uitofp_load_2i32_to_2f64:			; AVX512VL-LABEL: uitofp_load_2i32_to_2f64:
	; AVX512VL: # BB#0:			; AVX512VL: # BB#0:
	; AVX512VL-NEXT: vpmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero			; AVX512VL-NEXT: vpmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero
	; AVX512VL-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]			; AVX512VL-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX512VL-NEXT: vcvtudq2pd %xmm0, %xmm0			; AVX512VL-NEXT: vcvtudq2pd %xmm0, %xmm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512DQ-LABEL: uitofp_load_2i32_to_2f64:			; AVX512DQ-LABEL: uitofp_load_2i32_to_2f64:
	; AVX512DQ: # BB#0:			; AVX512DQ: # BB#0:
	; AVX512DQ-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; AVX512DQ-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX512DQ-NEXT: vcvtudq2pd %ymm0, %zmm0			; AVX512DQ-NEXT: vcvtudq2pd %ymm0, %zmm0
	; AVX512DQ-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>			; AVX512DQ-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	;			;
	; AVX512VLDQ-LABEL: uitofp_load_2i32_to_2f64:			; AVX512VLDQ-LABEL: uitofp_load_2i32_to_2f64:
	; AVX512VLDQ: # BB#0:			; AVX512VLDQ: # BB#0:
	; AVX512VLDQ-NEXT: vpmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero			; AVX512VLDQ-NEXT: vpmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero
	; AVX512VLDQ-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]			; AVX512VLDQ-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]

test/CodeGen/X86/vec_set-2.ll

; CHECK-NEXT: retl
%tmp6 = insertelement <4 x float> %tmp5, float 0.000000e+00, i32 2		%tmp6 = insertelement <4 x float> %tmp5, float 0.000000e+00, i32 2
%tmp7 = insertelement <4 x float> %tmp6, float 0.000000e+00, i32 3		%tmp7 = insertelement <4 x float> %tmp6, float 0.000000e+00, i32 3
ret <4 x float> %tmp7		ret <4 x float> %tmp7
}		}

define <2 x i64> @test(i32 %a) nounwind {		define <2 x i64> @test(i32 %a) nounwind {
; CHECK-LABEL: test:		; CHECK-LABEL: test:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; CHECK-NEXT: retl		; CHECK-NEXT: retl
%tmp = insertelement <4 x i32> zeroinitializer, i32 %a, i32 0		%tmp = insertelement <4 x i32> zeroinitializer, i32 %a, i32 0
%tmp6 = insertelement <4 x i32> %tmp, i32 0, i32 1		%tmp6 = insertelement <4 x i32> %tmp, i32 0, i32 1
%tmp8 = insertelement <4 x i32> %tmp6, i32 0, i32 2		%tmp8 = insertelement <4 x i32> %tmp6, i32 0, i32 2
%tmp10 = insertelement <4 x i32> %tmp8, i32 0, i32 3		%tmp10 = insertelement <4 x i32> %tmp8, i32 0, i32 3
%tmp19 = bitcast <4 x i32> %tmp10 to <2 x i64>		%tmp19 = bitcast <4 x i32> %tmp10 to <2 x i64>
ret <2 x i64> %tmp19		ret <2 x i64> %tmp19
}		}

test/CodeGen/X86/vec_set-C.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-linux-gnu -mattr=+sse2,-avx | FileCheck %s --check-prefix=X32			; RUN: llc < %s -mtriple=i386-linux-gnu -mattr=+sse2,-avx | FileCheck %s --check-prefix=X32
	; RUN: llc < %s -mtriple=x86_64-pc-linux -mattr=+sse2,-avx | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-pc-linux -mattr=+sse2,-avx | FileCheck %s --check-prefix=X64

	define <2 x i64> @t1(i64 %x) nounwind {			define <2 x i64> @t1(i64 %x) nounwind {
	; X32-LABEL: t1:			; X32-LABEL: t1:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: t1:			; X64-LABEL: t1:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movd %rdi, %xmm0			; X64-NEXT: movd %rdi, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp8 = insertelement <2 x i64> zeroinitializer, i64 %x, i32 0			%tmp8 = insertelement <2 x i64> zeroinitializer, i64 %x, i32 0
	ret <2 x i64> %tmp8			ret <2 x i64> %tmp8
	}			}

test/CodeGen/X86/vec_set-D.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s			; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s

	define <4 x i32> @t(i32 %x, i32 %y) nounwind {			define <4 x i32> @t(i32 %x, i32 %y) nounwind {
	; CHECK-LABEL: t:			; CHECK-LABEL: t:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%tmp1 = insertelement <4 x i32> zeroinitializer, i32 %x, i32 0			%tmp1 = insertelement <4 x i32> zeroinitializer, i32 %x, i32 0
	%tmp2 = insertelement <4 x i32> %tmp1, i32 %y, i32 1			%tmp2 = insertelement <4 x i32> %tmp1, i32 %y, i32 1
	ret <4 x i32> %tmp2			ret <4 x i32> %tmp2
	}			}

test/CodeGen/X86/vec_set-F.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-linux -mattr=+sse2 | FileCheck %s			; RUN: llc < %s -mtriple=i686-linux -mattr=+sse2 | FileCheck %s

	define <2 x i64> @t1(<2 x i64>* %ptr) nounwind {			define <2 x i64> @t1(<2 x i64>* %ptr) nounwind {
	; CHECK-LABEL: t1:			; CHECK-LABEL: t1:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%tmp45 = bitcast <2 x i64>* %ptr to <2 x i32>*			%tmp45 = bitcast <2 x i64>* %ptr to <2 x i32>*
	%tmp615 = load <2 x i32>, <2 x i32>* %tmp45			%tmp615 = load <2 x i32>, <2 x i32>* %tmp45
	%tmp7 = bitcast <2 x i32> %tmp615 to i64			%tmp7 = bitcast <2 x i32> %tmp615 to i64
	%tmp8 = insertelement <2 x i64> zeroinitializer, i64 %tmp7, i32 0			%tmp8 = insertelement <2 x i64> zeroinitializer, i64 %tmp7, i32 0
	ret <2 x i64> %tmp8			ret <2 x i64> %tmp8
	}			}


test/CodeGen/X86/vector-shuffle-128-v2.ll

; AVX-NEXT: retq
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x i64> @insert_mem_and_zero_v2i64(i64* %ptr) {		define <2 x i64> @insert_mem_and_zero_v2i64(i64* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v2i64:		; SSE-LABEL: insert_mem_and_zero_v2i64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v2i64:		; AVX-LABEL: insert_mem_and_zero_v2i64:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load i64, i64* %ptr		%a = load i64, i64* %ptr
%v = insertelement <2 x i64> undef, i64 %a, i32 0		%v = insertelement <2 x i64> undef, i64 %a, i32 0
%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>		%shuffle = shufflevector <2 x i64> %v, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 3>
ret <2 x i64> %shuffle		ret <2 x i64> %shuffle
}		}

define <2 x double> @insert_reg_and_zero_v2f64(double %a) {		define <2 x double> @insert_reg_and_zero_v2f64(double %a) {

test/CodeGen/X86/vector-shuffle-128-v4.ll

; AVX-NEXT: retq
%v = insertelement <4 x i32> undef, i32 %a, i32 0		%v = insertelement <4 x i32> undef, i32 %a, i32 0
%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x i32> @insert_mem_and_zero_v4i32(i32* %ptr) {		define <4 x i32> @insert_mem_and_zero_v4i32(i32* %ptr) {
; SSE-LABEL: insert_mem_and_zero_v4i32:		; SSE-LABEL: insert_mem_and_zero_v4i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: insert_mem_and_zero_v4i32:		; AVX-LABEL: insert_mem_and_zero_v4i32:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX-NEXT: retq		; AVX-NEXT: retq
%a = load i32, i32* %ptr		%a = load i32, i32* %ptr
%v = insertelement <4 x i32> undef, i32 %a, i32 0		%v = insertelement <4 x i32> undef, i32 %a, i32 0
%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i32> %shuffle		ret <4 x i32> %shuffle
}		}

define <4 x float> @insert_reg_and_zero_v4f32(float %a) {		define <4 x float> @insert_reg_and_zero_v4f32(float %a) {

test/CodeGen/X86/vector-shuffle-256-v4.ll

; ALL-NEXT: retq
%v = insertelement <4 x i64> undef, i64 %a, i64 0		%v = insertelement <4 x i64> undef, i64 %a, i64 0
%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i64> %shuffle		ret <4 x i64> %shuffle
}		}

define <4 x i64> @insert_mem_and_zero_v4i64(i64* %ptr) {		define <4 x i64> @insert_mem_and_zero_v4i64(i64* %ptr) {
; ALL-LABEL: insert_mem_and_zero_v4i64:		; ALL-LABEL: insert_mem_and_zero_v4i64:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; ALL-NEXT: retq		; ALL-NEXT: retq
%a = load i64, i64* %ptr		%a = load i64, i64* %ptr
%v = insertelement <4 x i64> undef, i64 %a, i64 0		%v = insertelement <4 x i64> undef, i64 %a, i64 0
%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i64> %shuffle		ret <4 x i64> %shuffle
}		}

define <4 x double> @insert_reg_and_zero_v4f64(double %a) {		define <4 x double> @insert_reg_and_zero_v4f64(double %a) {

test/CodeGen/X86/vector-shuffle-256-v8.ll

	; AVX2OR512VL-NEXT: retq			; AVX2OR512VL-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 0, i32 5, i32 6, i32 7, i32 4>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 0, i32 5, i32 6, i32 7, i32 4>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8x float> @concat_v2f32_1(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_1(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_1:			; ALL-LABEL: concat_v2f32_1:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]			; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8
	%tmp73 = shufflevector <2 x float> %tmp72, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp73 = shufflevector <2 x float> %tmp72, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp75 = shufflevector <2 x float> %tmp74, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp75 = shufflevector <2 x float> %tmp74, <2 x float> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp76 = shufflevector <8 x float> %tmp73, <8 x float> %tmp75, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp76 = shufflevector <8 x float> %tmp73, <8 x float> %tmp75, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %tmp76			ret <8 x float> %tmp76
	}			}

	define <8x float> @concat_v2f32_2(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_2(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_2:			; ALL-LABEL: concat_v2f32_2:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]			; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8
	%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %tmp76			ret <8 x float> %tmp76
	}			}

	define <8x float> @concat_v2f32_3(<2 x float>* %tmp64, <2 x float>* %tmp65) {			define <8x float> @concat_v2f32_3(<2 x float>* %tmp64, <2 x float>* %tmp65) {
	; ALL-LABEL: concat_v2f32_3:			; ALL-LABEL: concat_v2f32_3:
	; ALL: # BB#0: # %entry			; ALL: # BB#0: # %entry
	; ALL-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; ALL-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]			; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8			%tmp74 = load <2 x float>, <2 x float>* %tmp65, align 8
	%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8			%tmp72 = load <2 x float>, <2 x float>* %tmp64, align 8
	%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%tmp76 = shufflevector <2 x float> %tmp72, <2 x float> %tmp74, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%res = shufflevector <4 x float> %tmp76, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%res = shufflevector <4 x float> %tmp76, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x float> %res			ret <8 x float> %res
	}			}

	define <8 x i32> @insert_mem_and_zero_v8i32(i32* %ptr) {			define <8 x i32> @insert_mem_and_zero_v8i32(i32* %ptr) {
	; ALL-LABEL: insert_mem_and_zero_v8i32:			; ALL-LABEL: insert_mem_and_zero_v8i32:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%a = load i32, i32* %ptr			%a = load i32, i32* %ptr
	%v = insertelement <8 x i32> undef, i32 %a, i32 0			%v = insertelement <8 x i32> undef, i32 %a, i32 0
	%shuffle = shufflevector <8 x i32> %v, <8 x i32> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%shuffle = shufflevector <8 x i32> %v, <8 x i32> zeroinitializer, <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @concat_v8i32_0123CDEF(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @concat_v8i32_0123CDEF(<8 x i32> %a, <8 x i32> %b) {

test/CodeGen/X86/vector-shuffle-512-v16.ll

	; ALL-NEXT: retq			; ALL-NEXT: retq
	%shuffle = shufflevector <16 x float> %a, <16 x float> %b, <16 x i32> <i32 0, i32 1, i32 16, i32 16, i32 4, i32 5, i32 20, i32 20, i32 8, i32 9, i32 24, i32 24, i32 12, i32 13, i32 28, i32 28>			%shuffle = shufflevector <16 x float> %a, <16 x float> %b, <16 x i32> <i32 0, i32 1, i32 16, i32 16, i32 4, i32 5, i32 20, i32 20, i32 8, i32 9, i32 24, i32 24, i32 12, i32 13, i32 28, i32 28>
	ret <16 x float> %shuffle			ret <16 x float> %shuffle
	}			}

	define <16 x i32> @insert_mem_and_zero_v16i32(i32* %ptr) {			define <16 x i32> @insert_mem_and_zero_v16i32(i32* %ptr) {
	; ALL-LABEL: insert_mem_and_zero_v16i32:			; ALL-LABEL: insert_mem_and_zero_v16i32:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%a = load i32, i32* %ptr			%a = load i32, i32* %ptr
	%v = insertelement <16 x i32> undef, i32 %a, i32 0			%v = insertelement <16 x i32> undef, i32 %a, i32 0
	%shuffle = shufflevector <16 x i32> %v, <16 x i32> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			%shuffle = shufflevector <16 x i32> %v, <16 x i32> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	ret <16 x i32> %shuffle			ret <16 x i32> %shuffle
	}			}



test/CodeGen/X86/vector-shuffle-combining-xop.ll

; X64-NEXT: retq
%1 = call <16 x i8> @llvm.x86.xop.vpperm(<16 x i8> <i8 0, i8 -1, i8 -2, i8 -3, i8 -4, i8 -5, i8 -6, i8 -7, i8 -8, i8 -9, i8 -10, i8 -11, i8 -12, i8 -13, i8 -14, i8 -15>, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>, <16 x i8> <i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16>)		%1 = call <16 x i8> @llvm.x86.xop.vpperm(<16 x i8> <i8 0, i8 -1, i8 -2, i8 -3, i8 -4, i8 -5, i8 -6, i8 -7, i8 -8, i8 -9, i8 -10, i8 -11, i8 -12, i8 -13, i8 -14, i8 -15>, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>, <16 x i8> <i8 31, i8 30, i8 29, i8 28, i8 27, i8 26, i8 25, i8 24, i8 23, i8 22, i8 21, i8 20, i8 19, i8 18, i8 17, i8 16>)
ret <16 x i8> %1		ret <16 x i8> %1
}		}

define <4 x float> @PR31296(i8* %in) {		define <4 x float> @PR31296(i8* %in) {
; X32-LABEL: PR31296:		; X32-LABEL: PR31296:
; X32: # BB#0: # %entry		; X32: # BB#0: # %entry
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-NEXT: vmovaps {{.*#+}} xmm1 = <0,1,u,u>		; X32-NEXT: vmovaps {{.*#+}} xmm1 = <0,1,u,u>
; X32-NEXT: vpermil2ps {{.*#+}} xmm0 = xmm0[0],xmm1[0,0,1]		; X32-NEXT: vpermil2ps {{.*#+}} xmm0 = xmm0[0],xmm1[0,0,1]
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: PR31296:		; X64-LABEL: PR31296:
; X64: # BB#0: # %entry		; X64: # BB#0: # %entry
; X64-NEXT: movl (%rdi), %eax		; X64-NEXT: movl (%rdi), %eax
; X64-NEXT: vmovq %rax, %xmm0		; X64-NEXT: vmovq %rax, %xmm0

test/CodeGen/X86/vector-shuffle-combining.ll

; AVX2-NEXT: retq
%2 = shufflevector <8 x i32> %a, <8 x i32> %a, <4 x i32> <i32 2, i32 3, i32 6, i32 7>		%2 = shufflevector <8 x i32> %a, <8 x i32> %a, <4 x i32> <i32 2, i32 3, i32 6, i32 7>
store <4 x i32> %1, <4 x i32>* %ptr, align 16		store <4 x i32> %1, <4 x i32>* %ptr, align 16
ret <4 x i32> %2		ret <4 x i32> %2
}		}

define <8 x float> @combine_test22(<2 x float>* %a, <2 x float>* %b) {		define <8 x float> @combine_test22(<2 x float>* %a, <2 x float>* %b) {
; SSE-LABEL: combine_test22:		; SSE-LABEL: combine_test22:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: movhpd {{.*#+}} xmm0 = xmm0[0],mem[0]		; SSE-NEXT: movhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: combine_test22:		; AVX-LABEL: combine_test22:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]		; AVX-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
; AVX-NEXT: retq		; AVX-NEXT: retq
; Current AVX2 lowering of this is still awful, not adding a test case.		; Current AVX2 lowering of this is still awful, not adding a test case.
%1 = load <2 x float>, <2 x float>* %a, align 8		%1 = load <2 x float>, <2 x float>* %a, align 8
%2 = load <2 x float>, <2 x float>* %b, align 8		%2 = load <2 x float>, <2 x float>* %b, align 8
%3 = shufflevector <2 x float> %1, <2 x float> %2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		%3 = shufflevector <2 x float> %1, <2 x float> %2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
ret <8 x float> %3		ret <8 x float> %3
}		}

; AVX-NEXT: retq
%d = shufflevector <4 x float> %a, <4 x float> %c, <4 x i32><i32 4, i32 1, i32 6, i32 5>		%d = shufflevector <4 x float> %a, <4 x float> %c, <4 x i32><i32 4, i32 1, i32 6, i32 5>
ret <4 x float> %d		ret <4 x float> %d
}		}

define void @combine_scalar_load_with_blend_with_zero(double* %a0, <4 x float>* %a1) {		define void @combine_scalar_load_with_blend_with_zero(double* %a0, <4 x float>* %a1) {
; SSE-LABEL: combine_scalar_load_with_blend_with_zero:		; SSE-LABEL: combine_scalar_load_with_blend_with_zero:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: movapd %xmm0, (%rsi)		; SSE-NEXT: movaps %xmm0, (%rsi)
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: combine_scalar_load_with_blend_with_zero:		; AVX-LABEL: combine_scalar_load_with_blend_with_zero:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: vmovapd %xmm0, (%rsi)		; AVX-NEXT: vmovaps %xmm0, (%rsi)
; AVX-NEXT: retq		; AVX-NEXT: retq
%1 = load double, double* %a0, align 8		%1 = load double, double* %a0, align 8
%2 = insertelement <2 x double> undef, double %1, i32 0		%2 = insertelement <2 x double> undef, double %1, i32 0
%3 = insertelement <2 x double> %2, double 0.000000e+00, i32 1		%3 = insertelement <2 x double> %2, double 0.000000e+00, i32 1
%4 = bitcast <2 x double> %3 to <4 x float>		%4 = bitcast <2 x double> %3 to <4 x float>
%5 = shufflevector <4 x float> %4, <4 x float> <float 0.000000e+00, float undef, float undef, float undef>, <4 x i32> <i32 0, i32 1, i32 4, i32 3>		%5 = shufflevector <4 x float> %4, <4 x float> <float 0.000000e+00, float undef, float undef, float undef>, <4 x i32> <i32 0, i32 1, i32 4, i32 3>
store <4 x float> %5, <4 x float>* %a1, align 16		store <4 x float> %5, <4 x float>* %a1, align 16
ret void		ret void
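
The `movapd` to `movaps` change in the SSE check lines of `combine_scalar_load_with_blend_with_zero` illustrates the encoding-size point raised in the review discussion. As a hedged sketch: the legacy SSE store `movaps %xmm0, (%rsi)` encodes as `0F 29 06`, while `movapd` needs an extra `66` operand-size prefix. The array names and the `bytesSaved` helper below are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Legacy SSE encodings of the two stores of %xmm0 to (%rsi).
// ModRM byte 0x06: mod=00, reg=xmm0, rm=rsi.
constexpr std::array<std::uint8_t, 3> MovapsStore = {0x0F, 0x29, 0x06};
constexpr std::array<std::uint8_t, 4> MovapdStore = {0x66, 0x0F, 0x29, 0x06};

// Bytes saved by choosing the single-precision form of the store.
constexpr std::size_t bytesSaved() {
  return MovapdStore.size() - MovapsStore.size();
}
```

The saving applies to the non-VEX forms only. In the VEX encoding the prefix is folded into the VEX bytes, so `vmovaps` and `vmovapd` have the same length and the AVX change here is about domain consistency rather than size.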

test/CodeGen/X86/vector-shuffle-mmx.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X32 %s			; RUN: llc < %s -mtriple=i686-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X32 %s
	; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X64 %s			; RUN: llc < %s -mtriple=x86_64-darwin -mattr=+mmx,+sse2 | FileCheck --check-prefix=X64 %s

	; If there is no explicit MMX type usage, always promote to XMM.			; If there is no explicit MMX type usage, always promote to XMM.

	define void @test0(<1 x i64>* %x) {			define void @test0(<1 x i64>* %x) {
	; X32-LABEL: test0:			; X32-LABEL: test0:
	; X32: ## BB#0: ## %entry			; X32: ## BB#0: ## %entry
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; X32-NEXT: movq %xmm0, (%eax)			; X32-NEXT: movq %xmm0, (%eax)
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test0:			; X64-LABEL: test0:
	; X64: ## BB#0: ## %entry			; X64: ## BB#0: ## %entry
	; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]

test/CodeGen/X86/vector-shuffle-variable-256.ll

	; AVX2-NEXT: vmovd %edx, %xmm3			; AVX2-NEXT: vmovd %edx, %xmm3
	; AVX2-NEXT: vpermps %ymm0, %ymm3, %ymm3			; AVX2-NEXT: vpermps %ymm0, %ymm3, %ymm3
	; AVX2-NEXT: vmovd %ecx, %xmm4			; AVX2-NEXT: vmovd %ecx, %xmm4
	; AVX2-NEXT: vpermps %ymm0, %ymm4, %ymm4			; AVX2-NEXT: vpermps %ymm0, %ymm4, %ymm4
	; AVX2-NEXT: vmovd %r8d, %xmm5			; AVX2-NEXT: vmovd %r8d, %xmm5
	; AVX2-NEXT: vpermps %ymm0, %ymm5, %ymm5			; AVX2-NEXT: vpermps %ymm0, %ymm5, %ymm5
	; AVX2-NEXT: vmovd %r9d, %xmm6			; AVX2-NEXT: vmovd %r9d, %xmm6
	; AVX2-NEXT: vpermps %ymm0, %ymm6, %ymm6			; AVX2-NEXT: vpermps %ymm0, %ymm6, %ymm6
	; AVX2-NEXT: vmovd {{.*#+}} xmm7 = mem[0],zero,zero,zero			; AVX2-NEXT: vmovss {{.*#+}} xmm7 = mem[0],zero,zero,zero
	; AVX2-NEXT: vpermps %ymm0, %ymm7, %ymm7			; AVX2-NEXT: vpermps %ymm0, %ymm7, %ymm7
	; AVX2-NEXT: vmovd {{.*#+}} xmm8 = mem[0],zero,zero,zero			; AVX2-NEXT: vmovss {{.*#+}} xmm8 = mem[0],zero,zero,zero
	; AVX2-NEXT: vpermps %ymm0, %ymm8, %ymm0			; AVX2-NEXT: vpermps %ymm0, %ymm8, %ymm0
	; AVX2-NEXT: vinsertps {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[2,3]			; AVX2-NEXT: vinsertps {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[2,3]
	; AVX2-NEXT: vinsertps {{.*#+}} xmm5 = xmm5[0,1],xmm7[0],xmm5[3]			; AVX2-NEXT: vinsertps {{.*#+}} xmm5 = xmm5[0,1],xmm7[0],xmm5[3]
	; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm5[0,1,2],xmm0[0]			; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm5[0,1,2],xmm0[0]
	; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]			; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
	; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm3[0],xmm1[3]			; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm3[0],xmm1[3]
	; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[0]			; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[0]
	; AVX2-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX2-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0

test/CodeGen/X86/vector-zmov.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 | FileCheck %s --check-prefix=SSE --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 | FileCheck %s --check-prefix=SSE --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+ssse3 | FileCheck %s --check-prefix=SSE --check-prefix=SSSE3			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+ssse3 | FileCheck %s --check-prefix=SSE --check-prefix=SSSE3
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE --check-prefix=SSE41			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE --check-prefix=SSE41
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx | FileCheck %s --check-prefix=AVX --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx | FileCheck %s --check-prefix=AVX --check-prefix=AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 | FileCheck %s --check-prefix=AVX --check-prefix=AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 | FileCheck %s --check-prefix=AVX --check-prefix=AVX2

	define <4 x i32> @load_zmov_4i32_to_0zzz(<4 x i32> *%ptr) {			define <4 x i32> @load_zmov_4i32_to_0zzz(<4 x i32> *%ptr) {
	; SSE-LABEL: load_zmov_4i32_to_0zzz:			; SSE-LABEL: load_zmov_4i32_to_0zzz:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_zmov_4i32_to_0zzz:			; AVX-LABEL: load_zmov_4i32_to_0zzz:
	; AVX: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; AVX-NEXT: retq			; AVX-NEXT: retq

	entry:			entry:
	%X = load <4 x i32>, <4 x i32>* %ptr			%X = load <4 x i32>, <4 x i32>* %ptr
	%Y = shufflevector <4 x i32> %X, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 4, i32 4>			%Y = shufflevector <4 x i32> %X, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 4, i32 4>
	ret <4 x i32>%Y			ret <4 x i32>%Y
	}			}

	define <2 x i64> @load_zmov_2i64_to_0z(<2 x i64> *%ptr) {			define <2 x i64> @load_zmov_2i64_to_0z(<2 x i64> *%ptr) {
	; SSE-LABEL: load_zmov_2i64_to_0z:			; SSE-LABEL: load_zmov_2i64_to_0z:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_zmov_2i64_to_0z:			; AVX-LABEL: load_zmov_2i64_to_0z:
	; AVX: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq

	entry:			entry:
	%X = load <2 x i64>, <2 x i64>* %ptr			%X = load <2 x i64>, <2 x i64>* %ptr
	%Y = shufflevector <2 x i64> %X, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 2>			%Y = shufflevector <2 x i64> %X, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 2>
	ret <2 x i64>%Y			ret <2 x i64>%Y
	}			}

test/CodeGen/X86/widen_load-2.ll


	%i16vec4 = type <4 x i16>			%i16vec4 = type <4 x i16>
	define void @add4i16(%i16vec4* nocapture sret %ret, %i16vec4* %ap, %i16vec4* %bp) nounwind {			define void @add4i16(%i16vec4* nocapture sret %ret, %i16vec4* %ap, %i16vec4* %bp) nounwind {
	; X86-LABEL: add4i16:			; X86-LABEL: add4i16:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X86-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X86-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
	; X86-NEXT: paddw %xmm0, %xmm1			; X86-NEXT: paddw %xmm0, %xmm1
	; X86-NEXT: movq %xmm1, (%eax)			; X86-NEXT: movq %xmm1, (%eax)
	; X86-NEXT: retl $4			; X86-NEXT: retl $4
	;			;
	; X64-LABEL: add4i16:			; X64-LABEL: add4i16:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: movq {{.*#+}} xmm1 = mem[0],zero			; X64-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
