This is an archive of the discontinued LLVM Phabricator instance.

[X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
ClosedPublic

Authored by craig.topper on Sep 25 2018, 11:47 PM.

Details

Reviewers

RKSimon
spatel

Commits

rG35d513c7e4cd: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
rL344291: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.

Summary

On 64-bit targets the generic legalizer uses an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal, so the generic legalizer ends up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together, with pretty good success.

This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency.

There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen, and that the 64-bit mode code could itself be better. I think those issues should be looked at separately.
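The three steps named in the summary (one 64-bit load, a scalar_to_vector, then a bitcast to v4f32) can be modelled in plain C++. This is only a sketch of the data movement, not LLVM code: `load_v2f32_widened` is a hypothetical name, a `std::uint64_t` stands in for the f64 value because only its bits matter here, and the upper lane is zeroed even though SCALAR_TO_VECTOR leaves it undefined (zero is what a `movsd` load from memory happens to produce).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the lowering; none of these names come from LLVM.
std::array<float, 4> load_v2f32_widened(const float *p) {
  assert(p != nullptr && "caller must pass two readable floats");

  // Step 1: a single 8-byte load covering both floats (the "f64 load").
  std::uint64_t bits;
  std::memcpy(&bits, p, sizeof(bits));

  // Step 2: put the loaded value in lane 0 of a two-lane 64-bit vector
  // (the "scalar_to_vector"). Lane 1 is undefined in the DAG; this sketch
  // zeroes it as an assumption.
  std::array<std::uint64_t, 2> v2x64 = {bits, 0};

  // Step 3: reinterpret the same 128 bits as four floats (the "bitcast").
  std::array<float, 4> v4f32;
  static_assert(sizeof(v4f32) == sizeof(v2x64), "both views are 128 bits");
  std::memcpy(v4f32.data(), v2x64.data(), sizeof(v4f32));
  return v4f32;
}
```

Lanes 0 and 1 of the result hold the two original floats, so no scalar splitting is needed on a 32-bit target. What plain C++ cannot express is the actual point of the patch, namely that the load lands in a floating-point register rather than an integer one.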

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Sep 25 2018, 11:47 PM

Harbormaster completed remote builds in B23124: Diff 167050. Sep 25 2018, 11:48 PM

Is it worth trying to improve the SSE1 codegen or just go with the SSE2+ f64 solution?

I'm open to suggestions of how to improve the SSE1 codegen.

In D52528#1246579, @craig.topper wrote:

I'm open to suggestions of how to improve the SSE1 codegen.

It's tricky: much of the movlps/movhps selection is based around either i64 or f64 elements.

I'm a bit worried about how easy it will be to fix some of these regressions for similar reasons. Have you looked at follow-up fixes for these yet?

craig.topper added inline comments. Sep 29 2018, 9:38 AM

test/CodeGen/X86/bitcast-int-to-vector.ll
20 ↗	(On Diff #167050)	For this case we would need to decide if it makes sense to split the load into 2 scalar loads when both elements are extracted separately.
test/CodeGen/X86/vec_extract-avx.ll
174 ↗	(On Diff #167050)	This regression is because DAGCombiner::visitEXTRACT_ELEMENT explicitly avoids splitting a load until after op legalization. So we form a shuffle first and then we can't recover. I just checked to see if InstCombine would let this sequence through in the first place, and it looks like it will widen the v2f32 to v8f32 and then shuffle the single element into place, the same as what DAGCombine did. This seems not great. Why aren't we recognizing that we don't need the other elements of the v2f32 load?

spatel added inline comments. Sep 29 2018, 12:14 PM

test/CodeGen/X86/vec_extract-avx.ll
174 ↗	(On Diff #167050)	Would there be codegen problems if we always scalarize an extractelement of a vector load with no other uses in instcombine?

define float @load_extract(<4 x float>* %p) {
  %v = load <4 x float>, <4 x float>* %p
  %s = extractelement <4 x float> %v, i32 0
  ret float %s
}

-->

define float @load_extract(<4 x float>* %p) {
  %bc = bitcast <4 x float>* %p to float*
  %s = load float, float* %bc
  ret float %s
}

This would require an address offset (gep) in the general case.
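The transform proposed in this comment, including the address offset needed for a non-zero index, can be sketched in plain C++. Both function names are hypothetical and exist only for illustration; the sketch assumes the vector load has no other users, as the comment does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Before: load the whole 4-float vector, then pick one lane.
float extract_after_vector_load(const float (*p)[4], std::size_t idx) {
  assert(idx < 4);
  float v[4];
  for (std::size_t i = 0; i < 4; ++i)
    v[i] = (*p)[i]; // models the full 16-byte vector load
  return v[idx];
}

// After: load only the requested lane. For idx == 0 this is the plain
// pointer cast from the IR in the comment; any other index needs the
// base address advanced by idx elements (the "gep").
float extract_scalarized(const float (*p)[4], std::size_t idx) {
  assert(idx < 4);
  const float *elt = &(*p)[0] + idx;
  return *elt; // a single 4-byte scalar load
}
```

Both functions return the same value for every valid index; the second simply never touches the other three lanes.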

I believe the only regressions caused by this are already issues in 64-bit mode. Are we concerned about 32-bit mode regressions here? Or can we take this and try to improve these issues as a follow-up?

test/CodeGen/X86/vec_extract-avx.ll
174 ↗	(On Diff #167050)	I'm not sure.

LGTM as long as all the regressions are documented somewhere so we don't lose track

lib/Target/X86/X86ISelLowering.cpp
896 ↗	(On Diff #167050)	Please can you align the arguments into the columns </pedantic>
test/CodeGen/X86/vec_extract-avx.ll
174 ↗	(On Diff #167050)	A mixture of XFormVExtractWithShuffleIntoLoad and EltsFromConsecutiveLoads would probably help here.

This revision is now accepted and ready to land. Oct 11 2018, 4:19 AM

Closed by commit rL344291: [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector. (authored by ctopper). Oct 11 2018, 1:38 PM

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in D53173: [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. Oct 11 2018, 2:50 PM

Diffusion mentioned this in rL344470: [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64…. Oct 13 2018, 8:39 PM

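Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	24 lines
llvm/trunk/test/CodeGen/X86/bitcast-int-to-vector.ll	6 lines
llvm/trunk/test/CodeGen/X86/fold-load-vec.ll	4 lines
llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll	30 lines
llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll	24 lines
llvm/trunk/test/CodeGen/X86/vec_extract-avx.ll	4 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v4.ll	4 lines
llvm/trunk/test/CodeGen/X86/widen_load-1.ll	4 lines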

Diff 169295

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ if (!Subtarget.useSoftFloat() && Subtarget.hasSSE2()) {
     setOperationAction(ISD::UINT_TO_FP, MVT::v2f32, Custom);

     setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom);
     setOperationAction(ISD::FP_ROUND, MVT::v2f32, Custom);

     for (MVT VT : MVT::fp_vector_valuetypes())
       setLoadExtAction(ISD::EXTLOAD, VT, MVT::v2f32, Legal);

+    // We want to legalize this to an f64 load rather than an i64 load on
+    // 64-bit targets and two 32-bit loads on a 32-bit target.
+    setOperationAction(ISD::LOAD, MVT::v2f32, Custom);
+
     setOperationAction(ISD::BITCAST, MVT::v2i32, Custom);
     setOperationAction(ISD::BITCAST, MVT::v4i16, Custom);
     setOperationAction(ISD::BITCAST, MVT::v8i8, Custom);
     if (!Subtarget.hasAVX512())
       setOperationAction(ISD::BITCAST, MVT::v16i1, Custom);

     setOperationAction(ISD::SIGN_EXTEND_VECTOR_INREG, MVT::v2i64, Custom);
     setOperationAction(ISD::SIGN_EXTEND_VECTOR_INREG, MVT::v4i32, Custom);
@@ if (VT == MVT::v2i32) {
       Res = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v2i32, Res,
                         DAG.getIntPtrConstant(0, dl));
       Results.push_back(Res);
       Results.push_back(Chain);
       return;
     }
     break;
   }
+  case ISD::LOAD: {
+    // Use an f64 load and a scalar_to_vector for v2f32 loads. This avoids
+    // scalarizing in 32-bit mode. In 64-bit mode this avoids a int->fp cast
+    // since type legalization will try to use an i64 load.
+    EVT VT = N->getValueType(0);
+    assert(VT == MVT::v2f32 && "Unexpected VT");
+    if (!ISD::isNON_EXTLoad(N))
+      return;
+    auto *Ld = cast<LoadSDNode>(N);
+    SDValue Res = DAG.getLoad(MVT::f64, dl, Ld->getChain(), Ld->getBasePtr(),
+                              Ld->getPointerInfo(),
+                              Ld->getAlignment(),
+                              Ld->getMemOperand()->getFlags());
+    SDValue Chain = Res.getValue(1);
+    Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v2f64, Res);
+    Res = DAG.getBitcast(MVT::v4f32, Res);
+    Results.push_back(Res);
+    Results.push_back(Chain);
+    return;
+  }
   }
 }

 const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((X86ISD::NodeType)Opcode) {
   case X86ISD::FIRST_NUMBER: break;
   case X86ISD::BSF: return "X86ISD::BSF";
   case X86ISD::BSR: return "X86ISD::BSR";

llvm/trunk/test/CodeGen/X86/bitcast-int-to-vector.ll

@@
 ; X86-NEXT: fnstsw %ax
 ; X86-NEXT: # kill: def $ah killed $ah killed $ax
 ; X86-NEXT: sahf
 ; X86-NEXT: setp %al
 ; X86-NEXT: retl
 ;
 ; X86-SSE-LABEL: foo:
 ; X86-SSE: # %bb.0:
-; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE-NEXT: ucomiss {{[0-9]+}}(%esp), %xmm0
+; X86-SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; X86-SSE-NEXT: movaps %xmm0, %xmm1
+; X86-SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[2,3]
+; X86-SSE-NEXT: ucomiss %xmm1, %xmm0
 ; X86-SSE-NEXT: setp %al
 ; X86-SSE-NEXT: retl
 ;
 ; X64-LABEL: foo:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %xmm0
 ; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
 ; X64-NEXT: ucomiss %xmm1, %xmm0
 ; X64-NEXT: setp %al
 ; X64-NEXT: retq
   %t = bitcast i64 %a to <2 x float>
   %r = extractelement <2 x float> %t, i32 0
   %s = extractelement <2 x float> %t, i32 1
   %b = fcmp uno float %r, %s
   ret i1 %b
 }

llvm/trunk/test/CodeGen/X86/fold-load-vec.ll

@@
 ; CHECK-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT: xorps %xmm0, %xmm0
 ; CHECK-NEXT: movlps %xmm0, (%rsp)
 ; CHECK-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; CHECK-NEXT: movlps %xmm0, (%rsp)
 ; CHECK-NEXT: movlps %xmm0, (%rsi)
 ; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax
-; CHECK-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
-; CHECK-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
+; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
 ; CHECK-NEXT: callq ext
 ; CHECK-NEXT: addq $24, %rsp
 ; CHECK-NEXT: retq
 entry:
   %source.addr = alloca <4 x float>*, align 8
   %dest.addr = alloca <2 x float>*, align 8
   %tmp = alloca <2 x float>, align 8
   store <4 x float>* %source, <4 x float>** %source.addr, align 8

llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll

@@ ; X32-AVX-NEXT: retl
   %res0 = insertelement <4 x i64> zeroinitializer, i64 %val0, i32 0
   %res1 = insertelement <4 x i64> %res0, i64 %val1, i32 1
   ret <4 x i64> %res1
 }

 define <8 x float> @merge_8f32_2f32_23z5(<2 x float>* %ptr) nounwind uwtable noinline ssp {
 ; AVX1-LABEL: merge_8f32_2f32_23z5:
 ; AVX1: # %bb.0:
-; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
-; AVX1-NEXT: vmovups 16(%rdi), %xmm1
-; AVX1-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
-; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX1-NEXT: vmovups 16(%rdi), %xmm0
+; AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX1-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
+; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; AVX1-NEXT: retq
 ;
 ; AVX2-LABEL: merge_8f32_2f32_23z5:
 ; AVX2: # %bb.0:
-; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
-; AVX2-NEXT: vmovdqu 16(%rdi), %xmm1
-; AVX2-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
-; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
+; AVX2-NEXT: vmovupd 16(%rdi), %xmm0
+; AVX2-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX2-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
+; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512F-LABEL: merge_8f32_2f32_23z5:
 ; AVX512F: # %bb.0:
-; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
-; AVX512F-NEXT: vmovdqu 16(%rdi), %xmm1
-; AVX512F-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
-; AVX512F-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
+; AVX512F-NEXT: vmovupd 16(%rdi), %xmm0
+; AVX512F-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; AVX512F-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
+; AVX512F-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; AVX512F-NEXT: retq
 ;
 ; X32-AVX-LABEL: merge_8f32_2f32_23z5:
 ; X32-AVX: # %bb.0:
 ; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X32-AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
-; X32-AVX-NEXT: vblendps {{.*#+}} ymm0 = mem[0,1,2,3],ymm0[4,5],mem[6,7]
+; X32-AVX-NEXT: vmovups 16(%eax), %xmm0
+; X32-AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
+; X32-AVX-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
+; X32-AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; X32-AVX-NEXT: retl
   %ptr0 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 2
   %ptr1 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 3
   %ptr3 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 5
   %val0 = load <2 x float>, <2 x float>* %ptr0
   %val1 = load <2 x float>, <2 x float>* %ptr1
   %val3 = load <2 x float>, <2 x float>* %ptr3
   %res01 = shufflevector <2 x float> %val0, <2 x float> %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll

@@
 ; X86-SSE-NEXT: # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
 ; X86-SSE-NEXT: movlhps %xmm1, %xmm0 # encoding: [0x0f,0x16,0xc1]
 ; X86-SSE-NEXT: # xmm0 = xmm0[0],xmm1[0]
 ; X86-SSE-NEXT: retl # encoding: [0xc3]
 ;
 ; X86-AVX1-LABEL: test_mm_loadh_pi:
 ; X86-AVX1: # %bb.0:
 ; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax # encoding: [0x8b,0x44,0x24,0x04]
-; X86-AVX1-NEXT: vmovsd (%eax), %xmm1 # encoding: [0xc5,0xfb,0x10,0x08]
-; X86-AVX1-NEXT: # xmm1 = mem[0],zero
-; X86-AVX1-NEXT: vmovlhps %xmm1, %xmm0, %xmm0 # encoding: [0xc5,0xf8,0x16,0xc1]
-; X86-AVX1-NEXT: # xmm0 = xmm0[0],xmm1[0]
+; X86-AVX1-NEXT: vmovhpd (%eax), %xmm0, %xmm0 # encoding: [0xc5,0xf9,0x16,0x00]
+; X86-AVX1-NEXT: # xmm0 = xmm0[0],mem[0]
 ; X86-AVX1-NEXT: retl # encoding: [0xc3]
 ;
 ; X86-AVX512-LABEL: test_mm_loadh_pi:
 ; X86-AVX512: # %bb.0:
 ; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax # encoding: [0x8b,0x44,0x24,0x04]
-; X86-AVX512-NEXT: vmovsd (%eax), %xmm1 # EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x08]
-; X86-AVX512-NEXT: # xmm1 = mem[0],zero
-; X86-AVX512-NEXT: vmovlhps %xmm1, %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x16,0xc1]
-; X86-AVX512-NEXT: # xmm0 = xmm0[0],xmm1[0]
+; X86-AVX512-NEXT: vmovhpd (%eax), %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf9,0x16,0x00]
+; X86-AVX512-NEXT: # xmm0 = xmm0[0],mem[0]
 ; X86-AVX512-NEXT: retl # encoding: [0xc3]
 ;
 ; X64-SSE-LABEL: test_mm_loadh_pi:
 ; X64-SSE: # %bb.0:
 ; X64-SSE-NEXT: movq (%rdi), %rax # encoding: [0x48,0x8b,0x07]
 ; X64-SSE-NEXT: movl %eax, -{{[0-9]+}}(%rsp) # encoding: [0x89,0x44,0x24,0xf8]
 ; X64-SSE-NEXT: shrq $32, %rax # encoding: [0x48,0xc1,0xe8,0x20]
 ; X64-SSE-NEXT: movl %eax, -{{[0-9]+}}(%rsp) # encoding: [0x89,0x44,0x24,0xfc]
@@
 ; X86-SSE-NEXT: shufps $228, %xmm0, %xmm1 # encoding: [0x0f,0xc6,0xc8,0xe4]
 ; X86-SSE-NEXT: # xmm1 = xmm1[0,1],xmm0[2,3]
 ; X86-SSE-NEXT: movaps %xmm1, %xmm0 # encoding: [0x0f,0x28,0xc1]
 ; X86-SSE-NEXT: retl # encoding: [0xc3]
 ;
 ; X86-AVX1-LABEL: test_mm_loadl_pi:
 ; X86-AVX1: # %bb.0:
 ; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax # encoding: [0x8b,0x44,0x24,0x04]
-; X86-AVX1-NEXT: vmovsd (%eax), %xmm1 # encoding: [0xc5,0xfb,0x10,0x08]
-; X86-AVX1-NEXT: # xmm1 = mem[0],zero
-; X86-AVX1-NEXT: vblendps $3, %xmm1, %xmm0, %xmm0 # encoding: [0xc4,0xe3,0x79,0x0c,0xc1,0x03]
-; X86-AVX1-NEXT: # xmm0 = xmm1[0,1],xmm0[2,3]
+; X86-AVX1-NEXT: vmovlpd (%eax), %xmm0, %xmm0 # encoding: [0xc5,0xf9,0x12,0x00]
+; X86-AVX1-NEXT: # xmm0 = mem[0],xmm0[1]
 ; X86-AVX1-NEXT: retl # encoding: [0xc3]
 ;
 ; X86-AVX512-LABEL: test_mm_loadl_pi:
 ; X86-AVX512: # %bb.0:
 ; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax # encoding: [0x8b,0x44,0x24,0x04]
-; X86-AVX512-NEXT: vmovsd (%eax), %xmm1 # EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x08]
-; X86-AVX512-NEXT: # xmm1 = mem[0],zero
-; X86-AVX512-NEXT: vblendps $3, %xmm1, %xmm0, %xmm0 # encoding: [0xc4,0xe3,0x79,0x0c,0xc1,0x03]
-; X86-AVX512-NEXT: # xmm0 = xmm1[0,1],xmm0[2,3]
+; X86-AVX512-NEXT: vmovlpd (%eax), %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf9,0x12,0x00]
+; X86-AVX512-NEXT: # xmm0 = mem[0],xmm0[1]
 ; X86-AVX512-NEXT: retl # encoding: [0xc3]
 ;
 ; X64-SSE-LABEL: test_mm_loadl_pi:
 ; X64-SSE: # %bb.0:
 ; X64-SSE-NEXT: movq (%rdi), %rax # encoding: [0x48,0x8b,0x07]
 ; X64-SSE-NEXT: movl %eax, -{{[0-9]+}}(%rsp) # encoding: [0x89,0x44,0x24,0xf8]
 ; X64-SSE-NEXT: shrq $32, %rax # encoding: [0x48,0xc1,0xe8,0x20]
 ; X64-SSE-NEXT: movl %eax, -{{[0-9]+}}(%rsp) # encoding: [0x89,0x44,0x24,0xfc]

llvm/trunk/test/CodeGen/X86/vec_extract-avx.ll

@@ ; X64-NEXT: retq
   ret void
 }

 define void @legal_vzmovl_2f32_8f32(<2 x float>* %in, <8 x float>* %out) {
 ; X32-LABEL: legal_vzmovl_2f32_8f32:
 ; X32: # %bb.0:
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT: vxorps %xmm1, %xmm1, %xmm1
+; X32-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; X32-NEXT: vmovaps %ymm0, (%eax)
 ; X32-NEXT: vzeroupper
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: legal_vzmovl_2f32_8f32:
 ; X64: # %bb.0:
 ; X64-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
 ; X64-NEXT: vxorps %xmm1, %xmm1, %xmm1

llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v4.ll

@@ ; AVX-NEXT: retq
   %5 = or <2 x i64> %4, %3
   %6 = bitcast <2 x i64> %5 to <4 x i32>
   ret <4 x i32> %6
 }

 define <4 x float> @broadcast_v4f32_0101_from_v2f32(<2 x float>* %x) {
 ; SSE2-LABEL: broadcast_v4f32_0101_from_v2f32:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
+; SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]
 ; SSE2-NEXT: retq
 ;
 ; SSE3-LABEL: broadcast_v4f32_0101_from_v2f32:
 ; SSE3: # %bb.0:
 ; SSE3-NEXT: movddup {{.*#+}} xmm0 = mem[0,0]
 ; SSE3-NEXT: retq
 ;
 ; SSSE3-LABEL: broadcast_v4f32_0101_from_v2f32:

llvm/trunk/test/CodeGen/X86/widen_load-1.ll

 ; RUN: llc -stack-symbol-ordering=0 %s -o - -mattr=-avx -mtriple=x86_64-unknown-linux-gnu | FileCheck %s --check-prefix=SSE
 ; RUN: llc -stack-symbol-ordering=0 %s -o - -mattr=+avx -mtriple=x86_64-unknown-linux-gnu | FileCheck %s --check-prefix=AVX
 ; PR4891
 ; PR5626

 ; This load should be before the call, not after.

-; SSE: movaps compl+128(%rip), %xmm0
+; SSE: movsd compl+128(%rip), %xmm0
 ; SSE: movaps %xmm0, (%rsp)
 ; SSE: callq killcommon

-; AVX: vmovaps compl+128(%rip), %xmm0
+; AVX: vmovsd compl+128(%rip), %xmm0
 ; AVX: vmovaps %xmm0, (%rsp)
 ; AVX: callq killcommon

 @compl = linkonce global [20 x i64] zeroinitializer, align 64 ; <[20 x i64]*> [#uses=1]

 declare void @killcommon(i32* noalias)

 define void @reset(<2 x float>* noalias %garbage1) {
