This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
Closed, Public

Authored by RKSimon on Oct 27 2014, 8:23 AM.

Details

Reviewers

spatel
qcolombet

Commits

rG615ab8e7211c: [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq /…
rL221489: [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq /…

Summary

Split from http://reviews.llvm.org/D5981

Fixed an issue where the VCVTTPD2DQ / VCVTTPS2DQ instructions were incorrectly placed in the two-source-operand folding table instead of the one-source-operand table, and added the missing 256-bit AVX versions.
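
To illustrate why the table placement matters, here is a minimal sketch of a fold lookup keyed by operand index. It is a simplified model under stated assumptions, not the code in X86InstrInfo.cpp: the `Op` enum, `FoldTables`, `numOperands` and `foldMemoryOperand` are hypothetical names introduced only for this example, and each table is assumed to describe folding a load into exactly one operand position.

```cpp
#include <cassert>
#include <map>

// Hypothetical opcodes standing in for the real X86:: enumerators.
enum class Op { VCVTTPS2DQrr, VCVTTPS2DQrm, VADDPSrr, VADDPSrm };

// Maps a register form to its memory form.
typedef std::map<Op, Op> FoldTable;

// Assumption: one table per operand position that can take a folded load.
struct FoldTables {
  FoldTable Op1; // load replaces operand 1 (the only source of a unary op)
  FoldTable Op2; // load replaces operand 2 (second source of a binary op)
};

// Operand count (destination included) for the register forms modeled here.
inline unsigned numOperands(Op O) { return O == Op::VADDPSrr ? 3u : 2u; }

// Returns true and sets MemForm if operand OpIdx of RegForm can become a load.
inline bool foldMemoryOperand(const FoldTables &T, Op RegForm, unsigned OpIdx,
                              Op &MemForm) {
  if (OpIdx == 0 || OpIdx >= numOperands(RegForm))
    return false; // operand 0 is the destination; others do not exist
  const FoldTable &Tbl = (OpIdx == 1) ? T.Op1 : T.Op2;
  FoldTable::const_iterator It = Tbl.find(RegForm);
  if (It == Tbl.end())
    return false;
  MemForm = It->second;
  return true;
}

// Usage: a unary conversion listed in the wrong table is never folded.
inline void foldTableExample() {
  Op Mem = Op::VADDPSrm;

  FoldTables Misplaced;
  Misplaced.Op2[Op::VCVTTPS2DQrr] = Op::VCVTTPS2DQrm;
  assert(!foldMemoryOperand(Misplaced, Op::VCVTTPS2DQrr, 1, Mem));
  assert(!foldMemoryOperand(Misplaced, Op::VCVTTPS2DQrr, 2, Mem));

  FoldTables Fixed;
  Fixed.Op1[Op::VCVTTPS2DQrr] = Op::VCVTTPS2DQrm;
  assert(foldMemoryOperand(Fixed, Op::VCVTTPS2DQrr, 1, Mem));
  assert(Mem == Op::VCVTTPS2DQrm);
}
```

In this model a conversion has a destination and a single source, so an entry in the second table can never match; moving it to the first table is what makes the lookup succeed.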

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 15493. Oct 27 2014, 8:23 AM

RKSimon retitled this revision from to Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq).

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: qcolombet, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

RKSimon retitled this revision from Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq) to [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq). Oct 27 2014, 11:44 AM

Hi Simon,

Thanks for having split the patches.

See my comments inlined.

Cheers,
-Quentin

lib/Target/X86/X86InstrInfo.cpp
Line 936 (on Diff #15493): While you are fixing this kind of issue, could you double-check the opcodes in there? All the CVTs look suspicious to me.

test/CodeGen/X86/avx1-stack-reload-folding.ll
Line 22 (on Diff #15493): Could you trigger the transformation with something simpler (like load, cvt, store, with both addresses as arguments)? Maybe by using fast-isel?

Updated the patch to follow the test pattern used in http://reviews.llvm.org/D5981.

Added the (v)cvtps2dq / (v)cvtpd2dq folds as well.

Many of the Int_CVT and CVT 'scalar' folds look suspicious and I wish to more thoroughly test them, but would prefer to do that separately from this patch.

I found some rather poor code generation for non-AVX code for double -> int32 that needs fixing as well.

> Many of the Int_CVT and CVT 'scalar' folds look suspicious and I wish to more thoroughly test them, but would prefer to do that separately from this patch.

Sounds good to me.

Please add some CHECK-LABELs for the functions you added for testing.
With that, LGTM (no need to send an updated review).

Thanks,
Q.

This revision is now accepted and ready to land. Nov 6 2014, 1:32 PM

Closed by commit rL221489 (authored by @RKSimon).

Revision Contents

Path                                                        Size
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp                  12 lines
llvm/trunk/test/CodeGen/X86/avx1-stack-reload-folding.ll    34 lines

Diff 15893

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

[443 unchanged lines not shown] static const X86OpTblEntry OpTbl1[] = {
  { X86::IMUL64rri8, X86::IMUL64rmi8, 0 },
  { X86::Int_COMISDrr, X86::Int_COMISDrm, 0 },
  { X86::Int_COMISSrr, X86::Int_COMISSrm, 0 },
  { X86::CVTSD2SI64rr, X86::CVTSD2SI64rm, 0 },
  { X86::CVTSD2SIrr, X86::CVTSD2SIrm, 0 },
  { X86::CVTSS2SI64rr, X86::CVTSS2SI64rm, 0 },
  { X86::CVTSS2SIrr, X86::CVTSS2SIrm, 0 },
  { X86::CVTDQ2PSrr, X86::CVTDQ2PSrm, TB_ALIGN_16 },
+ { X86::CVTPD2DQrr, X86::CVTPD2DQrm, TB_ALIGN_16 },
+ { X86::CVTPS2DQrr, X86::CVTPS2DQrm, TB_ALIGN_16 },
  { X86::CVTTPD2DQrr, X86::CVTTPD2DQrm, TB_ALIGN_16 },
  { X86::CVTTPS2DQrr, X86::CVTTPS2DQrm, TB_ALIGN_16 },
  { X86::Int_CVTTSD2SI64rr,X86::Int_CVTTSD2SI64rm, 0 },
  { X86::Int_CVTTSD2SIrr, X86::Int_CVTTSD2SIrm, 0 },
  { X86::Int_CVTTSS2SI64rr,X86::Int_CVTTSS2SI64rm, 0 },
  { X86::Int_CVTTSS2SIrr, X86::Int_CVTTSS2SIrm, 0 },
  { X86::Int_UCOMISDrr, X86::Int_UCOMISDrm, 0 },
  { X86::Int_UCOMISSrr, X86::Int_UCOMISSrm, 0 },
[63 unchanged lines not shown] static const X86OpTblEntry OpTbl1[] = {
  { X86::Int_VCVTTSS2SI64rr,X86::Int_VCVTTSS2SI64rm,0 },
  { X86::VCVTTSS2SIrr, X86::VCVTTSS2SIrm, 0 },
  { X86::Int_VCVTTSS2SIrr,X86::Int_VCVTTSS2SIrm, 0 },
  { X86::VCVTSD2SI64rr, X86::VCVTSD2SI64rm, 0 },
  { X86::VCVTSD2SIrr, X86::VCVTSD2SIrm, 0 },
  { X86::VCVTSS2SI64rr, X86::VCVTSS2SI64rm, 0 },
  { X86::VCVTSS2SIrr, X86::VCVTSS2SIrm, 0 },
  { X86::VCVTDQ2PSrr, X86::VCVTDQ2PSrm, 0 },
+ { X86::VCVTPD2DQrr, X86::VCVTPD2DQXrm, 0 },
+ { X86::VCVTPS2DQrr, X86::VCVTPS2DQrm, 0 },
+ { X86::VCVTTPD2DQrr, X86::VCVTTPD2DQXrm, 0 },
+ { X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
  { X86::VMOV64toPQIrr, X86::VMOVQI2PQIrm, 0 },
  { X86::VMOV64toSDrr, X86::VMOV64toSDrm, 0 },
  { X86::VMOVAPDrr, X86::VMOVAPDrm, TB_ALIGN_16 },
  { X86::VMOVAPSrr, X86::VMOVAPSrm, TB_ALIGN_16 },
  { X86::VMOVDDUPrr, X86::VMOVDDUPrm, 0 },
  { X86::VMOVDI2PDIrr, X86::VMOVDI2PDIrm, 0 },
  { X86::VMOVDI2SSrr, X86::VMOVDI2SSrm, 0 },
  { X86::VMOVDQArr, X86::VMOVDQArm, TB_ALIGN_16 },
[18 unchanged lines not shown] static const X86OpTblEntry OpTbl1[] = {
  { X86::VSQRTPDr, X86::VSQRTPDm, 0 },
  { X86::VSQRTPSr, X86::VSQRTPSm, 0 },
  { X86::VUCOMISDrr, X86::VUCOMISDrm, 0 },
  { X86::VUCOMISSrr, X86::VUCOMISSrm, 0 },
  { X86::VBROADCASTSSrr, X86::VBROADCASTSSrm, TB_NO_REVERSE },

  // AVX 256-bit foldable instructions
  { X86::VCVTDQ2PSYrr, X86::VCVTDQ2PSYrm, 0 },
+ { X86::VCVTPD2DQYrr, X86::VCVTPD2DQYrm, 0 },
+ { X86::VCVTPS2DQYrr, X86::VCVTPS2DQYrm, 0 },
+ { X86::VCVTTPD2DQYrr, X86::VCVTTPD2DQYrm, 0 },
+ { X86::VCVTTPS2DQYrr, X86::VCVTTPS2DQYrm, 0 },
  { X86::VMOVAPDYrr, X86::VMOVAPDYrm, TB_ALIGN_32 },
  { X86::VMOVAPSYrr, X86::VMOVAPSYrm, TB_ALIGN_32 },
  { X86::VMOVDQAYrr, X86::VMOVDQAYrm, TB_ALIGN_32 },
  { X86::VMOVUPDYrr, X86::VMOVUPDYrm, 0 },
  { X86::VMOVUPSYrr, X86::VMOVUPSYrm, 0 },
  { X86::VPERMILPDYri, X86::VPERMILPDYmi, 0 },
  { X86::VPERMILPSYri, X86::VPERMILPSYmi, 0 },
  { X86::VRCPPSYr, X86::VRCPPSYm, 0 },
[355 unchanged lines not shown] static const X86OpTblEntry OpTbl2[] = {
  { X86::VCVTSI2SDrr, X86::VCVTSI2SDrm, 0 },
  { X86::Int_VCVTSI2SDrr, X86::Int_VCVTSI2SDrm, 0 },
  { X86::VCVTSI2SS64rr, X86::VCVTSI2SS64rm, 0 },
  { X86::Int_VCVTSI2SS64rr, X86::Int_VCVTSI2SS64rm, 0 },
  { X86::VCVTSI2SSrr, X86::VCVTSI2SSrm, 0 },
  { X86::Int_VCVTSI2SSrr, X86::Int_VCVTSI2SSrm, 0 },
  { X86::VCVTSS2SDrr, X86::VCVTSS2SDrm, 0 },
  { X86::Int_VCVTSS2SDrr, X86::Int_VCVTSS2SDrm, 0 },
- { X86::VCVTTPD2DQrr, X86::VCVTTPD2DQXrm, 0 },
- { X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
  { X86::VRSQRTSSr, X86::VRSQRTSSm, 0 },
  { X86::VSQRTSDr, X86::VSQRTSDm, 0 },
  { X86::VSQRTSSr, X86::VSQRTSSm, 0 },
  { X86::VADDPDrr, X86::VADDPDrm, 0 },
  { X86::VADDPSrr, X86::VADDPSrm, 0 },
  { X86::VADDSDrr, X86::VADDSDrm, 0 },
  { X86::VADDSSrr, X86::VADDSSrm, 0 },
  { X86::VADDSUBPDrr, X86::VADDSUBPDrm, 0 },
[4,805 unchanged lines not shown]
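
The third field of each table entry carries flags such as TB_ALIGN_16 and TB_ALIGN_32. The SSE conversion entries in this diff use TB_ALIGN_16 while the VEX-encoded AVX entries use 0, which matches the hardware rule that legacy SSE memory operands must be 16-byte aligned whereas most VEX-encoded forms accept unaligned memory. A minimal sketch of how such a flag could gate a fold follows. The constant names, the bit layout and the `canFoldLoad` helper are assumptions made for illustration and are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical flag layout: the required alignment in bytes is stored in the
// bits above kAlignShift, leaving the low bits free for other flags.
constexpr unsigned kAlignShift = 8;
constexpr unsigned kAlignMask = 0xffu << kAlignShift;
constexpr unsigned kAlignNone = 0u << kAlignShift;
constexpr unsigned kAlign16 = 16u << kAlignShift;
constexpr unsigned kAlign32 = 32u << kAlignShift;

// Extract the minimum alignment (in bytes) an entry demands of its memory operand.
constexpr unsigned requiredAlignment(unsigned Flags) {
  return (Flags & kAlignMask) >> kAlignShift;
}

// A load may be folded only if its known alignment meets the requirement.
constexpr bool canFoldLoad(unsigned Flags, unsigned KnownAlign) {
  return KnownAlign >= requiredAlignment(Flags);
}

// Usage: an SSE-style entry rejects an 8-byte-aligned slot, an AVX-style one accepts it.
inline void alignmentExample() {
  assert(!canFoldLoad(kAlign16, 8));
  assert(canFoldLoad(kAlign16, 16));
  assert(canFoldLoad(kAlignNone, 1));
  assert(!canFoldLoad(kAlign32, 16));
}
```

Packing the alignment into the flag word keeps each table entry to three fields, at the cost of a mask and shift on every lookup.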

llvm/trunk/test/CodeGen/X86/avx1-stack-reload-folding.ll

  ; RUN: llc -O3 -disable-peephole -mcpu=corei7-avx -mattr=+avx < %s | FileCheck %s

  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-unknown"

  ; Stack reload folding tests - we use the 'big vectors' pattern to guarantee spilling to stack.
  ;
  ; Many of these tests are primarily to check memory folding with specific instructions. Using a basic
  ; load/cvt/store pattern to test for this would mean that it wouldn't be the memory folding code thats
  ; being tested - the load-execute version of the instruction from the tables would be matched instead.

  define void @stack_fold_vmulpd(<64 x double>* %a, <64 x double>* %b, <64 x double>* %c) {
+   ;CHECK-LABEL: stack_fold_vmulpd
    ;CHECK: vmulpd {{[0-9]*}}(%rsp), {{%ymm[0-9][0-9]*}}, {{%ymm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload

    %1 = load <64 x double>* %a
    %2 = load <64 x double>* %b
    %3 = fadd <64 x double> %1, %2
    %4 = fsub <64 x double> %1, %2
    %5 = fmul <64 x double> %3, %4
    store <64 x double> %5, <64 x double>* %c
    ret void
  }

  define void @stack_fold_cvtdq2ps(<128 x i32>* %a, <128 x i32>* %b, <128 x float>* %c) {
+   ;CHECK-LABEL: stack_fold_cvtdq2ps
    ;CHECK: vcvtdq2ps {{[0-9]*}}(%rsp), {{%ymm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload

    %1 = load <128 x i32>* %a
    %2 = load <128 x i32>* %b
    %3 = and <128 x i32> %1, %2
    %4 = xor <128 x i32> %1, %2
    %5 = sitofp <128 x i32> %3 to <128 x float>
    %6 = sitofp <128 x i32> %4 to <128 x float>
    %7 = fadd <128 x float> %5, %6
    store <128 x float> %7, <128 x float>* %c
    ret void
  }

+ define void @stack_fold_cvttpd2dq(<64 x double>* %a, <64 x double>* %b, <64 x i32>* %c) #0 {
+   ;CHECK-LABEL: stack_fold_cvttpd2dq
+   ;CHECK: vcvttpd2dqy {{[0-9]*}}(%rsp), {{%xmm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
+
+   %1 = load <64 x double>* %a
+   %2 = load <64 x double>* %b
+   %3 = fadd <64 x double> %1, %2
+   %4 = fsub <64 x double> %1, %2
+   %5 = fptosi <64 x double> %3 to <64 x i32>
+   %6 = fptosi <64 x double> %4 to <64 x i32>
+   %7 = or <64 x i32> %5, %6
+   store <64 x i32> %7, <64 x i32>* %c
+   ret void
+ }

+ define void @stack_fold_cvttps2dq(<128 x float>* %a, <128 x float>* %b, <128 x i32>* %c) #0 {
+   ;CHECK-LABEL: stack_fold_cvttps2dq
+   ;CHECK: vcvttps2dq {{[0-9]*}}(%rsp), {{%ymm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
+
+   %1 = load <128 x float>* %a
+   %2 = load <128 x float>* %b
+   %3 = fadd <128 x float> %1, %2
+   %4 = fsub <128 x float> %1, %2
+   %5 = fptosi <128 x float> %3 to <128 x i32>
+   %6 = fptosi <128 x float> %4 to <128 x i32>
+   %7 = or <128 x i32> %5, %6
+   store <128 x i32> %7, <128 x i32>* %c
+   ret void
+ }