This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
ClosedPublic

Authored by RKSimon on Oct 27 2014, 8:23 AM.


Details

Reviewers

spatel
qcolombet

Commits

rG615ab8e7211c: [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq /…
rL221489: [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq /…

Summary

Split from http://reviews.llvm.org/D5981

Fixed an issue where the VCVTTPD2DQ / VCVTTPS2DQ instructions were incorrectly placed in the 2-source-operand folding table instead of the 1-source-operand table, and added the missing 256-bit AVX versions.
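
To illustrate why the choice of table matters, here is a minimal, simplified sketch of per-operand fold-table lookup. It is not the code in X86InstrInfo.cpp: the `Opcode` enum, `FoldEntry`, `lookup`, and `foldedOpcode` are hypothetical names invented for this example, and the real tables carry more entries and flags. The assumption is that each table is consulted only when the operand at the matching index is the one being replaced by a memory reference, so a unary conversion listed in the operand-2 table would never be found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>

// Hypothetical opcodes for illustration only; the real ones are generated
// from the X86 target description.
enum Opcode : std::uint16_t { CVTTPS2DQrr, CVTTPS2DQrm, ADDPSrr, ADDPSrm };

// One fold-table row: register form, memory form, and flags (unused here).
struct FoldEntry {
  Opcode RegOp;
  Opcode MemOp;
  unsigned Flags;
};

// Entries whose operand 1 may become a memory operand.
// A unary conversion (dst = cvt src) has its only source at index 1.
static const FoldEntry FoldOperand1[] = {
    {CVTTPS2DQrr, CVTTPS2DQrm, 0},
};

// Entries whose operand 2 may become a memory operand.
// A binary op (dst = add src1, src2) folds its second source.
static const FoldEntry FoldOperand2[] = {
    {ADDPSrr, ADDPSrm, 0},
};

// Linear search of one table for the register-form opcode.
template <std::size_t N>
std::optional<Opcode> lookup(const FoldEntry (&Table)[N], Opcode RegOp) {
  auto It = std::find_if(std::begin(Table), std::end(Table),
                         [RegOp](const FoldEntry &E) { return E.RegOp == RegOp; });
  if (It == std::end(Table))
    return std::nullopt;
  return It->MemOp;
}

// OpNum is the index of the operand being replaced by a stack slot.
// Only the table for that operand index is searched.
std::optional<Opcode> foldedOpcode(Opcode RegOp, unsigned OpNum) {
  switch (OpNum) {
  case 1:
    return lookup(FoldOperand1, RegOp);
  case 2:
    return lookup(FoldOperand2, RegOp);
  default:
    return std::nullopt;
  }
}
```

In this model, `foldedOpcode(CVTTPS2DQrr, 1)` yields the memory form, while asking for operand 2 yields nothing, which mirrors how a misplaced entry silently disables the fold.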


Event Timeline

RKSimon updated this revision to Diff 15493. Oct 27 2014, 8:23 AM

RKSimon retitled this revision to Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq).

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: qcolombet, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

RKSimon retitled this revision from Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq) to [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq). Oct 27 2014, 11:44 AM

Hi Simon,

Thanks for splitting the patches.

See my inline comments.

Cheers,
-Quentin

lib/Target/X86/X86InstrInfo.cpp
Line 936: While you are fixing this kind of issue, could you double-check the opcodes in there? All the CVTs look suspicious to me.

test/CodeGen/X86/avx1-stack-reload-folding.ll
Line 22: Could you trigger the transformation with something simpler (like load, cvt, store, with both addresses as arguments)? Maybe by using fast-isel?

Updated the patch to follow the test pattern used in http://reviews.llvm.org/D5981

Added the (v)cvtps2dq / (v)cvtpd2dq folds as well

Many of the Int_CVT and CVT 'scalar' folds look suspicious and I wish to more thoroughly test them, but would prefer to do that separately from this patch.

I found some rather poor code generation for non-AVX code for double -> int32 that needs fixing as well.

> Many of the Int_CVT and CVT 'scalar' folds look suspicious and I wish to more thoroughly test them, but would prefer to do that separately from this patch.

Sounds good to me.

Please add some CHECK-LABELs for the functions you added for testing.
With that, LGTM (no need to send an updated review).

Thanks,
Q.

This revision is now accepted and ready to land. Nov 6 2014, 1:32 PM

Closed by commit rL221489 (authored by @RKSimon).

Revision Contents

Path	Size
lib/Target/X86/X86InstrInfo.cpp (revision 220681)	6 lines
test/CodeGen/X86/avx1-stack-reload-folding.ll (revision 220681)	26 lines

Diff 15493

lib/Target/X86/X86InstrInfo.cpp

@@ ... @@ static const X86OpTblEntry OpTbl1[] = {
   { X86::VCVTTSS2SI64rr, X86::VCVTTSS2SI64rm, 0 },
   { X86::Int_VCVTTSS2SI64rr,X86::Int_VCVTTSS2SI64rm,0 },
   { X86::VCVTTSS2SIrr, X86::VCVTTSS2SIrm, 0 },
   { X86::Int_VCVTTSS2SIrr,X86::Int_VCVTTSS2SIrm, 0 },
   { X86::VCVTSD2SI64rr, X86::VCVTSD2SI64rm, 0 },
   { X86::VCVTSD2SIrr, X86::VCVTSD2SIrm, 0 },
   { X86::VCVTSS2SI64rr, X86::VCVTSS2SI64rm, 0 },
   { X86::VCVTSS2SIrr, X86::VCVTSS2SIrm, 0 },
+  { X86::VCVTTPD2DQrr, X86::VCVTTPD2DQXrm, 0 },
+  { X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
   { X86::VMOV64toPQIrr, X86::VMOVQI2PQIrm, 0 },
   { X86::VMOV64toSDrr, X86::VMOV64toSDrm, 0 },
   { X86::VMOVAPDrr, X86::VMOVAPDrm, TB_ALIGN_16 },
   { X86::VMOVAPSrr, X86::VMOVAPSrm, TB_ALIGN_16 },
   { X86::VMOVDDUPrr, X86::VMOVDDUPrm, 0 },
   { X86::VMOVDI2PDIrr, X86::VMOVDI2PDIrm, 0 },
   { X86::VMOVDI2SSrr, X86::VMOVDI2SSrm, 0 },
   { X86::VMOVDQArr, X86::VMOVDQArm, TB_ALIGN_16 },
@@ ... @@ static const X86OpTblEntry OpTbl1[] = {
   { X86::VRSQRTPSr_Int, X86::VRSQRTPSm_Int, 0 },
   { X86::VSQRTPDr, X86::VSQRTPDm, 0 },
   { X86::VSQRTPSr, X86::VSQRTPSm, 0 },
   { X86::VUCOMISDrr, X86::VUCOMISDrm, 0 },
   { X86::VUCOMISSrr, X86::VUCOMISSrm, 0 },
   { X86::VBROADCASTSSrr, X86::VBROADCASTSSrm, TB_NO_REVERSE },

   // AVX 256-bit foldable instructions
+  { X86::VCVTTPD2DQYrr, X86::VCVTTPD2DQYrm, 0 },
+  { X86::VCVTTPS2DQYrr, X86::VCVTTPS2DQYrm, 0 },
   { X86::VMOVAPDYrr, X86::VMOVAPDYrm, TB_ALIGN_32 },
   { X86::VMOVAPSYrr, X86::VMOVAPSYrm, TB_ALIGN_32 },
   { X86::VMOVDQAYrr, X86::VMOVDQAYrm, TB_ALIGN_32 },
   { X86::VMOVUPDYrr, X86::VMOVUPDYrm, 0 },
   { X86::VMOVUPSYrr, X86::VMOVUPSYrm, 0 },
   { X86::VPERMILPDYri, X86::VPERMILPDYmi, 0 },
   { X86::VPERMILPSYri, X86::VPERMILPSYmi, 0 },
   { X86::VRCPPSYr, X86::VRCPPSYm, 0 },
@@ ... @@ static const X86OpTblEntry OpTbl2[] = {
   { X86::Int_VCVTSI2SD64rr, X86::Int_VCVTSI2SD64rm, 0 },
   { X86::VCVTSI2SDrr, X86::VCVTSI2SDrm, 0 },
   { X86::Int_VCVTSI2SDrr, X86::Int_VCVTSI2SDrm, 0 },
   { X86::VCVTSI2SS64rr, X86::VCVTSI2SS64rm, 0 },
   { X86::Int_VCVTSI2SS64rr, X86::Int_VCVTSI2SS64rm, 0 },
   { X86::VCVTSI2SSrr, X86::VCVTSI2SSrm, 0 },
   { X86::Int_VCVTSI2SSrr, X86::Int_VCVTSI2SSrm, 0 },
   { X86::VCVTSS2SDrr, X86::VCVTSS2SDrm, 0 },
   { X86::Int_VCVTSS2SDrr, X86::Int_VCVTSS2SDrm, 0 },
-  { X86::VCVTTPD2DQrr, X86::VCVTTPD2DQXrm, 0 },
-  { X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
   { X86::VRSQRTSSr, X86::VRSQRTSSm, 0 },
   { X86::VSQRTSDr, X86::VSQRTSDm, 0 },
   { X86::VSQRTSSr, X86::VSQRTSSm, 0 },
   { X86::VADDPDrr, X86::VADDPDrm, 0 },
   { X86::VADDPSrr, X86::VADDPSrm, 0 },
   { X86::VADDSDrr, X86::VADDSDrm, 0 },
   { X86::VADDSSrr, X86::VADDSSrm, 0 },
   { X86::VADDSUBPDrr, X86::VADDSUBPDrm, 0 },

test/CodeGen/X86/avx1-stack-reload-folding.ll

  ; RUN: llc -O3 -disable-peephole -mcpu=corei7-avx -mattr=+avx < %s | FileCheck %s

  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-unknown"

  ; Function Attrs: nounwind readonly uwtable
  define <32 x double> @_Z14vstack_foldDv32_dS_(<32 x double> %a, <32 x double> %b) #0 {
  %1 = fadd <32 x double> %a, %b
  %2 = fsub <32 x double> %a, %b
  %3 = fmul <32 x double> %1, %2
  ret <32 x double> %3

  ;CHECK-NOT: vmovapd {{.*#+}} 32-byte Reload
  ;CHECK: vmulpd {{[0-9]*}}(%rsp), {{%ymm[0-9][0-9]*}}, {{%ymm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
  ;CHECK-NOT: vmovapd {{.*#+}} 32-byte Reload
  }
+
+ define <64 x i32> @stack_fold_cvttpd2dq(<64 x double> %a, <64 x double> %b) #0 {
+ %1 = fadd <64 x double> %a, %b
+ %2 = fsub <64 x double> %a, %b
+ %3 = fptosi <64 x double> %1 to <64 x i32>
+ %4 = fptosi <64 x double> %2 to <64 x i32>
+ %5 = or <64 x i32> %3, %4
+ ret <64 x i32> %5
+
+ ;CHECK-NOT: vmovapd {{.*#+}} 32-byte Reload
+ ;CHECK: vcvttpd2dqy {{[0-9]*}}(%rsp), {{%xmm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
+ ;CHECK-NOT: vmovapd {{.*#+}} 32-byte Reload
+ }
+
+ define <64 x i32> @stack_fold_cvttps2dq(<64 x float> %a, <64 x float> %b) #0 {
+ %1 = fadd <64 x float> %a, %b
+ %2 = fsub <64 x float> %a, %b
+ %3 = fptosi <64 x float> %1 to <64 x i32>
+ %4 = fptosi <64 x float> %2 to <64 x i32>
+ %5 = or <64 x i32> %3, %4
+ ret <64 x i32> %5
+
+ ;CHECK-NOT: vmovaps {{.*#+}} 32-byte Reload
+ ;CHECK: vcvttps2dq {{[0-9]*}}(%rsp), {{%ymm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
+ ;CHECK-NOT: vmovaps {{.*#+}} 32-byte Reload
+ }