This is an archive of the discontinued LLVM Phabricator instance.

Differential D47569

[Power9]Legalize and emit code for quad-precision convert from single-precision
Closed, Public

Authored by lei on May 30 2018, 9:10 PM.

Details

Reviewers

nemanjai
kbarton
stefanp
hfinkel

Commits

rGd17c39ccaac6: [Power9]Legalize and emit code for quad-precision convert from single-precision
rL336307: [Power9]Legalize and emit code for quad-precision convert from single-precision

Summary

Legalize and emit code for converting a single-precision floating-point value to quad-precision.
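
One way to see why the patch can reuse the double-precision convert (`xscvdpqp`) for single-precision inputs: widening a `float` to `double` is exact, so converting through `double` gives the same result as converting directly. The following is a minimal sketch of that reasoning, not code from the patch. It uses `long double` as a stand-in for `__float128` so it builds on any hosted compiler (an assumption), and `Wide`, `widenDirect`, `widenViaDouble`, `routesAgree`, and `checkWidenSamples` are hypothetical names.

```cpp
#include <cassert>
#include <limits>

// Stand-in for __float128 (assumption, chosen for portability).
using Wide = long double;

// Direct widening: float -> Wide.
Wide widenDirect(float x) { return static_cast<Wide>(x); }

// Two-step widening: float -> double -> Wide. This mirrors reusing the
// double-precision convert for a single-precision input.
Wide widenViaDouble(float x) {
  double d = static_cast<double>(x); // exact: every float is representable as a double
  return static_cast<Wide>(d);
}

// True when both routes produce the same value for a non-NaN input.
bool routesAgree(float x) { return widenDirect(x) == widenViaDouble(x); }

// Spot-check a few representative inputs, including the extremes.
void checkWidenSamples() {
  assert(routesAgree(1.5f));
  assert(routesAgree(-0.0f));
  assert(routesAgree(std::numeric_limits<float>::denorm_min()));
  assert(routesAgree(std::numeric_limits<float>::max()));
  assert(routesAgree(std::numeric_limits<float>::infinity()));
}
```

Calling `checkWidenSamples()` exercises the claim on a handful of values; it is a spot check, not a proof.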

Diff Detail

Event Timeline

lei created this revision. May 30 2018, 9:10 PM

nemanjai added inline comments. May 31 2018, 1:06 AM

lib/Target/PowerPC/PPCISelLowering.cpp
801	I imagine this just goes away since it's handled in the other patch.
lib/Target/PowerPC/PPCInstrVSX.td
3386	Huh? We are copying the sign of the input to itself? That seems like an unnecessary noop. Why do we need that?
3388	These two patterns seem:
1. Redundant, since we can already handle loading and extending separately anyway.
2. Wrong, if the weird sign-copying sequence is actually necessary.
We will produce a different sequence for these two equivalent snippets of code, and that seems wrong:
__float128 test1(float *Ptr) { return *Ptr; }
vs.
float __attribute__((noinline)) getFromPtr(float *Ptr) { return *Ptr; }
__float128 test2(float *Ptr) { return getFromPtr(Ptr); }

nemanjai requested changes to this revision. May 31 2018, 1:20 AM

nemanjai added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
3386	Oh I see the motivation here - I imagine it's because of the code coming out of GCC. If that's the case, please remove this. We do not need to replicate this. The reason they use a copy-sign instruction is actually to move the value from the FPR into a VR (we use `xxlor`). On a side note, the instruction they use to copy scalar values between VSR's is a bit better since it allows for more parallelism (even if it doesn't provide shorter latency). But that's for a separate patch.
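
To illustrate the point about the copy-sign being a disguised register move: copying the sign of a value onto itself leaves every bit unchanged, and so does OR-ing a value with itself (the tests in this revision check for `xxlor 2, 1, 1`). The sketch below models both on the raw bits of a `double`. It is an illustration under those assumptions, not a description of either compiler's output, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its raw 64-bit pattern.
std::uint64_t bitsOf(double x) {
  std::uint64_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

// Model of a copy-sign "move": magnitude and sign both come from x.
double moveViaCopySign(double x) { return std::copysign(x, x); }

// Model of an OR "move": x | x on the raw bits.
double moveViaOr(double x) {
  std::uint64_t b = bitsOf(x);
  b = b | b; // OR with itself is the identity
  double r;
  std::memcpy(&r, &b, sizeof r);
  return r;
}

// True when both models reproduce the input bit-for-bit.
bool isPlainCopy(double x) {
  return bitsOf(moveViaCopySign(x)) == bitsOf(x) &&
         bitsOf(moveViaOr(x)) == bitsOf(x);
}

// Spot-check values whose sign bit matters.
void checkMoveSamples() {
  assert(isPlainCopy(1.5));
  assert(isPlainCopy(-3.25));
  assert(isPlainCopy(0.0));
  assert(isPlainCopy(-0.0));
}
```

Either operation therefore works as a copy; which one is preferable is a scheduling question, as the comment above notes.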

This revision now requires changes to proceed. May 31 2018, 1:20 AM

address review comments.

lei marked 2 inline comments as done. May 31 2018, 2:38 PM

lei added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
801	correct!
lib/Target/PowerPC/PPCInstrVSX.td
3386	k

lei marked 2 inline comments as done. May 31 2018, 2:56 PM

lei added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
3388	Is this equivalent? `test1` returns `*Ptr`, where `Ptr` is passed in via a GPR since it's a pointer to a float, whereas `test2` returns the value from `getFromPtr`, which is in an FPR.

nemanjai added inline comments. May 31 2018, 4:40 PM

lib/Target/PowerPC/PPCInstrVSX.td
3388	Well sure. Semantically they're exactly the same thing - load a 4-byte float from memory, convert to `__float128` and return.
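
The equivalence argued in this thread can be checked at the source level with a small, self-contained variant of the two snippets. This is a sketch only: `long double` stands in for `__float128` so it compiles without quad-precision support (an assumption), the `const` qualifiers and the `checkEquivalent` helper are additions for illustration, and `__attribute__((noinline))` assumes a GCC-compatible compiler.

```cpp
#include <cassert>

// Stand-in for __float128 (assumption).
using Wide = long double;

// Loads the float and extends it in a single expression.
Wide test1(const float *Ptr) { return *Ptr; }

// Loads the float in a separate call that is kept out of line ...
float __attribute__((noinline)) getFromPtr(const float *Ptr) { return *Ptr; }

// ... and extends the returned value.
Wide test2(const float *Ptr) { return getFromPtr(Ptr); }

// Both functions load a 4-byte float from memory and widen it,
// so they must agree for any stored non-NaN value.
void checkEquivalent(float v) { assert(test1(&v) == test2(&v)); }
```

Since both functions compute the same value, a difference in the instruction sequence emitted for them would come from pattern selection alone, which is the concern raised above.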

LGTM. Thanks for the updates.

This revision is now accepted and ready to land. May 31 2018, 4:53 PM

lei added a parent revision: D47552: [Power9] Implement float128 parameter passing and return values. Jun 19 2018, 7:34 AM

Closed by commit rL336307: [Power9]Legalize and emit code for quad-precision convert from single-precision (authored by lei). Jul 4 2018, 9:23 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelLowering.cpp	7 lines
lib/Target/PowerPC/PPCInstrVSX.td	9 lines
test/CodeGen/PowerPC/f128-arith.ll	27 lines
test/CodeGen/PowerPC/f128-conv.ll	162 lines

Diff 149357

lib/Target/PowerPC/PPCISelLowering.cpp

@@ ... @@ if (Subtarget.hasP9Vector()) {

     if (EnableQuadPrecision) {
       addRegisterClass(MVT::f128, &PPC::VRRCRegClass);
       setOperationAction(ISD::FADD, MVT::f128, Legal);
       setOperationAction(ISD::FSUB, MVT::f128, Legal);
       setOperationAction(ISD::FDIV, MVT::f128, Legal);
       setOperationAction(ISD::FMUL, MVT::f128, Legal);
       setOperationAction(ISD::FP_EXTEND, MVT::f128, Legal);
-      setLoadExtAction(ISD::EXTLOAD, MVT::f128, MVT::f64, Expand);
+      // No extending loads to f128 on PPC.
+      for (MVT FT : MVT::fp_valuetypes())
+        setLoadExtAction(ISD::EXTLOAD, MVT::f128, FT, Expand);
       setOperationAction(ISD::FMA, MVT::f128, Legal);
       setOperationAction(ISD::FP_ROUND, MVT::f64, Legal);
       setOperationAction(ISD::FP_ROUND, MVT::f32, Legal);
       setTruncStoreAction(MVT::f128, MVT::f64, Expand);
       setTruncStoreAction(MVT::f128, MVT::f32, Expand);
     }

   }
@@ ... @@ bool PPCTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
   // - zext after and by a constant mask

   return TargetLowering::isZExtFree(Val, VT2);
 }

 bool PPCTargetLowering::isFPExtFree(EVT DestVT, EVT SrcVT) const {
   assert(DestVT.isFloatingPoint() && SrcVT.isFloatingPoint() &&
          "invalid fpext types");
+  // Extending to float128 is not free.
+  if (DestVT == MVT::f128)
+    return false;
   return true;
 }

 bool PPCTargetLowering::isLegalICmpImmediate(int64_t Imm) const {
   return isInt<16>(Imm) || isUInt<16>(Imm);
 }

 bool PPCTargetLowering::isLegalAddImmediate(int64_t Imm) const {

lib/Target/PowerPC/PPCInstrVSX.td

@@ ... @@ def XSCMPGEDP : XX3_XT5_XA5_XB5<60, 19, "xscmpgedp", vsrc, vsfrc, vsfrc,
                                   IIC_FPCompare, []>;
   def XSCMPGTDP : XX3_XT5_XA5_XB5<60, 11, "xscmpgtdp", vsrc, vsfrc, vsfrc,
                                   IIC_FPCompare, []>;

   //===--------------------------------------------------------------------===//
   // Quad-Precision Floating-Point Conversion Instructions:

   // Convert DP -> QP
-  def XSCVDPQP : X_VT5_XO5_VB5_TyVB<63, 22, 836, "xscvdpqp", vfrc, []>;
-  def : Pat<(f128 (fpextend f64:$src)), (f128 (XSCVDPQP $src))>;
+  def XSCVDPQP : X_VT5_XO5_VB5_TyVB<63, 22, 836, "xscvdpqp", vfrc,
+                                    [(set f128:$vT, (fpextend f64:$vB))]>;

   // Round & Convert QP -> DP (dword[1] is set to zero)
   def XSCVQPDP : X_VT5_XO5_VB5_VSFR<63, 20, 836, "xscvqpdp" , []>;
   def XSCVQPDPO : X_VT5_XO5_VB5_VSFR_Ro<63, 20, 836, "xscvqpdpo", []>;

   // Truncate & Convert QP -> (Un)Signed (D)Word (dword[1] is set to zero)
   def XSCVQPSDZ : X_VT5_XO5_VB5<63, 25, 836, "xscvqpsdz", []>;
   def XSCVQPSWZ : X_VT5_XO5_VB5<63, 9, 836, "xscvqpswz", []>;
@@ ... @@ def : Pat<(PPCstore_scal_int_from_vsr
             (STXSIHX (XSCVDPUXWS f64:$src), xoaddr:$dst)>;
   def : Pat<(PPCstore_scal_int_from_vsr
               (f64 (PPCcv_fp_to_uint_in_vsr f64:$src)), xoaddr:$dst, 1),
             (STXSIBX (XSCVDPUXWS f64:$src), xoaddr:$dst)>;

   // Round & Convert QP -> DP/SP
   def : Pat<(f64 (fpround f128:$src)), (f64 (XSCVQPDP $src))>;
   def : Pat<(f32 (fpround f128:$src)), (f32 (XSRSP (XSCVQPDPO $src)))>;

+  // Convert SP -> QP
+  def : Pat<(f128 (fpextend f32:$src)),
+            (f128 (XSCVDPQP (COPY_TO_REGCLASS $src, VFRC)))>;

 } // end HasP9Vector, AddedComplexity

 let Predicates = [HasP9Vector] in {
   let isPseudo = 1 in {
     let mayStore = 1 in {
       def SPILLTOVSR_STX : PseudoXFormMemOp<(outs),
                                             (ins spilltovsrrc:$XT, memrr:$dst),
                                             "#SPILLTOVSR_STX", []>;
       def SPILLTOVSR_ST : Pseudo<(outs), (ins spilltovsrrc:$XT, memrix:$dst),

test/CodeGen/PowerPC/f128-arith.ll

@@ ... @@ entry:
   ret void

 ; CHECK-LABEL: qpNeg
 ; CHECK-NOT: bl __subtf3
 ; CHECK: xsnegqp
 ; CHECK: stxv
 ; CHECK: blr
 }
-
-; Function Attrs: norecurse nounwind
-define void @dpConv2qp(double* nocapture readonly %a, fp128* nocapture %res) {
-entry:
-  %0 = load double, double* %a, align 8
-  %conv = fpext double %0 to fp128
-  store fp128 %conv, fp128* %res, align 16
-  ret void
-; CHECK-LABEL: dpConv2qp
-; CHECK-NOT: bl __extenddftf2
-; CHECK: lxsd
-; CHECK: xscvdpqp
-; CHECK: blr
-}
-
-; Function Attrs: norecurse nounwind
-define void @dpConv2qp_02(double %a, fp128* nocapture %res) {
-entry:
-  %conv = fpext double %a to fp128
-  store fp128 %conv, fp128* %res, align 16
-  ret void
-; CHECK-LABEL: dpConv2qp_02
-; CHECK-NOT: bl __extenddftf2
-; CHECK: xxlor
-; CHECK: xscvdpqp
-; CHECK: blr
-}

test/CodeGen/PowerPC/f128-conv.ll

@@ ... @@
 entry:
   %0 = load fp128, fp128* %a, align 16
   %1 = load fp128, fp128* %b, align 16
   %add = fadd fp128 %0, %1
   %conv = fptrunc fp128 %add to float
   store float %conv, float* %res, align 4
   ret void
 }
+
+@f128Glob = common global fp128 0xL00000000000000000000000000000000, align 16
+
+; Function Attrs: norecurse nounwind readnone
+define fp128 @dpConv2qp(double %a) {
+; CHECK-LABEL: dpConv2qp:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext double %a to fp128
+  ret fp128 %conv
+}
+
+; Function Attrs: norecurse nounwind
+define void @dpConv2qp_02(double* nocapture readonly %a) {
+; CHECK-LABEL: dpConv2qp_02:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: lxsd 2, 0(3)
+; CHECK-NEXT: addis 3, 2, .LC8@toc@ha
+; CHECK-NEXT: ld 3, .LC8@toc@l(3)
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 0, 3
+; CHECK-NEXT: blr
+entry:
+  %0 = load double, double* %a, align 8
+  %conv = fpext double %0 to fp128
+  store fp128 %conv, fp128* @f128Glob, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @dpConv2qp_02b(double* nocapture readonly %a, i32 signext %idx) {
+; CHECK-LABEL: dpConv2qp_02b:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: sldi 4, 4, 3
+; CHECK-NEXT: lxsdx 2, 3, 4
+; CHECK-NEXT: addis 3, 2, .LC8@toc@ha
+; CHECK-NEXT: ld 3, .LC8@toc@l(3)
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 0, 3
+; CHECK-NEXT: blr
+entry:
+  %idxprom = sext i32 %idx to i64
+  %arrayidx = getelementptr inbounds double, double* %a, i64 %idxprom
+  %0 = load double, double* %arrayidx, align 8
+  %conv = fpext double %0 to fp128
+  store fp128 %conv, fp128* @f128Glob, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @dpConv2qp_03(fp128* nocapture %res, i32 signext %idx, double %a) {
+; CHECK-LABEL: dpConv2qp_03:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: sldi 4, 4, 4
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 3, 4
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext double %a to fp128
+  %idxprom = sext i32 %idx to i64
+  %arrayidx = getelementptr inbounds fp128, fp128* %res, i64 %idxprom
+  store fp128 %conv, fp128* %arrayidx, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @dpConv2qp_04(double %a, fp128* nocapture %res) {
+; CHECK-LABEL: dpConv2qp_04:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxv 2, 0(4)
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext double %a to fp128
+  store fp128 %conv, fp128* %res, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind readnone
+define fp128 @spConv2qp(float %a) {
+; CHECK-LABEL: spConv2qp:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext float %a to fp128
+  ret fp128 %conv
+}
+
+; Function Attrs: norecurse nounwind
+define void @spConv2qp_02(float* nocapture readonly %a) {
+; CHECK-LABEL: spConv2qp_02:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: lxssp 2, 0(3)
+; CHECK-NEXT: addis 3, 2, .LC8@toc@ha
+; CHECK-NEXT: ld 3, .LC8@toc@l(3)
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 0, 3
+; CHECK-NEXT: blr
+entry:
+  %0 = load float, float* %a, align 4
+  %conv = fpext float %0 to fp128
+  store fp128 %conv, fp128* @f128Glob, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @spConv2qp_02b(float* nocapture readonly %a, i32 signext %idx) {
+; CHECK-LABEL: spConv2qp_02b:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: sldi 4, 4, 2
+; CHECK-NEXT: lxsspx 2, 3, 4
+; CHECK-NEXT: addis 3, 2, .LC8@toc@ha
+; CHECK-NEXT: ld 3, .LC8@toc@l(3)
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 0, 3
+; CHECK-NEXT: blr
+entry:
+  %idxprom = sext i32 %idx to i64
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %idxprom
+  %0 = load float, float* %arrayidx, align 4
+  %conv = fpext float %0 to fp128
+  store fp128 %conv, fp128* @f128Glob, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @spConv2qp_03(fp128* nocapture %res, i32 signext %idx, float %a) {
+; CHECK-LABEL: spConv2qp_03:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: sldi 4, 4, 4
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxvx 2, 3, 4
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext float %a to fp128
+  %idxprom = sext i32 %idx to i64
+  %arrayidx = getelementptr inbounds fp128, fp128* %res, i64 %idxprom
+  store fp128 %conv, fp128* %arrayidx, align 16
+  ret void
+}
+
+; Function Attrs: norecurse nounwind
+define void @spConv2qp_04(float %a, fp128* nocapture %res) {
+; CHECK-LABEL: spConv2qp_04:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xxlor 2, 1, 1
+; CHECK-NEXT: xscvdpqp 2, 2
+; CHECK-NEXT: stxv 2, 0(4)
+; CHECK-NEXT: blr
+entry:
+  %conv = fpext float %a to fp128
+  store fp128 %conv, fp128* %res, align 16
+  ret void
+}