This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Explicit lowering of half <-> double conversions.
ClosedPublic

Authored by simon_tatham on Apr 15 2019, 5:56 AM.

Details

Reviewers

dmgreen
samparker
SjoerdMeijer
ostannard

Commits

rG4cf18c284955: [ARM] Explicit lowering of half <-> double conversions.
rL364294: [ARM] Explicit lowering of half <-> double conversions.

Summary

If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f64 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.

Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.
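For readers who want to see what the f16 -> f32 step actually computes, the following is a minimal sketch of a software conversion from IEEE half-precision bits to float. The function name `halfBitsToFloat` is hypothetical; this is not the code LLVM or any runtime library uses, and it does not preserve NaN payloads.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: widen IEEE 754 binary16 bits to a float.
// Layout of a half: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
float halfBitsToFloat(std::uint16_t h) {
  const int sign = (h >> 15) & 1;
  const int exp = (h >> 10) & 0x1F;
  const int frac = h & 0x3FF;
  float mag;
  if (exp == 0) {
    // Zero or subnormal: frac * 2^-24.
    mag = std::ldexp(static_cast<float>(frac), -24);
  } else if (exp == 31) {
    // Infinity (frac == 0) or NaN (payload is not preserved here).
    mag = frac ? NAN : INFINITY;
  } else {
    // Normal: (1 + frac/1024) * 2^(exp-15) == (1024 + frac) * 2^(exp-25).
    mag = std::ldexp(static_cast<float>(1024 + frac), exp - 25);
  }
  return sign ? -mag : mag;
}
```

Every finite half value is exactly representable as a float, which is why widening in this direction never needs a rounding decision.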

But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.
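The decision described above can be modelled outside LLVM as a small standalone sketch. It mirrors the branches of the new LowerFP_EXTEND in this diff, but `Features`, `Step` and `planExtend` are hypothetical names, not LLVM APIs, and the model leaves out the case where the conversion is already legal and never reaches custom lowering.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the subtarget queries used in the patch.
struct Features {
  bool hasFP16; // f16 <-> f32 conversion instructions are available
  bool hasFP64; // double-precision instructions are available
};

// One hop of a widening conversion: a hardware instruction or a library call.
enum class Step { Insn16To32, Call16To32, Insn32To64, Call32To64 };

// Plan how to widen a value of srcBits to dstBits, one hop at a time.
std::vector<Step> planExtend(const Features &F, unsigned srcBits,
                             unsigned dstBits) {
  assert(dstBits > srcBits && dstBits <= 64 && srcBits >= 16);
  std::vector<Step> steps;
  if (srcBits == 16) // first hop: f16 -> f32
    steps.push_back(F.hasFP16 ? Step::Insn16To32 : Step::Call16To32);
  if (dstBits == 64) // second hop: f32 -> f64
    steps.push_back(F.hasFP64 ? Step::Insn32To64 : Step::Call32To64);
  return steps;
}
```

For a target with f16 conversions but no f64, `planExtend({true, false}, 16, 64)` yields an instruction for the first hop and a library call for the second, which is the combination the old implicit legalization could not express.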

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 30535
Build 30534: arc lint + arc unit

Event Timeline

simon_tatham created this revision. Apr 15 2019, 5:56 AM

Herald added a project: Restricted Project. Apr 15 2019, 5:56 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B30535: Diff 195150. Apr 15 2019, 5:57 AM

simon_tatham added a parent revision: D60691: [ARM] Replace fp-only-sp and d16 with fp64 and d32. Apr 15 2019, 6:08 AM

simon_tatham added a child revision: D60693: [ARM] Split predicates out into their own .td file.

Can this be tested now, or does that depend on one of the other patches?

The code looks reasonable to me, but should be tested.

simon_tatham removed a parent revision: D60691: [ARM] Replace fp-only-sp and d16 with fp64 and d32. May 30 2019, 8:33 AM

simon_tatham removed a child revision: D60693: [ARM] Split predicates out into their own .td file. May 30 2019, 8:35 AM

simon_tatham added a child revision: D60708: [ARM] Code-generation infrastructure for MVE. Jun 4 2019, 4:55 AM

Remastered patch to apply cleanly against current trunk.

Harbormaster completed remote builds in B33198: Diff 204037. Jun 11 2019, 5:30 AM

LGTM except for a few typos in comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
14432	Typos: convertig, withouth

This revision is now accepted and ready to land. Jun 13 2019, 5:51 AM

Closed by commit rL364294: [ARM] Explicit lowering of half <-> double conversions. (authored by statham). Jun 25 2019, 4:25 AM

This revision was automatically updated to reflect the committed changes.

MaskRay mentioned this in rL364312: [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692. Jun 25 2019, 6:34 AM

MaskRay mentioned this in rG807d2f442ad4: [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692. Jun 25 2019, 6:37 AM

Revision Contents

Path                                        Size
llvm/lib/Target/ARM/ARMISelLowering.cpp     79 lines
llvm/lib/Target/ARM/ARMInstrVFP.td          8 lines

Diff 195150

llvm/lib/Target/ARM/ARMISelLowering.cpp

@@ if (!Subtarget->hasFP64()) {
     setOperationAction(ISD::FFLOOR, MVT::f64, Expand);
     setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::UINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_UINT, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_SINT, MVT::f64, Custom);
     setOperationAction(ISD::FP_TO_UINT, MVT::f64, Custom);
     setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);
+  }
+
+  if (!Subtarget->hasFP64() || !Subtarget->hasFPARMv8Base()){
     setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);
+    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
   }

+  if (!Subtarget->hasFP16())
+    setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom);
+
+  if (!Subtarget->hasFP64())
+    setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);
+
   computeRegisterProperties(Subtarget->getRegisterInfo());

   // ARM does not have floating-point extending loads.
   for (MVT VT : MVT::fp_valuetypes()) {
     setLoadExtAction(ISD::EXTLOAD, VT, MVT::f32, Expand);
     setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand);
   }

@@ ARMTargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const {
   SDValue NewSP = DAG.getCopyFromReg(Chain, DL, ARM::SP, MVT::i32);
   Chain = NewSP.getValue(1);

   SDValue Ops[2] = { NewSP, Chain };
   return DAG.getMergeValues(Ops, DL);
 }

 SDValue ARMTargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
-  assert(Op.getValueType() == MVT::f64 && !Subtarget->hasFP64() &&
+  SDValue SrcVal = Op.getOperand(0);
+  const unsigned DstSz = Op.getValueType().getSizeInBits();
+  const unsigned SrcSz = SrcVal.getValueType().getSizeInBits();
+  assert(DstSz > SrcSz && DstSz <= 64 && SrcSz >= 16 &&
          "Unexpected type for custom-lowering FP_EXTEND");

+  assert((!Subtarget->hasFP64() || !Subtarget->hasFPARMv8Base()) &&
+         "With both FP DP and 16, any FP conversion is legal!");
+
+  assert(!(DstSz == 32 && Subtarget->hasFP16()) &&
+         "With FP16, 16 to 32 conversion is legal!");
+
+  // Either we are converting from 16 -> 64, without FP16 and/or
+  // FP.double-precision or without Armv8-fp. So we must do it in two
+  // steps.
+  // Or we are convertig from 32 -> 64 withouth fp.double-precision or 16 -> 32
+  // without FP16. So we must do a function call.
+  SDLoc Loc(Op);
   RTLIB::Libcall LC;
-  LC = RTLIB::getFPEXT(Op.getOperand(0).getValueType(), Op.getValueType());
+  if (SrcSz == 16) {
+    // Instruction from 16 -> 32
+    if (Subtarget->hasFP16())
+      SrcVal = DAG.getNode(ISD::FP_EXTEND, Loc, MVT::f32, SrcVal);
+    // Lib call from 16 -> 32
+    else {
+      LC = RTLIB::getFPEXT(MVT::f16, MVT::f32);
+      assert(LC != RTLIB::UNKNOWN_LIBCALL &&
+             "Unexpected type for custom-lowering FP_EXTEND");
+      SrcVal =
+          makeLibCall(DAG, LC, MVT::f32, SrcVal, /*isSigned*/ false, Loc).first;
+    }
+  }

-  SDValue SrcVal = Op.getOperand(0);
-  return makeLibCall(DAG, LC, Op.getValueType(), SrcVal, /*isSigned*/ false,
-                     SDLoc(Op)).first;
+  if (DstSz != 64)
+    return SrcVal;
+  // For sure now SrcVal is 32 bits
+  if (Subtarget->hasFP64()) // Instruction from 32 -> 64
+    return DAG.getNode(ISD::FP_EXTEND, Loc, MVT::f64, SrcVal);
+
+  LC = RTLIB::getFPEXT(MVT::f32, MVT::f64);
+  assert(LC != RTLIB::UNKNOWN_LIBCALL &&
+         "Unexpected type for custom-lowering FP_EXTEND");
+  return makeLibCall(DAG, LC, MVT::f64, SrcVal, /*isSigned*/ false, Loc).first;
 }

 SDValue ARMTargetLowering::LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
-  assert(Op.getOperand(0).getValueType() == MVT::f64 && !Subtarget->hasFP64() &&
+  SDValue SrcVal = Op.getOperand(0);
+  EVT SrcVT = SrcVal.getValueType();
+  EVT DstVT = Op.getValueType();
+  const unsigned DstSz = Op.getValueType().getSizeInBits();
+  const unsigned SrcSz = SrcVT.getSizeInBits();
+  assert(DstSz < SrcSz && SrcSz <= 64 && DstSz >= 16 &&
          "Unexpected type for custom-lowering FP_ROUND");

-  RTLIB::Libcall LC;
-  LC = RTLIB::getFPROUND(Op.getOperand(0).getValueType(), Op.getValueType());
+  assert((!Subtarget->hasFP64() || !Subtarget->hasFPARMv8Base()) &&
+         "With both FP DP and 16, any FP conversion is legal!");

-  SDValue SrcVal = Op.getOperand(0);
-  return makeLibCall(DAG, LC, Op.getValueType(), SrcVal, /*isSigned*/ false,
-                     SDLoc(Op)).first;
+  SDLoc Loc(Op);
+
+  // Instruction from 32 -> 16 if hasFP16 is valid
+  if (SrcSz == 32 && Subtarget->hasFP16())
+    return Op;
+
+  // Lib call from 32 -> 16 / 64 -> [32, 16]
+  RTLIB::Libcall LC = RTLIB::getFPROUND(SrcVT, DstVT);
+  assert(LC != RTLIB::UNKNOWN_LIBCALL &&
+         "Unexpected type for custom-lowering FP_ROUND");
+  return makeLibCall(DAG, LC, DstVT, SrcVal, /*isSigned*/ false, Loc).first;
 }

 void ARMTargetLowering::lowerABS(SDNode *N, SmallVectorImpl<SDValue> &Results,
                                  SelectionDAG &DAG) const {
   assert(N->getValueType(0) == MVT::i64 && "Unexpected type (!= i64) on ABS.");
   MVT HalfT = MVT::i32;
   SDLoc dl(N);
   SDValue Hi, Lo, Tmp;

llvm/lib/Target/ARM/ARMInstrVFP.td

 // Between half, single and double-precision.
 def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm",
                  [/* Intentionally left blank, see patterns below */]>,
                  Requires<[HasFP16]>,
                  Sched<[WriteFPCVT]>;

-def : FullFP16Pat<(f32 (fpextend HPR:$Sm)),
+def : FP16Pat<(f32 (fpextend HPR:$Sm)),
               (VCVTBHS (COPY_TO_REGCLASS HPR:$Sm, SPR))>;
 def : FP16Pat<(f16_to_fp GPR:$a),
               (VCVTBHS (COPY_TO_REGCLASS GPR:$a, SPR))>;

 def VCVTBSH: ASuI<0b11101, 0b11, 0b0011, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTHS, "vcvtb", ".f16.f32\t$Sd, $Sm",
                  [/* Intentionally left blank, see patterns below */]>,
                  Requires<[HasFP16]>,
                  Sched<[WriteFPCVT]>;

-def : FullFP16Pat<(f16 (fpround SPR:$Sm)),
+def : FP16Pat<(f16 (fpround SPR:$Sm)),
               (COPY_TO_REGCLASS (VCVTBSH SPR:$Sm), HPR)>;
 def : FP16Pat<(fp_to_f16 SPR:$a),
               (i32 (COPY_TO_REGCLASS (VCVTBSH SPR:$a), GPR))>;

 def VCVTTHS: ASuI<0b11101, 0b11, 0b0010, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sm),
                  /* FIXME */ IIC_fpCVTSH, "vcvtt", ".f32.f16\t$Sd, $Sm",
                  [/* For disassembly only; pattern left blank */]>,
                  Requires<[HasFP16]>,
                  Sched<[WriteFPCVT]>;