This is an archive of the discontinued LLVM Phabricator instance.

Teach the AArch64 backend about half-precision floating point
ClosedPublic

Authored by olista01 on Aug 13 2014, 2:22 AM.


Details


Summary

This patch makes half and vectors of half valid types for the AArch64 backend (half is known as f16 in the backend). This is mostly a case of adding f16 to all instruction selection patterns that can use it, but also adds some target-independent logic for promoting arithmetic operations to a wider floating-point type, and fixes up some AArch64 custom lowering which assumes that the smallest floating-point type is f32.

The motivation for this is that the ACLE (ARM C Language Extensions) allows fp16 to be used as a function argument or return type, and it must be passed in floating-point registers. Previously, fp16 was converted to i16, so the backend had no way to know that an __fp16 must be passed in a different register from a short.
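To make the promotion concrete, here is a minimal software model of what the promoted DAG computes for an f16 binary operation: FP_EXTEND both operands to f32, perform the operation in f32, then FP_ROUND the result back to f16. The helpers half_to_float, float_to_half and promote_binop are hypothetical names for this sketch, not LLVM APIs. They work on raw IEEE binary16 bit patterns and assume round-to-nearest-even in the default floating-point environment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical helpers (not LLVM APIs) that model IEEE 754 binary16 in software.

// FP_EXTEND f16 -> f32. Exact: every half value is representable as a float.
float half_to_float(uint16_t h) {
  const int exp = (h >> 10) & 0x1F;
  const int mant = h & 0x3FF;
  float mag;
  if (exp == 0) {            // zero or subnormal: mant * 2^-24
    mag = std::ldexp(static_cast<float>(mant), -24);
  } else if (exp == 31) {    // infinity or NaN
    mag = mant ? std::numeric_limits<float>::quiet_NaN()
               : std::numeric_limits<float>::infinity();
  } else {                   // normal: (1024 + mant) * 2^(exp - 25)
    mag = std::ldexp(static_cast<float>(mant | 0x400), exp - 25);
  }
  return (h & 0x8000) ? -mag : mag;
}

// FP_ROUND f32 -> f16 with round-to-nearest-even. Assumes the default
// FE_TONEAREST rounding mode, which std::nearbyint honours.
uint16_t float_to_half(float x) {
  const int sign = std::signbit(x) ? 0x8000 : 0;
  if (std::isnan(x)) return static_cast<uint16_t>(sign | 0x7E00);  // quiet NaN
  const float a = std::fabs(x);
  // 65520 is halfway between 65504 (largest half) and 65536; the tie goes to inf.
  if (a >= 65520.0f) return static_cast<uint16_t>(sign | 0x7C00);
  if (a < std::ldexp(1.0f, -14)) {
    // Subnormal range: count units of 2^-24. A result of 1024 encodes the
    // smallest normal (0x0400), so no special case is needed.
    return static_cast<uint16_t>(
        sign | static_cast<int>(std::nearbyint(std::ldexp(a, 24))));
  }
  int e;
  std::frexp(a, &e);                                  // a = m * 2^e, m in [0.5, 1)
  float sig = std::nearbyint(std::ldexp(a, 11 - e));  // 11-bit significand, [1024, 2048]
  if (sig == 2048.0f) { sig = 1024.0f; ++e; }         // rounding carried into exponent
  const int biased = e + 14;                          // (e - 1) + 15
  assert(biased >= 1 && biased <= 30);                // overflow was handled above
  return static_cast<uint16_t>(sign | (biased << 10) |
                               (static_cast<int>(sig) - 1024));
}

// The promotion pattern: FP_EXTEND both operands, apply the operation in f32,
// then FP_ROUND the result back to f16.
template <typename Op>
uint16_t promote_binop(uint16_t a, uint16_t b, Op op) {
  return float_to_half(op(half_to_float(a), half_to_float(b)));
}
```

For instance, promote_binop(0x3C00, 0x4200, std::divides<float>()) divides 1.0 by 3.0 and returns 0x3555, the nearest half to 1/3. Whether such a promoted result always matches a single correctly rounded f16 operation depends on the operation, which is the question raised in the review.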


Event Timeline

olista01 updated this revision to Diff 12437 (Aug 13 2014, 2:22 AM).

olista01 retitled this revision to Teach the AArch64 backend about half-precision floating point.

olista01 updated this object.

olista01 edited the test plan for this revision.

olista01 set the repository for this revision to rL LLVM.

olista01 added a subscriber: Unknown Object (MLST).

Herald added subscribers: mroth, mcrosier, aemerson (Aug 13 2014, 2:22 AM).

Hi Oliver,

Thanks for working on this, it looks like it might really be a set of patches that should be analysed separately.

First, there's the "deal with vNf16 natively in AArch64 where possible". This looks largely fine, though even that would be a bit better sub-divided into function-calls, loads, casts, ... for commit.

Then there's the promotion. I think that's rather more problematic. In general trunc . OP32 . extend != OP16. Apparently, it *does* work for add, sub, div, mul & sqrt, but explicitly not for fma. Personally, I'd be extremely surprised if it worked for the transcendental functions, but I haven't tried to prove it.

So we want to be very careful in that area. If we really want to support half as a native type, we'll probably need to add libcalls for some operations.

Now, this clearly ties in with D4456, and I'm guessing you added the promotion logic because it makes clang emit fptrunc/fpext which then gets optimised. In that case, the operations that *do* get optimised should be safe, so a more limited set of promotions is probably workable.

Cheers.

Tim.
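The split between add, sub, mul, div and sqrt on one side and fma on the other lines up with a commonly cited sufficient condition for double rounding to be harmless: if the operands carry p significand bits and the intermediate format carries at least 2p + 2, then rounding the exact result of +, -, ×, ÷ or √ first to the wide format and then to the narrow one gives the same answer as a single direct rounding (round-to-nearest-even throughout). Counting the implicit leading bit, f16 and f32 meet this with no bits to spare. This is a sketch of the arithmetic, not a proof:

```latex
\begin{align*}
p_{\mathrm{f16}} &= 10 + 1 = 11, \qquad p_{\mathrm{f32}} = 23 + 1 = 24 = 2\,p_{\mathrm{f16}} + 2 \\
\operatorname{round}_{\mathrm{f16}}\bigl(\operatorname{round}_{\mathrm{f32}}(a \circ b)\bigr)
  &= \operatorname{round}_{\mathrm{f16}}(a \circ b),
  \qquad a, b \in \mathrm{f16},\ \circ \in \{+, -, \times, \div\} \\
\operatorname{round}_{\mathrm{f16}}\bigl(\operatorname{round}_{\mathrm{f32}}(\sqrt{a})\bigr)
  &= \operatorname{round}_{\mathrm{f16}}(\sqrt{a})
\end{align*}
```

The condition says nothing about fma, whose exact result a × b + c is not one of these basic two-operand operations, which is consistent with promotion being unsafe there. Transcendental functions are generally not correctly rounded to begin with, so no such guarantee carries over to them either.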

> Then there's the promotion. I think that's rather more problematic. In general trunc . OP32 . extend != OP16. Apparently, it *does* work for add, sub, div, mul & sqrt, but explicitly not for fma. Personally, I'd be extremely surprised if it worked for the transcendental functions, but I haven't tried to prove it.

My original intention when adding the promotion was to make the backend capable of handling any operation that can be expressed by IR, so that it could be robust against optimisations introducing half-precision operations. However, if this could give the wrong results, I agree it would be better to fail the compilation.

Would it be OK to simply remove all of the promotions to f32, leaving them to fail at instruction selection, or is there a better way to express that an operation is not supported?

Should I leave the promotions in for add, sub, div, mul & sqrt, or do you think it would be better to be consistent?

> So we want to be very careful in that area. If we really want to support half as a native type, we'll probably need to add libcalls for some operations.

I'm not currently aware of any source language other than OpenCL which allows operations on the half type (ACLE promotes to float first), so adding libcalls seems like overkill at the moment.

Hi Oliver,

Tim's email, which isn't showing up in Phabricator:

Hi Oliver,

On 14 August 2014 16:35, Oliver Stannard <oliver.stannard@arm.com> wrote:

> Would it be OK to simply remove all of the promotions to f32, leaving them to fail at instruction selection, or is there a better way to express that an operation is not supported?

I think that's probably the best we've got for now.

> Should I leave the promotions in for add, sub, div, mul & sqrt, or do you think it would be better to be consistent?

I'd be happy if you left those in, actually. I have an ongoing cunning plan to get rid of @llvm.convert.to.fp16 in favour of fptrunc, and doing those promotions will be a necessary step along the way, I think.

> So we want to be very careful in that area. If we really want to support half as a native type, we'll probably need to add libcalls for some operations.

> I'm not currently aware of any source language other than OpenCL which allows operations on the half type (ACLE promotes to float first), so adding libcalls seems like overkill at the moment.

Agreed. Quite a bit of work without much gain at the moment.

Cheers.

This review now covers just the scalar part of the patch; I will upload a second patch with the vector parts.

Removed most of the type promotions, except for add, sub, mul, div, fp_round and fp_extend.
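The promotions that were kept can be spot-checked numerically. The sketch below compares the promoted f32 result of a + b and a × b with a single rounding of the exact result, which a double holds exactly for these two operations on halves. The helper names (from_half, to_half, promotion_matches, check_grid) are hypothetical, the conversions are software models rather than LLVM code, and round-to-nearest-even is assumed. Only a strided sample of operand bit patterns is checked, so this is evidence, not a proof. Note that the f32 product of two halves is itself exact, so the multiply check passes trivially; the add check is the one that actually exercises double rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers (not LLVM code): software binary16 conversions,
// templated so the same code can widen to / round from float or double.
template <typename T>
T from_half(uint16_t h) {
  const int exp = (h >> 10) & 0x1F;
  const int mant = h & 0x3FF;
  T mag;
  if (exp == 0)
    mag = std::ldexp(T(mant), -24);                 // zero / subnormal
  else if (exp == 31)
    mag = mant ? std::numeric_limits<T>::quiet_NaN()
               : std::numeric_limits<T>::infinity();
  else
    mag = std::ldexp(T(mant | 0x400), exp - 25);    // normal
  return (h & 0x8000) ? -mag : mag;
}

// Round to binary16, nearest-even (assumes the default FE_TONEAREST mode).
template <typename T>
uint16_t to_half(T x) {
  const int sign = std::signbit(x) ? 0x8000 : 0;
  if (std::isnan(x)) return static_cast<uint16_t>(sign | 0x7E00);
  const T a = std::fabs(x);
  if (a >= T(65520)) return static_cast<uint16_t>(sign | 0x7C00);  // -> inf
  if (a < std::ldexp(T(1), -14))                                    // subnormal
    return static_cast<uint16_t>(
        sign | static_cast<int>(std::nearbyint(std::ldexp(a, 24))));
  int e;
  std::frexp(a, &e);
  T sig = std::nearbyint(std::ldexp(a, 11 - e));
  if (sig == T(2048)) { sig = T(1024); ++e; }
  assert(e + 14 >= 1 && e + 14 <= 30);
  return static_cast<uint16_t>(sign | ((e + 14) << 10) |
                               (static_cast<int>(sig) - 1024));
}

bool is_nan_half(uint16_t h) {
  return (h & 0x7C00) == 0x7C00 && (h & 0x3FF) != 0;
}

// True if the promoted f32 results of a+b and a*b equal a single correct
// rounding of the exact result. The double sum/product of two halves is
// exact, so to_half<double> of it is the correctly rounded f16 answer.
bool promotion_matches(uint16_t a, uint16_t b) {
  const float fa = from_half<float>(a), fb = from_half<float>(b);
  const double da = from_half<double>(a), db = from_half<double>(b);
  auto same = [](uint16_t x, uint16_t y) {
    return x == y || (is_nan_half(x) && is_nan_half(y));
  };
  return same(to_half(fa + fb), to_half(da + db)) &&
         same(to_half(fa * fb), to_half(da * db));
}

// Spot-check a strided grid of operand bit patterns, skipping NaN inputs.
bool check_grid(uint32_t stride) {
  for (uint32_t a = 0; a <= 0xFFFF; a += stride)
    for (uint32_t b = 0; b <= 0xFFFF; b += stride) {
      const uint16_t ha = static_cast<uint16_t>(a);
      const uint16_t hb = static_cast<uint16_t>(b);
      if (!is_nan_half(ha) && !is_nan_half(hb) && !promotion_matches(ha, hb))
        return false;
    }
  return true;
}
```

A stride of 1 would cover all 2^32 operand pairs, which is slow; a coarser stride keeps a quick run practical.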

Hi Oliver,

This looks fine, apart from one nit. If you don't need the FP_ROUND/FP_EXTEND promotions, feel free to remove those lines and just commit.

Cheers.

Tim.

Inline comment on lib/Target/AArch64/AArch64ISelLowering.cpp, lines 289–290:
Are these needed? I don't see any code in LegalizeDAG to promote FP_ROUND or FP_EXTEND (and doing so sounds a bit dodgy, given that they're what we'll be using to do the promotion in general).

Good catch, those promotions were unnecessary.

Committed revision 215891.

This revision is now accepted and ready to land (Aug 18 2014, 7:32 AM).

olista01 closed this revision (Aug 18 2014, 7:33 AM).

Revision Contents

Path                                          Size
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp      3 lines
lib/Target/AArch64/AArch64ISelLowering.cpp    11 lines
test/CodeGen/AArch64/fp16-instructions.ll     109 lines

Diff 12616

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

     if (NVT.isInteger()) {
       ExtOp = isSignedIntSetCC(CCCode) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
     }
     Tmp1 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(0));
     Tmp2 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(1));
     Results.push_back(DAG.getNode(ISD::SETCC, dl, Node->getValueType(0),
                                   Tmp1, Tmp2, Node->getOperand(2)));
     break;
   }
+  case ISD::FADD:
+  case ISD::FSUB:
+  case ISD::FMUL:
   case ISD::FDIV:
   case ISD::FREM:
   case ISD::FPOW: {
     Tmp1 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(0));
     Tmp2 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(1));
     Tmp3 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2);
     Results.push_back(DAG.getNode(ISD::FP_ROUND, dl, OVT,
                                   Tmp3, DAG.getIntPtrConstant(0)));

lib/Target/AArch64/AArch64ISelLowering.cpp


In AArch64TargetLowering::AArch64TargetLowering(TargetMachine &TM):

   setOperationAction(ISD::FSIN, MVT::f64, Expand);
   setOperationAction(ISD::FCOS, MVT::f32, Expand);
   setOperationAction(ISD::FCOS, MVT::f64, Expand);
   setOperationAction(ISD::FPOW, MVT::f32, Expand);
   setOperationAction(ISD::FPOW, MVT::f64, Expand);
   setOperationAction(ISD::FCOPYSIGN, MVT::f64, Custom);
   setOperationAction(ISD::FCOPYSIGN, MVT::f32, Custom);

+  // f16 is storage-only, so we promote operations to f32 if we know this is
+  // valid, and ignore them otherwise. The operations not mentioned here will
+  // fail to select, but this is not a major problem as no source language
+  // should be emitting native f16 operations yet.
+  setOperationAction(ISD::FADD, MVT::f16, Promote);
+  setOperationAction(ISD::FDIV, MVT::f16, Promote);
+  setOperationAction(ISD::FMUL, MVT::f16, Promote);
+  setOperationAction(ISD::FSUB, MVT::f16, Promote);
+  setOperationAction(ISD::FP_ROUND, MVT::f16, Promote);
+  setOperationAction(ISD::FP_EXTEND, MVT::f16, Promote);

   // AArch64 has implementations of a lot of rounding-like FP operations.
   static MVT RoundingTypes[] = { MVT::f32, MVT::f64};
   for (unsigned I = 0; I < array_lengthof(RoundingTypes); ++I) {
     MVT Ty = RoundingTypes[I];
     setOperationAction(ISD::FFLOOR, Ty, Legal);
     setOperationAction(ISD::FNEARBYINT, Ty, Legal);
     setOperationAction(ISD::FCEIL, Ty, Legal);
     setOperationAction(ISD::FRINT, Ty, Legal);

test/CodeGen/AArch64/fp16-instructions.ll

This file was added.

; RUN: llc < %s -mtriple=aarch64-none-eabi | FileCheck %s

define half @add_h(half %a, half %b) {
entry:
; CHECK-LABEL: add_h:
; CHECK: fcvt
; CHECK: fcvt
; CHECK: fadd
; CHECK: fcvt
  %0 = fadd half %a, %b
  ret half %0
}

define half @sub_h(half %a, half %b) {
entry:
; CHECK-LABEL: sub_h:
; CHECK: fcvt
; CHECK: fcvt
; CHECK: fsub
; CHECK: fcvt
  %0 = fsub half %a, %b
  ret half %0
}

define half @mul_h(half %a, half %b) {
entry:
; CHECK-LABEL: mul_h:
; CHECK: fcvt
; CHECK: fcvt
; CHECK: fmul
; CHECK: fcvt
  %0 = fmul half %a, %b
  ret half %0
}

define half @div_h(half %a, half %b) {
entry:
; CHECK-LABEL: div_h:
; CHECK: fcvt
; CHECK: fcvt
; CHECK: fdiv
; CHECK: fcvt
  %0 = fdiv half %a, %b
  ret half %0
}

define half @load_h(half* %a) {
entry:
; CHECK-LABEL: load_h:
; CHECK: ldr h
  %0 = load half* %a, align 4
  ret half %0
}

define void @store_h(half* %a, half %b) {
entry:
; CHECK-LABEL: store_h:
; CHECK: str h
  store half %b, half* %a, align 4
  ret void
}

define half @s_to_h(float %a) {
; CHECK-LABEL: s_to_h:
; CHECK: fcvt
  %1 = fptrunc float %a to half
  ret half %1
}

define half @d_to_h(double %a) {
; CHECK-LABEL: d_to_h:
; CHECK: fcvt
  %1 = fptrunc double %a to half
  ret half %1
}

define float @h_to_s(half %a) {
; CHECK-LABEL: h_to_s:
; CHECK: fcvt
  %1 = fpext half %a to float
  ret float %1
}

define double @h_to_d(half %a) {
; CHECK-LABEL: h_to_d:
; CHECK: fcvt
  %1 = fpext half %a to double
  ret double %1
}

define half @bitcast_i_to_h(i16 %a) {
; CHECK-LABEL: bitcast_i_to_h:
; CHECK: fmov
  %1 = bitcast i16 %a to half
  ret half %1
}

define i16 @bitcast_h_to_i(half %a) {
; CHECK-LABEL: bitcast_h_to_i:
; CHECK: fmov
  %1 = bitcast half %a to i16
  ret i16 %1
}
