This is an archive of the discontinued LLVM Phabricator instance.

Enable fma formation for fp16 on x86 and aarch64
Needs Review · Public

Authored by scanon on Jan 11 2019, 5:48 AM.


Details

Reviewers

t.p.northover
ahatanak
SjoerdMeijer
olista01

Summary

On these targets, isFMAFasterThanFMulAndFAdd should return the same value for fp16 as for fp32 when we legalize fp16 by extending to fp32 operations. On aarch64, when fp16 arithmetic is supported directly, the value should just be true.
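To make the intended behaviour concrete, here is a minimal stand-alone sketch of the per-type decision. It is not LLVM's actual API: `ScalarTy` and the `HasFMA` flag are hypothetical stand-ins for `MVT::SimpleTy` and the `Subtarget.hasAnyFMA()` guard that appears in the X86 hunk. The AArch64 hunk in this diff shows no such guard, so that case is modelled by passing `true`.

```cpp
// Toy model of the hook after this revision; names are illustrative only.
enum class ScalarTy { F16, F32, F64, Other };

// Returns true when forming an FMA is considered profitable for the type.
// HasFMA models the X86 subtarget check; pass true to model AArch64.
constexpr bool isFMAFasterThanFMulAndFAdd(ScalarTy Ty, bool HasFMA) {
  return HasFMA && (Ty == ScalarTy::F16 ||  // newly accepted by this revision
                    Ty == ScalarTy::F32 ||
                    Ty == ScalarTy::F64);
}
```

In this sketch, f16 simply follows f32: it is accepted whenever f32 is, and rejected on an X86-like target without FMA.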

Diff Detail

Event Timeline

scanon created this revision. Jan 11 2019, 5:48 AM

Herald added subscribers: llvm-commits, kristof.beyls, javed.absar. Jan 11 2019, 5:48 AM

scanon added a reviewer: ahatanak. Jan 11 2019, 5:50 AM

The AArch64 side LGTM. Also adding Oliver and Sjoerd, as they worked on the FP16 side as well, in case they have any thoughts.

The only public documentation for Arm v8.2+ CPUs I could find indicates that FADD, FMUL, and FMADD have the same latencies and throughputs on Cortex-A75: https://static.docs.arm.com/101398/0200/arm_cortex_a75_software_optimization_guide_v2.pdf

For X86, I think a test would be great.

> The AArch64 side LGTM. Also adding Oliver and Sjoerd, as they worked on the FP16 side as well, in case they have any thoughts.

I did a similar exercise not so long ago for AArch32, M-cores (see https://reviews.llvm.org/D53314). I haven't looked into this for AArch64, but what I wanted to say is that checking the software optimisation guides is the thing to do: the A55 is also a v8.2 core with FP16, and it has a public guide here: http://infocenter.arm.com/help/topic/com.arm.doc.epm128372/arm_cortex_a55_software_optimization_guide_v2.pdf

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64ISelLowering.cpp    1 line
lib/Target/X86/X86ISelLowering.cpp            1 line
test/CodeGen/AArch64/f16-instructions.ll      10 lines

Diff 181255

lib/Target/AArch64/AArch64ISelLowering.cpp


 bool AArch64TargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
   VT = VT.getScalarType();

   if (!VT.isSimple())
     return false;

   switch (VT.getSimpleVT().SimpleTy) {
+  case MVT::f16:
   case MVT::f32:
   case MVT::f64:
     return true;
   default:
     break;
   }

   return false;

lib/Target/X86/X86ISelLowering.cpp


   if (!Subtarget.hasAnyFMA())
     return false;

   VT = VT.getScalarType();

   if (!VT.isSimple())
     return false;

   switch (VT.getSimpleVT().SimpleTy) {
+  case MVT::f16:
   case MVT::f32:
   case MVT::f64:
     return true;
   default:
     break;
   }

   return false;

test/CodeGen/AArch64/f16-instructions.ll

 ; CHECK-FP16-NEXT: ret

 define half @test_round(half %a) #0 {
   %r = call half @llvm.round.f16(half %a)
   ret half %r
 }

 ; CHECK-CVT-LABEL: test_fmuladd:
+; CHECK-CVT-NEXT: fcvt s2, h2
 ; CHECK-CVT-NEXT: fcvt s1, h1
 ; CHECK-CVT-NEXT: fcvt s0, h0
-; CHECK-CVT-NEXT: fmul s0, s0, s1
-; CHECK-CVT-NEXT: fcvt h0, s0
-; CHECK-CVT-NEXT: fcvt s0, h0
-; CHECK-CVT-NEXT: fcvt s1, h2
-; CHECK-CVT-NEXT: fadd s0, s0, s1
+; CHECK-CVT-NEXT: fmadd s0, s0, s1, s2
 ; CHECK-CVT-NEXT: fcvt h0, s0
 ; CHECK-CVT-NEXT: ret

 ; CHECK-FP16-LABEL: test_fmuladd:
-; CHECK-FP16-NEXT: fmul h0, h0, h1
-; CHECK-FP16-NEXT: fadd h0, h0, h2
+; CHECK-FP16-NEXT: fmadd h0, h0, h1, h2
 ; CHECK-FP16-NEXT: ret

 define half @test_fmuladd(half %a, half %b, half %c) #0 {
   %r = call half @llvm.fmuladd.f16(half %a, half %b, half %c)
   ret half %r
 }

 ; CHECK-FP16-LABEL: test_vrecpeh_f16:
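The test change above replaces a separate multiply and add with a single fmadd. A fused multiply-add rounds once, while the unfused sequence rounds the product before the add, so the two can differ in the last bit. The following is a minimal sketch of that effect, not part of the revision: it uses `float` rather than half precision for portability, assumes IEEE-754 binary32 with round-to-nearest-even, and the helper names `mulThenAdd` and `fusedMulAdd` are hypothetical.

```cpp
#include <cmath>

// Unfused: the product is rounded to float, then the sum is rounded again.
float mulThenAdd(float a, float b, float c) {
  volatile float p = a * b;  // volatile stops the compiler from contracting into an FMA
  return p + c;
}

// Fused: std::fma computes a*b + c exactly and rounds only once.
float fusedMulAdd(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

For example, with a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Under the stated assumptions the unfused path rounds that product to 1 + 2^-11 and returns 0, while the fused path returns 2^-24.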