This is an archive of the discontinued LLVM Phabricator instance.

Differential D69246

[RISCV] Add support for half-precision floats
ClosedPublic

Authored by luismarques on Oct 21 2019, 2:57 AM.

Details

Reviewers

asb
lenary

Commits

rG1baa50396d9b: [RISCV] Add support for half-precision floats

Summary

Most fp16 operations are automatically supported by promoting the half-precision values to single-precision ones. This patch completes fp16 support by ensuring that load extension / truncate store operations are properly expanded.

The tests included in the patch check the load ext / trunc store behavior, and add a few sanity checks for promoted fp16 operations. Testing with riscv32 using the ilp32d ABI is enough to check the 4 ext/trunc cases, and the riscv64 output doesn't differ in any important way, so the tests target only riscv32.
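As a rough illustration of the four ext/trunc cases mentioned above, here is a toy C++ model of a legalization action table. Everything in it (`VT`, `Action`, `ToyLowering`, `makeRiscvLikeLowering`) is hypothetical and only mimics the shape of the `setLoadExtAction` / `setTruncStoreAction` calls in the patch. It is not LLVM's API, and treating an unconfigured entry as Legal is an assumption of the sketch.

```cpp
#include <map>
#include <utility>

// Toy model only: VT, Action and ToyLowering are hypothetical stand-ins
// for illustration, not LLVM's TargetLowering interface.
enum class VT { f16, f32, f64 };
enum class Action { Legal, Expand };

class ToyLowering {
  using Key = std::pair<VT, VT>; // (value type in register, type in memory)
  std::map<Key, Action> LoadExt, TruncStore;

  static Action lookup(const std::map<Key, Action> &Table, Key K) {
    auto It = Table.find(K);
    // Assumption of this sketch: an entry nobody configured counts as Legal.
    return It == Table.end() ? Action::Legal : It->second;
  }

public:
  void setLoadExtAction(VT Val, VT Mem, Action A) { LoadExt[{Val, Mem}] = A; }
  void setTruncStoreAction(VT Val, VT Mem, Action A) {
    TruncStore[{Val, Mem}] = A;
  }
  Action getLoadExtAction(VT Val, VT Mem) const {
    return lookup(LoadExt, {Val, Mem});
  }
  Action getTruncStoreAction(VT Val, VT Mem) const {
    return lookup(TruncStore, {Val, Mem});
  }
};

// Marks the four half-precision cases as Expand: f32<->f16 when F is
// available, f64<->f16 when D is available.
ToyLowering makeRiscvLikeLowering(bool HasF, bool HasD) {
  ToyLowering TL;
  if (HasF) {
    TL.setLoadExtAction(VT::f32, VT::f16, Action::Expand);
    TL.setTruncStoreAction(VT::f32, VT::f16, Action::Expand);
  }
  if (HasD) {
    TL.setLoadExtAction(VT::f64, VT::f16, Action::Expand);
    TL.setTruncStoreAction(VT::f64, VT::f16, Action::Expand);
  }
  return TL;
}
```

With `makeRiscvLikeLowering(true, true)`, all four (value type, memory type) pairs report Expand, which corresponds to the four cases the tests exercise under `+d`.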

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luismarques created this revision. Oct 21 2019, 2:57 AM

Herald added a project: Restricted Project. Oct 21 2019, 2:57 AM

Herald added subscribers: llvm-commits, pzheng, s.egerton and 23 others.

This is looking good to me. I'd like you to precommit the tests.

I find the libcall ABI slightly odd: __gnu_h2f_ieee seems to take the half-precision arg in a0 and return the result in fa0, but maybe I'm missing something. That said, that probably isn't an issue with this patch specifically.

In D69246#1716876, @lenary wrote:

This is looking good to me. I'd like you to precommit the tests.

What do you mean? (Don't forget that the load ext / trunc store tests require this patch's code changes)

I find the libcall ABI slightly odd: __gnu_h2f_ieee seems to take the half-precision arg in a0 and return the result in fa0, but maybe I'm missing something. That said, that probably isn't an issue with this patch specifically.

That makes sense. The half-precision float isn't a floating-point value as understood by the FP unit, so it has to go in a GPR, for ALU operations to slice up the fields and build the normal IEEE 754 32-bit representation, which can then be returned as a regular float -- which for ilp32f is of course returned in an FPR.
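A minimal sketch of that field slicing, assuming IEEE 754 binary16 input and binary32 output. `half_bits_to_float` is a hypothetical name, not the libgcc or compiler-rt implementation of `__gnu_h2f_ieee`. It only mirrors the shape of the call: an integer comes in (as it would in a0) and a float comes out (as it would in fa0).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only: rebuilds a binary32 value from
// the raw bits of a binary16 value using integer operations.
float half_bits_to_float(std::uint16_t h) {
  std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  std::uint32_t exp = (h >> 10) & 0x1Fu; // 5-bit biased exponent
  std::uint32_t frac = h & 0x3FFu;       // 10-bit fraction
  std::uint32_t bits;

  if (exp == 0x1Fu) {
    // Infinity or NaN: all-ones exponent, fraction moved into place.
    bits = sign | 0x7F800000u | (frac << 13);
  } else if (exp != 0) {
    // Normal number: rebias the exponent from 15 to 127 (difference 112).
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  } else if (frac == 0) {
    bits = sign; // signed zero
  } else {
    // Subnormal half: normalise the fraction. Every subnormal half is
    // representable as a normal float.
    std::uint32_t e = 113u; // exponent field once bit 10 becomes the leading 1
    while ((frac & 0x400u) == 0) {
      frac <<= 1;
      --e;
    }
    bits = sign | (e << 23) | ((frac & 0x3FFu) << 13);
  }

  float f;
  std::memcpy(&f, &bits, sizeof f); // reinterpret the bits as a float
  return f;
}
```

For example, `half_bits_to_float(0x3C00)` yields `1.0f` and `half_bits_to_float(0xC000)` yields `-2.0f`.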

In D69246#1716922, @luismarques wrote:

In D69246#1716876, @lenary wrote:

I find the libcall ABI slightly odd: __gnu_h2f_ieee seems to take the half-precision arg in a0 and return the result in fa0, but maybe I'm missing something. That said, that probably isn't an issue with this patch specifically.

That makes sense. The half-precision float isn't a floating-point value as understood by the FP unit, so it has to go in a GPR, for ALU operations to slice up the fields and build the normal IEEE 754 32-bit representation, which can then be returned as a regular float -- which for ilp32f is of course returned in an FPR.

No exception for half-precision floats has been included in the RISC-V ELF psABI (one would expect they should use the FP calling convention, as they are an FP real value). This should perhaps be rectified, but separately to this patch. I'll make a note.

In D69246#1716972, @lenary wrote:

No exception for half-precision floats has been included in the RISC-V ELF psABI (one would expect they should use the FP calling convention, as they are an FP real value). This should perhaps be rectified, but separately to this patch. I'll make a note.

It's probably a good idea to clarify the psABI's stance on half floats (possibly including not having any opinion about that), but do beware that this issue transcends the ABI. This half-float is being promoted at the LLVM IR level to a single-precision float, and once that happens everything behaves normally. What happens before the promotion is an LLVM implementation detail, so probably outside the scope of the ABI docs.

In D69246#1717209, @luismarques wrote:

It's probably a good idea to clarify the psABI's stance on half floats (possibly including not having any opinion about that), but do beware that this issue transcends the ABI. This half-float is being promoted at the LLVM IR level to a single-precision float, and once that happens everything behaves normally. What happens before the promotion is an LLVM implementation detail, so probably outside the scope of the ABI docs.

Just to clarify, because I used somewhat sloppy terminology, consider this:

%r = fadd half %a, %b
->
; ... magic ...
; CHECK-NEXT:    fadd.s fa0, fs0, fa0
; ... magic ...

We are conceptually implementing half-float addition, but in reality promoting and expanding that to single-precision floating-point addition. If we consider that all of the promotion/expansion happens at a level that is not visible to the user and does not have to interoperate with other compilers then I would argue that it transcends the ABI requirements. But if it becomes visible then that's another story. Then I guess the answer depends on whether you want to standardize the ABI for a non-standard C type, and thus ensure interoperability across compilers even for that case.
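To make the promote-and-expand idea concrete, here is a hedged C++ sketch of the sequence in the CHECK lines: two half-to-float conversions, one single-precision add, and one float-to-half conversion. The helper names are hypothetical stand-ins for the `__gnu_h2f_ieee` / `__gnu_f2h_ieee` libcalls, not their real implementations, and round-to-nearest-even is assumed for the narrowing step.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical widening helper: decodes binary16 bits arithmetically.
float half_to_float(std::uint16_t h) {
  int exp = (h >> 10) & 0x1F;
  int frac = h & 0x3FF;
  float mag;
  if (exp == 0)
    mag = std::ldexp(static_cast<float>(frac), -24); // zero or subnormal
  else if (exp == 31)
    mag = frac ? NAN : INFINITY;
  else
    mag = std::ldexp(static_cast<float>(frac | 0x400), exp - 25);
  return (h & 0x8000) ? -mag : mag;
}

// Hypothetical narrowing helper: binary32 to binary16 bits, assuming
// round-to-nearest-even.
std::uint16_t float_to_half(float f) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  std::uint32_t sign = (x >> 16) & 0x8000u;
  std::uint32_t absx = x & 0x7FFFFFFFu;
  if (absx > 0x7F800000u)
    return static_cast<std::uint16_t>(sign | 0x7E00u); // NaN -> quiet NaN
  int exp = static_cast<int>(absx >> 23) - 127;        // unbiased exponent
  if (exp > 15)
    return static_cast<std::uint16_t>(sign | 0x7C00u); // infinity or overflow
  if (exp < -25)
    return static_cast<std::uint16_t>(sign);           // underflows to zero
  std::uint32_t mant = (absx & 0x7FFFFFu) | 0x800000u; // 24-bit significand
  int shift = exp < -14 ? -1 - exp : 13;               // extra shift if subnormal
  std::uint32_t h = mant >> shift;
  if (exp >= -14)
    h += static_cast<std::uint32_t>(exp + 14) << 10;   // implicit bit adds the +1
  std::uint32_t rem = mant & ((1u << shift) - 1u);
  std::uint32_t halfway = 1u << (shift - 1);
  if (rem > halfway || (rem == halfway && (h & 1u)))
    ++h; // a carry here correctly bumps the exponent, up to infinity
  return static_cast<std::uint16_t>(sign | h);
}

// The promoted operation: widen both operands, add in single precision,
// narrow the result.
std::uint16_t half_fadd(std::uint16_t a, std::uint16_t b) {
  float fa = half_to_float(a);
  float fb = half_to_float(b);
  return float_to_half(fa + fb);
}
```

For instance, `half_fadd(0x3C00, 0x3C00)` (1.0 + 1.0) returns `0x4000`, the half-precision encoding of 2.0. The caller only ever sees 16-bit values; the single-precision add stays internal.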

In D69246#1717220, @luismarques wrote:

If we consider that all of the promotion/expansion happens at a level that is not visible to the user and does not have to interoperate with other compilers then I would argue that it transcends the ABI requirements. But if it becomes visible then that's another story.

I think most standards that even mention them say that half-precision floats are a storage-only format, as these platforms cannot directly compute with them.

I do think we should care about how they are passed in the calling convention, as that is perhaps the most "visible" part of using them. At the moment, LLVM is doing something reasonable that is compatible with GCC, so it would be good to codify it.

Edit: Further discussion here about the psABI isn't useful, as this patch is about ensuring we don't abort when expanding half-precision to single-precision.

Removed the align 2 from the tests' loads and stores.

Herald added a subscriber: sameer.abuasal. Oct 24 2019, 2:03 PM

LGTM

This revision is now accepted and ready to land. Oct 25 2019, 2:49 AM

Closed by commit rG1baa50396d9b: [RISCV] Add support for half-precision floats (authored by luismarques). Oct 25 2019, 6:03 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                           Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp    7 lines
llvm/test/CodeGen/RISCV/fp16-promote.ll        142 lines

Diff 226420

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::CTPOP, XLenVT, Expand);

   ISD::CondCode FPCCToExtend[] = {
       ISD::SETOGT, ISD::SETOGE, ISD::SETONE, ISD::SETUEQ, ISD::SETUGT,
       ISD::SETUGE, ISD::SETULT, ISD::SETULE, ISD::SETUNE, ISD::SETGT,
       ISD::SETGE, ISD::SETNE};

   ISD::NodeType FPOpToExtend[] = {
-      ISD::FSIN, ISD::FCOS, ISD::FSINCOS, ISD::FPOW, ISD::FREM};
+      ISD::FSIN, ISD::FCOS, ISD::FSINCOS, ISD::FPOW, ISD::FREM, ISD::FP16_TO_FP,
+      ISD::FP_TO_FP16};

   if (Subtarget.hasStdExtF()) {
     setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
     setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);
     for (auto CC : FPCCToExtend)
       setCondCodeAction(CC, MVT::f32, Expand);
     setOperationAction(ISD::SELECT_CC, MVT::f32, Expand);
     setOperationAction(ISD::SELECT, MVT::f32, Custom);
     setOperationAction(ISD::BR_CC, MVT::f32, Expand);
     for (auto Op : FPOpToExtend)
       setOperationAction(Op, MVT::f32, Expand);
+    setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+    setTruncStoreAction(MVT::f32, MVT::f16, Expand);
   }

   if (Subtarget.hasStdExtF() && Subtarget.is64Bit())
     setOperationAction(ISD::BITCAST, MVT::i32, Custom);

   if (Subtarget.hasStdExtD()) {
     setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
     setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);
     for (auto CC : FPCCToExtend)
       setCondCodeAction(CC, MVT::f64, Expand);
     setOperationAction(ISD::SELECT_CC, MVT::f64, Expand);
     setOperationAction(ISD::SELECT, MVT::f64, Custom);
     setOperationAction(ISD::BR_CC, MVT::f64, Expand);
     setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f32, Expand);
     setTruncStoreAction(MVT::f64, MVT::f32, Expand);
     for (auto Op : FPOpToExtend)
       setOperationAction(Op, MVT::f64, Expand);
+    setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f16, Expand);
+    setTruncStoreAction(MVT::f64, MVT::f16, Expand);
   }

   setOperationAction(ISD::GlobalAddress, XLenVT, Custom);
   setOperationAction(ISD::BlockAddress, XLenVT, Custom);
   setOperationAction(ISD::ConstantPool, XLenVT, Custom);

   setOperationAction(ISD::GlobalTLSAddress, XLenVT, Custom);

llvm/test/CodeGen/RISCV/fp16-promote.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -mattr +d -target-abi ilp32d < %s | FileCheck %s

define void @test_load_store(half* %p, half* %q) nounwind {
; CHECK-LABEL: test_load_store:
; CHECK:       # %bb.0:
; CHECK-NEXT:    lh a0, 0(a0)
; CHECK-NEXT:    sh a0, 0(a1)
; CHECK-NEXT:    ret
  %a = load half, half* %p
  store half %a, half* %q
  ret void
}

define float @test_fpextend_float(half* %p) nounwind {
; CHECK-LABEL: test_fpextend_float:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -16
; CHECK-NEXT:    sw ra, 12(sp)
; CHECK-NEXT:    lhu a0, 0(a0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    lw ra, 12(sp)
; CHECK-NEXT:    addi sp, sp, 16
; CHECK-NEXT:    ret
  %a = load half, half* %p
  %r = fpext half %a to float
  ret float %r
}

define double @test_fpextend_double(half* %p) nounwind {
; CHECK-LABEL: test_fpextend_double:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -16
; CHECK-NEXT:    sw ra, 12(sp)
; CHECK-NEXT:    lhu a0, 0(a0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    fcvt.d.s fa0, fa0
; CHECK-NEXT:    lw ra, 12(sp)
; CHECK-NEXT:    addi sp, sp, 16
; CHECK-NEXT:    ret
  %a = load half, half* %p
  %r = fpext half %a to double
  ret double %r
}

define void @test_fptrunc_float(float %f, half* %p) nounwind {
; CHECK-LABEL: test_fptrunc_float:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -16
; CHECK-NEXT:    sw ra, 12(sp)
; CHECK-NEXT:    sw s0, 8(sp)
; CHECK-NEXT:    mv s0, a0
; CHECK-NEXT:    call __gnu_f2h_ieee
; CHECK-NEXT:    sh a0, 0(s0)
; CHECK-NEXT:    lw s0, 8(sp)
; CHECK-NEXT:    lw ra, 12(sp)
; CHECK-NEXT:    addi sp, sp, 16
; CHECK-NEXT:    ret
  %a = fptrunc float %f to half
  store half %a, half* %p
  ret void
}

define void @test_fptrunc_double(double %d, half* %p) nounwind {
; CHECK-LABEL: test_fptrunc_double:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -16
; CHECK-NEXT:    sw ra, 12(sp)
; CHECK-NEXT:    sw s0, 8(sp)
; CHECK-NEXT:    mv s0, a0
; CHECK-NEXT:    call __truncdfhf2
; CHECK-NEXT:    sh a0, 0(s0)
; CHECK-NEXT:    lw s0, 8(sp)
; CHECK-NEXT:    lw ra, 12(sp)
; CHECK-NEXT:    addi sp, sp, 16
; CHECK-NEXT:    ret
  %a = fptrunc double %d to half
  store half %a, half* %p
  ret void
}

define void @test_fadd(half* %p, half* %q) nounwind {
; CHECK-LABEL: test_fadd:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -32
; CHECK-NEXT:    sw ra, 28(sp)
; CHECK-NEXT:    sw s0, 24(sp)
; CHECK-NEXT:    sw s1, 20(sp)
; CHECK-NEXT:    fsd fs0, 8(sp)
; CHECK-NEXT:    mv s0, a1
; CHECK-NEXT:    mv s1, a0
; CHECK-NEXT:    lhu a0, 0(a0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    fmv.s fs0, fa0
; CHECK-NEXT:    lhu a0, 0(s0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    fadd.s fa0, fs0, fa0
; CHECK-NEXT:    call __gnu_f2h_ieee
; CHECK-NEXT:    sh a0, 0(s1)
; CHECK-NEXT:    fld fs0, 8(sp)
; CHECK-NEXT:    lw s1, 20(sp)
; CHECK-NEXT:    lw s0, 24(sp)
; CHECK-NEXT:    lw ra, 28(sp)
; CHECK-NEXT:    addi sp, sp, 32
; CHECK-NEXT:    ret
  %a = load half, half* %p
  %b = load half, half* %q
  %r = fadd half %a, %b
  store half %r, half* %p
  ret void
}

define void @test_fmul(half* %p, half* %q) nounwind {
; CHECK-LABEL: test_fmul:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi sp, sp, -32
; CHECK-NEXT:    sw ra, 28(sp)
; CHECK-NEXT:    sw s0, 24(sp)
; CHECK-NEXT:    sw s1, 20(sp)
; CHECK-NEXT:    fsd fs0, 8(sp)
; CHECK-NEXT:    mv s0, a1
; CHECK-NEXT:    mv s1, a0
; CHECK-NEXT:    lhu a0, 0(a0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    fmv.s fs0, fa0
; CHECK-NEXT:    lhu a0, 0(s0)
; CHECK-NEXT:    call __gnu_h2f_ieee
; CHECK-NEXT:    fmul.s fa0, fs0, fa0
; CHECK-NEXT:    call __gnu_f2h_ieee
; CHECK-NEXT:    sh a0, 0(s1)
; CHECK-NEXT:    fld fs0, 8(sp)
; CHECK-NEXT:    lw s1, 20(sp)
; CHECK-NEXT:    lw s0, 24(sp)
; CHECK-NEXT:    lw ra, 28(sp)
; CHECK-NEXT:    addi sp, sp, 32
; CHECK-NEXT:    ret
  %a = load half, half* %p
  %b = load half, half* %q
  %r = fmul half %a, %b
  store half %r, half* %p
  ret void
}