This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][SelectionDAG] Add extendbfsf2 libcall and use it for bf16 extends with soft FP
Abandoned · Public

Authored by asb on May 25 2023, 6:04 AM.


Details

Reviewers

compnerd
bkramer
t.p.northover
arsenm
craig.topper

Summary

Previously, a bf16 extend with soft FP triggered an assertion failure (reproducible on RISC-V with soft FP). The existing code path assumes a libcall is present, and adding the libcall seems like the easiest fix. This libcall _is_ provided by libgcc, which perhaps provides its own motivation for adding it here.

The legalisation code in LegalizeDAG lowers to an anyext and shift, which might be an alternative. This would however be more invasive to support than just adding an extra case to the existing libcall lowering logic, and we likely don't care strongly about BF16 on these soft-float targets beyond wanting some basic support for completeness.

I'm not able to convince myself that the anyext+shift lowering is always identical to the more elaborate extension performed by the libcall in all cases (and if so, why do the trunc and extend libcalls even exist?). I know @craig.topper was involved in a previous discussion on this so I'd appreciate your view.

Diff Detail

Unit Tests: Failed

	Time	Test
	50 ms	x64 debian > LLVM.CodeGen/RISCV::bfloat.ll
	60,050 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

asb created this revision. · May 25 2023, 6:04 AM

Herald added a project: Restricted Project. · May 25 2023, 6:04 AM

Herald added subscribers: luke, wingo, Enna1 and 24 others.

asb requested review of this revision. · May 25 2023, 6:04 AM

Herald added a project: Restricted Project. · May 25 2023, 6:04 AM

Herald added subscribers: pcwang-thead, MaskRay, wdng.

Harbormaster completed remote builds in B234484: Diff 525568. · May 25 2023, 7:41 AM

I'm not able to convince myself that the anyext+shift lowering is always identical to the more elaborate extension performed by the libcall in all cases (and if so, why do the trunc and extend libcalls even exist?). I know @craig.topper was involved in a previous discussion on this so I'd appreciate your view.

fp32 has more bits of mantissa than bfloat16 but they have the same number of exponent bits.

The trunc libcall exists because the extra bits of mantissa that exist in fp32 need to be rounded to convert to bfloat16. Also some f32 subnormal values can't be represented in bfloat16. So it can't be done as an integer truncate.
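To make the rounding point concrete, here is a minimal C sketch that contrasts dropping the low 16 bits with round-to-nearest-even. It is not the compiler-rt or libgcc implementation. It works on raw binary32 bit patterns (as obtained by `memcpy` from a `float`), the function names are hypothetical, and forcing the quiet bit for NaN inputs is an assumption about one reasonable behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: integer truncate, keeps only the top 16 bits. */
uint16_t f32_bits_to_bf16_truncate(uint32_t u) {
  return (uint16_t)(u >> 16);
}

/* Hypothetical helper: round-to-nearest-even on the discarded 16 bits. */
uint16_t f32_bits_to_bf16_round(uint32_t u) {
  /* NaN: exponent all ones and a non-zero mantissa. Keep it a NaN by
     forcing the top mantissa bit (assumed quiet-bit convention). */
  if ((u & 0x7FFFFFFFu) > 0x7F800000u)
    return (uint16_t)((u >> 16) | 0x0040u);
  uint32_t lsb = (u >> 16) & 1u; /* lowest bit that survives */
  u += 0x7FFFu + lsb;            /* ties go to the even result */
  return (uint16_t)(u >> 16);
}

/* Spot checks showing where the two approaches disagree. */
void check_examples(void) {
  /* Just above the halfway point: truncation rounds down, RNE rounds up. */
  assert(f32_bits_to_bf16_truncate(0x3F808001u) == 0x3F80);
  assert(f32_bits_to_bf16_round(0x3F808001u) == 0x3F81);
  /* Exact ties go to the even neighbour. */
  assert(f32_bits_to_bf16_round(0x3F808000u) == 0x3F80);
  assert(f32_bits_to_bf16_round(0x3F818000u) == 0x3F82);
  /* A NaN whose payload sits only in the low bits truncates to infinity. */
  assert(f32_bits_to_bf16_truncate(0x7F800001u) == 0x7F80);
  assert(f32_bits_to_bf16_round(0x7F800001u) == 0x7FC0);
}
```

The last pair of checks is the clearest reason an integer truncate is not enough: a NaN can silently become an infinity.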

For extend, we should just need to add 0s to the end of the mantissa. The +0.0, -0.0 are encoded as all 0s in the mantissa and exponent in both encodings. infinity is encoded with a special exponent and all 0 mantissa in both formats. nan uses the same exponent as infinity but a non-zero mantissa. If the mantissa is already non-zero, adding more zeros doesn't change that. Adding zeros to the end of the mantissa for normals and denormals shouldn't change their value.
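The zero-fill argument above can be sketched in a few lines of C. This is an illustration only, assuming `float` is IEEE 754 binary32; the helper names are hypothetical and the input is a raw bfloat16 bit pattern rather than a `__bf16` value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
float f32_from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Hypothetical helper: widen a raw bfloat16 bit pattern by moving it into
   the upper half of a binary32 and zero-filling the 16 new mantissa bits. */
float bf16_bits_to_f32(uint16_t bits) {
  return f32_from_bits((uint32_t)bits << 16);
}

/* Spot checks for the cases discussed above. */
void check_examples(void) {
  assert(bf16_bits_to_f32(0x0000) == 0.0f);  /* +0.0 */
  assert(bf16_bits_to_f32(0x3F80) == 1.0f);  /* normal */
  assert(bf16_bits_to_f32(0xC000) == -2.0f); /* negative normal */
  assert(bf16_bits_to_f32(0x0001) > 0.0f);   /* subnormal stays non-zero */
  float nan = bf16_bits_to_f32(0x7FC0);
  assert(nan != nan);                        /* NaN stays NaN */
}
```

Because sign and exponent occupy the same bit positions in both formats, the shift is the whole conversion; no rebiasing or normalisation step is needed.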

asb edited the summary of this revision. · May 25 2023, 9:03 AM

In D151436#4372633, @craig.topper wrote:

For extend, we should just need to add 0s to the end of the mantissa. The +0.0, -0.0 are encoded as all 0s in the mantissa and exponent in both encodings. infinity is encoded with a special exponent and all 0 mantissa in both formats. nan uses the same exponent as infinity but a non-zero mantissa. If the mantissa is already non-zero, adding more zeros doesn't change that. Adding zeros to the end of the mantissa for normals and denormals shouldn't change their value.

And then we'd just lose out on FE_INVALID being set if the input is a signalling NaN - it seems libgcc does have some support for setting these exception bits (on some platforms at least, with the right support hooks implemented) while compiler-rt has none. So I think that justifies the libcall for them. Thanks for helping clear that up.
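The signalling NaN difference can be illustrated with a small C sketch on raw bit patterns. Everything here is hypothetical: the names are invented, the `invalid` out-parameter merely stands in for raising FE_INVALID, and the quiet-bit convention (top mantissa bit set means quiet) is an assumption that holds on most current targets. It is not how libgcc or compiler-rt implement the routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits. */
bool bf16_is_nan(uint16_t b) { return (b & 0x7FFFu) > 0x7F80u; }

/* Assumed convention: a NaN with the top mantissa bit clear is signalling. */
bool bf16_is_snan(uint16_t b) { return bf16_is_nan(b) && !(b & 0x0040u); }

/* Plain shift lowering: the result is a binary32 bit pattern. A signalling
   NaN passes through unchanged and nothing is reported. */
uint32_t extend_by_shift(uint16_t b) { return (uint32_t)b << 16; }

/* IEEE-style extend: quiets a signalling NaN and reports it via *invalid,
   which stands in for setting FE_INVALID. */
uint32_t extend_ieee_style(uint16_t b, bool *invalid) {
  assert(invalid != NULL);
  uint32_t out = (uint32_t)b << 16;
  *invalid = bf16_is_snan(b);
  if (*invalid)
    out |= 0x00400000u; /* set the binary32 quiet bit */
  return out;
}
```

For every input that is not a signalling NaN the two functions return the same bits, which matches the observation that only the exception flag is at stake.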

You would only need to worry about sNaNs with the constrained fptrunc.

joshua-arch1 added a subscriber: joshua-arch1. · May 25 2023, 7:13 PM

Retired in favour of D151563.

asb mentioned this in rG061e368fe213: [SelectionDAG] Implement soft FP legalisation for bf16 FP_EXTEND and BF16_TO_FP. · May 29 2023, 2:34 AM

Revision Contents

Path	Size
compiler-rt/lib/builtins/CMakeLists.txt	1 line
compiler-rt/lib/builtins/extendbfsf2.c	13 lines
compiler-rt/lib/builtins/fp_extend.h	9 lines
llvm/include/llvm/IR/RuntimeLibcalls.def	1 line
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp	9 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp	3 lines
llvm/test/CodeGen/RISCV/bfloat.ll	95 lines

Diff 525568

compiler-rt/lib/builtins/CMakeLists.txt

[... 187 lines not shown; enclosing context: set(GENERIC_SOURCES ...]
   udivti3.c
   umoddi3.c
   umodsi3.c
   umodti3.c
 )

 # We only build BF16 files when "__bf16" is available.
 set(BF16_SOURCES
+  extendbfsf2.c
   truncdfbf2.c
   truncsfbf2.c
 )

 # TODO: Several "tf" files (and divtc3.c, but not multc3.c) are in
 # GENERIC_SOURCES instead of here.
 set(GENERIC_TF_SOURCES
   addtf3.c
[... 704 lines not shown ...]

compiler-rt/lib/builtins/extendbfsf2.c

This file was added.

				//===-- lib/extendbfsf2.c - bfloat -> single conversion -----------*- C -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#define SRC_BFLOAT
				#define DST_SINGLE
				#include "fp_extend_impl.inc"

				COMPILER_RT_ABI float __extendbfsf2(src_t a) { return __extendXfYf2__(a); }

compiler-rt/lib/builtins/fp_extend.h

[... 44 lines not shown ...]
 #else
 typedef uint16_t src_t;
 #endif
 typedef uint16_t src_rep_t;
 #define SRC_REP_C UINT16_C
 static const int srcSigBits = 10;
 #define src_rep_t_clz __builtin_clz

+#elif defined SRC_BFLOAT
+typedef __bf16 src_t;
+typedef uint16_t src_rep_t;
+#define SRC_REP_C UINT16_C
+static const int srcSigBits = 7;
+#define src_rep_t_clz __builtin_clz
+
 #else
-#error Source should be half, single, or double precision!
+#error Source should be bfloat, half, single, or double precision!
 #endif // end source precision

 #if defined DST_SINGLE
 typedef float dst_t;
 typedef uint32_t dst_rep_t;
 #define DST_REP_C UINT32_C
 static const int dstSigBits = 23;
[... 37 lines not shown ...]

llvm/include/llvm/IR/RuntimeLibcalls.def

[... 285 lines not shown ...]
 HANDLE_LIBCALL(FPEXT_F80_F128, "__extendxftf2")
 HANDLE_LIBCALL(FPEXT_F64_F128, "__extenddftf2")
 HANDLE_LIBCALL(FPEXT_F32_F128, "__extendsftf2")
 HANDLE_LIBCALL(FPEXT_F16_F128, "__extendhftf2")
 HANDLE_LIBCALL(FPEXT_F16_F80, "__extendhfxf2")
 HANDLE_LIBCALL(FPEXT_F32_F64, "__extendsfdf2")
 HANDLE_LIBCALL(FPEXT_F16_F64, "__extendhfdf2")
 HANDLE_LIBCALL(FPEXT_F16_F32, "__gnu_h2f_ieee")
+HANDLE_LIBCALL(FPEXT_BF16_F32, "__extendbfsf2")
 HANDLE_LIBCALL(FPROUND_F32_F16, "__gnu_f2h_ieee")
 HANDLE_LIBCALL(FPROUND_F64_F16, "__truncdfhf2")
 HANDLE_LIBCALL(FPROUND_F80_F16, "__truncxfhf2")
 HANDLE_LIBCALL(FPROUND_F128_F16, "__trunctfhf2")
 HANDLE_LIBCALL(FPROUND_PPCF128_F16, "__trunctfhf2")
 HANDLE_LIBCALL(FPROUND_F32_BF16, "__truncsfbf2")
 HANDLE_LIBCALL(FPROUND_F64_BF16, "__truncdfbf2")
 HANDLE_LIBCALL(FPROUND_F64_F32, "__truncdfsf2")
[... 293 lines not shown ...]

llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

[... 504 lines not shown; enclosing context: SDValue DAGTypeLegalizer::SoftenFloatRes_FP_EXTEND(SDNode *N) { ...]
   if (getTypeAction(Op.getValueType()) == TargetLowering::TypePromoteFloat) {
     Op = GetPromotedFloat(Op);
     // If the promotion did the FP_EXTEND to the destination type for us,
     // there's nothing left to do here.
     if (Op.getValueType() == N->getValueType(0))
       return BitConvertToInteger(Op);
   }

-  // There's only a libcall for f16 -> f32, so proceed in two stages. Also, it's
-  // entirely possible for both f16 and f32 to be legal, so use the fully
-  // hard-float FP_EXTEND rather than FP16_TO_FP.
-  if (Op.getValueType() == MVT::f16 && N->getValueType(0) != MVT::f32) {
+  // There's only a libcall for [b]f16 -> f32, so proceed in two stages. Also,
+  // it's entirely possible for both [b]f16 and f32 to be legal, so use the
+  // fully hard-float FP_EXTEND rather than {FP16,BF16}_TO_FP.
+  if ((Op.getValueType() == MVT::f16 || Op.getValueType() == MVT::bf16) &&
+      N->getValueType(0) != MVT::f32) {
     if (IsStrict) {
       Op = DAG.getNode(ISD::STRICT_FP_EXTEND, SDLoc(N),
                        { MVT::f32, MVT::Other }, { Chain, Op });
       Chain = Op.getValue(1);
     } else {
       Op = DAG.getNode(ISD::FP_EXTEND, SDLoc(N), MVT::f32, Op);
     }
   }
[... 2,583 lines not shown ...]

llvm/lib/CodeGen/TargetLoweringBase.cpp

[... 234 lines not shown; enclosing context: if (OpVT == MVT::f16) { ...]
     if (RetVT == MVT::f32)
       return FPEXT_F16_F32;
     if (RetVT == MVT::f64)
       return FPEXT_F16_F64;
     if (RetVT == MVT::f80)
       return FPEXT_F16_F80;
     if (RetVT == MVT::f128)
       return FPEXT_F16_F128;
+  } else if (OpVT == MVT::bf16) {
+    if (RetVT == MVT::f32)
+      return FPEXT_BF16_F32;
   } else if (OpVT == MVT::f32) {
     if (RetVT == MVT::f64)
       return FPEXT_F32_F64;
     if (RetVT == MVT::f128)
       return FPEXT_F32_F128;
     if (RetVT == MVT::ppcf128)
       return FPEXT_F32_PPCF128;
   } else if (OpVT == MVT::f64) {
[... 2,121 lines not shown ...]

llvm/test/CodeGen/RISCV/bfloat.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32I-ILP32
				; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV64I-LP64

				; TODO: Enable codegen for hard float.

				define bfloat @float_to_bfloat(float %a) nounwind {
				; RV32I-ILP32-LABEL: float_to_bfloat:
				; RV32I-ILP32: # %bb.0:
				; RV32I-ILP32-NEXT: addi sp, sp, -16
				; RV32I-ILP32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
				; RV32I-ILP32-NEXT: call __truncsfbf2@plt
				; RV32I-ILP32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
				; RV32I-ILP32-NEXT: addi sp, sp, 16
				; RV32I-ILP32-NEXT: ret
				;
				; RV64I-LP64-LABEL: float_to_bfloat:
				; RV64I-LP64: # %bb.0:
				; RV64I-LP64-NEXT: addi sp, sp, -16
				; RV64I-LP64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
				; RV64I-LP64-NEXT: call __truncsfbf2@plt
				; RV64I-LP64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
				; RV64I-LP64-NEXT: addi sp, sp, 16
				; RV64I-LP64-NEXT: ret
				%1 = fptrunc float %a to bfloat
				ret bfloat %1
				}

				define bfloat @double_to_bfloat(double %a) nounwind {
				; RV32I-ILP32-LABEL: double_to_bfloat:
				; RV32I-ILP32: # %bb.0:
				; RV32I-ILP32-NEXT: addi sp, sp, -16
				; RV32I-ILP32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
				; RV32I-ILP32-NEXT: call __truncdfbf2@plt
				; RV32I-ILP32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
				; RV32I-ILP32-NEXT: addi sp, sp, 16
				; RV32I-ILP32-NEXT: ret
				;
				; RV64I-LP64-LABEL: double_to_bfloat:
				; RV64I-LP64: # %bb.0:
				; RV64I-LP64-NEXT: addi sp, sp, -16
				; RV64I-LP64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
				; RV64I-LP64-NEXT: call __truncdfbf2@plt
				; RV64I-LP64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
				; RV64I-LP64-NEXT: addi sp, sp, 16
				; RV64I-LP64-NEXT: ret
				%1 = fptrunc double %a to bfloat
				ret bfloat %1
				}

				define float @bfloat_to_float(bfloat %a) nounwind {
				; RV32I-ILP32-LABEL: bfloat_to_float:
				; RV32I-ILP32: # %bb.0:
				; RV32I-ILP32-NEXT: addi sp, sp, -16
				; RV32I-ILP32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
				; RV32I-ILP32-NEXT: call __extendbfsf2@plt
				; RV32I-ILP32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
				; RV32I-ILP32-NEXT: addi sp, sp, 16
				; RV32I-ILP32-NEXT: ret
				;
				; RV64I-LP64-LABEL: bfloat_to_float:
				; RV64I-LP64: # %bb.0:
				; RV64I-LP64-NEXT: addi sp, sp, -16
				; RV64I-LP64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
				; RV64I-LP64-NEXT: call __extendbfsf2@plt
				; RV64I-LP64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
				; RV64I-LP64-NEXT: addi sp, sp, 16
				; RV64I-LP64-NEXT: ret
				%1 = fpext bfloat %a to float
				ret float %1
				}

				define double @bfloat_to_double(bfloat %a) nounwind {
				; RV32I-ILP32-LABEL: bfloat_to_double:
				; RV32I-ILP32: # %bb.0:
				; RV32I-ILP32-NEXT: addi sp, sp, -16
				; RV32I-ILP32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
				; RV32I-ILP32-NEXT: call __extendbfsf2@plt
				; RV32I-ILP32-NEXT: call __extendsfdf2@plt
				; RV32I-ILP32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
				; RV32I-ILP32-NEXT: addi sp, sp, 16
				; RV32I-ILP32-NEXT: ret
				;
				; RV64I-LP64-LABEL: bfloat_to_double:
				; RV64I-LP64: # %bb.0:
				; RV64I-LP64-NEXT: addi sp, sp, -16
				; RV64I-LP64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
				; RV64I-LP64-NEXT: call __extendbfsf2@plt
				; RV64I-LP64-NEXT: call __extendsfdf2@plt
				; RV64I-LP64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
				; RV64I-LP64-NEXT: addi sp, sp, 16
				; RV64I-LP64-NEXT: ret
				%1 = fpext bfloat %a to double
				ret double %1
				}