This is an archive of the discontinued LLVM Phabricator instance.

Differential D126953

Promote bf16 to f32 when the target doesn't support it
Closed · Public

Authored by bkramer on Jun 3 2022, 2:24 AM.


Details

Reviewers

t.p.northover

Commits

rGfb34d531af95: Promote bf16 to f32 when the target doesn't support it

Summary

This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.
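As a rough illustration of why the f32-to-bf16 direction needs more than a shift, here is a minimal C sketch of round-to-nearest-even on the bit pattern. It is not the compiler-rt code, since the patch instantiates the generic `__truncXfYf2__` template with `dstSigBits = 7`. The function name is hypothetical, and the result is the raw 16-bit encoding, matching the patch's `uint16_t dst_t` for `DST_BFLOAT`.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical sketch: convert an f32 to the raw bf16 bit pattern using
// round-to-nearest, ties-to-even. Not the compiler-rt implementation.
static uint16_t f32_to_bf16_rne(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);  // reinterpret the float's bits

  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: keep sign, exponent and the top payload bits, and set the quiet
    // bit (bit 6 of bf16) so a payload living only in the low 16 bits does
    // not truncate to the infinity encoding.
    return (uint16_t)((u >> 16) | 0x0040u);
  }

  // Bias the value so that dropping the low 16 bits rounds correctly:
  // anything above the halfway point (0x8000) carries into bit 16, anything
  // below does not, and an exact tie carries only when bit 16 is odd, which
  // lands on the even neighbour. Finite values that round past the largest
  // bf16 carry into the exponent field and become infinity.
  uint32_t lsb = (u >> 16) & 1u;
  u += 0x7fffu + lsb;
  return (uint16_t)(u >> 16);
}
```

Under this sketch, `1.00390625f` and `1.01171875f`, both exactly halfway between neighbouring bf16 values, round to `0x3f80` and `0x3f82` respectively, whereas plain truncation would give `0x3f80` and `0x3f81`.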

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:

- Targets with bf16 conversion instructions can now make fp_to_bf16 legal.
- The software conversion to bf16 can be replaced by a trivial implementation under fast math.
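For comparison, the "trivial implementation" mentioned above could plausibly be plain truncation of the f32 bits. The C sketch below assumes that interpretation; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical fast-math style f32 -> bf16 conversion: keep the upper 16 bits
// of the f32 encoding and discard the rest. This rounds toward zero and does
// no NaN special-casing.
static uint16_t f32_to_bf16_truncate(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);  // reinterpret the float's bits
  return (uint16_t)(u >> 16);
}
```

It agrees with round-to-nearest-even whenever the dropped low 16 bits are less than 0x8000 (for example `1.5f` gives `0x3fc0` either way). It differs otherwise: `1.01171875f` yields `0x3f81` instead of `0x3f82`, and an f32 NaN whose payload sits only in the low 16 bits (encoding `0x7f800001`) truncates to `0x7f80`, the bf16 encoding of infinity.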

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bkramer created this revision. Jun 3 2022, 2:24 AM

Herald added a project: Restricted Project. Jun 3 2022, 2:24 AM

Herald added subscribers: jsji, Enna1, pengfei and 2 others.

bkramer requested review of this revision. Jun 3 2022, 2:24 AM

Herald added projects: Restricted Project, Restricted Project. Jun 3 2022, 2:24 AM

Herald added subscribers: llvm-commits, Restricted Project.

Harbormaster completed remote builds in B167688: Diff 433987. Jun 3 2022, 3:13 AM

Post-wwdc ping.

FreddyYe added a subscriber: FreddyYe. Jun 14 2022, 6:11 PM

Looks like everything's in place and working to me.

This revision is now accepted and ready to land. Jun 15 2022, 1:11 AM

This revision was landed with ongoing or failed builds. Jun 15 2022, 4:01 AM

Closed by commit rGfb34d531af95: Promote bf16 to f32 when the target doesn't support it (authored by bkramer).

This revision was automatically updated to reflect the committed changes.

bkramer added a commit: rGfb34d531af95: Promote bf16 to f32 when the target doesn't support it.

arsenm added a subscriber: arsenm. Dec 10 2022, 5:08 AM

arsenm added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2912–2915	Why can this just shift into the high bits? Why don't the mantissa bits need to be adjusted down to the low bits?

pengfei added inline comments. Dec 10 2022, 7:14 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2912–2915	Expanding a normal value doesn't require adjusting the mantissa bits. We do have concerns that DAZ and signaling NaNs are not respected, but BF16 is not an IEEE standard type, and AFAIK there's no such strict rule for it. GCC does it the same way.

FWIW, I agree with @arsenm, the legalization is wrong.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2912–2915	This answer makes no sense. This expansion is an active miscompile. The proper way to lower it is https://godbolt.org/z/GzM3n7Tdc

In D126953#3986425, @lebedev.ri wrote:

> FWIW, I agree with @arsenm, the legalization is wrong.

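The lowering is correct. The mantissa of an IEEE number is normalized by shifting it left so that the leading 1 doesn't need to be stored. The stored fraction bits therefore start at the top of the mantissa field, and widening to a format with the same exponent layout only appends zero bits at the bottom.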

Consider the number 1.5. In f32 it is stored as 0x3fc00000:
sign = 0
exponent = 127
mantissa = 0x400000

1.5 in bfloat16 is 0x3fc0:
sign = 0
exponent = 127
mantissa = 0x40 (and 0x40 << 16 == 0x400000)
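The field split in the 1.5 example can be checked mechanically. This C sketch uses hypothetical helper names; the layouts themselves (1 sign bit, 8 exponent bits with bias 127, then 23 mantissa bits for f32 or 7 for bf16) are the standard ones.

```c
#include <stdint.h>

// Decoded fields of a binary floating-point encoding.
struct fp_fields {
  unsigned sign;
  unsigned exponent;  // biased exponent field
  uint32_t mantissa;  // stored fraction bits, implicit leading 1 excluded
};

// f32: 1 sign | 8 exponent | 23 mantissa bits.
static struct fp_fields decode_f32(uint32_t bits) {
  struct fp_fields r = {bits >> 31, (bits >> 23) & 0xffu, bits & 0x7fffffu};
  return r;
}

// bf16: 1 sign | 8 exponent | 7 mantissa bits, i.e. the top half of an f32.
static struct fp_fields decode_bf16(uint16_t bits) {
  struct fp_fields r = {(unsigned)(bits >> 15), (bits >> 7) & 0xffu,
                        bits & 0x7fu};
  return r;
}
```

For 1.5 both decoders report sign 0 and exponent 127, and the bf16 mantissa `0x40` shifted left by 16 equals the f32 mantissa `0x400000`.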

In D126953#3986502, @craig.topper wrote:

> The lowering is correct. [...]

Ok, I forgot that bit (I even implemented similar widening elsewhere previously!).
So yes, this is identical except for subnormals:

no subnormal normalization: https://godbolt.org/z/TTqf9PeGc
with subnormal normalization: https://godbolt.org/z/K88vh4xc8, https://alive2.llvm.org/ce/z/WeFY75

My apologies...

In D126953#3986514, @lebedev.ri wrote:

> [...] So yes, this is identical except for subnormals [...]

It should be the same even for subnormals. The exponents in float32 and bfloat16 are the same width and use the same bias. A subnormal in bfloat16 can't be normalized in float32. The exponent can't get any smaller.

It appears the code in RawSpeed assumes the difference in biases is greater than the width of the smaller type's mantissa. If the number of shifts needed to normalize is greater than the difference in bias, the exponent will go negative, but the code doesn't check for that.
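The claim that a bf16 subnormal stays subnormal after widening can also be checked exhaustively. The C sketch below (hypothetical helper names) classifies every 16-bit bf16 encoding from its own fields and compares that with `fpclassify` applied to the widened f32. It assumes the default floating-point environment: with DAZ/FTZ enabled, subnormal inputs would be treated as zero and the comparison would not hold.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

// Widen a bf16 encoding to f32 by placing its 16 bits in the upper half of a
// 32-bit word and reinterpreting the result (the same shift-based widening
// discussed above).
static float bf16_to_f32(uint16_t h) {
  uint32_t u = (uint32_t)h << 16;
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical helper: classify a bf16 encoding from its own fields
// (8-bit exponent, 7-bit mantissa) using the C99 FP_* categories.
static int bf16_classify(uint16_t h) {
  unsigned exponent = (h >> 7) & 0xffu;
  unsigned mantissa = h & 0x7fu;
  if (exponent == 0)
    return mantissa ? FP_SUBNORMAL : FP_ZERO;
  if (exponent == 0xffu)
    return mantissa ? FP_NAN : FP_INFINITE;
  return FP_NORMAL;
}
```

For instance, the smallest positive bf16 subnormal, `0x0001`, widens to the f32 encoding `0x00010000`, which is the f32 subnormal 2^-133: the exponent field is zero in both formats because the bias is the same.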

Craig is correct: a subnormal in bfloat16 is also a subnormal in fp32, as long as DAZ is not enabled.

codemzs mentioned this in D150913: [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support. May 22 2023, 1:46 PM

asb mentioned this in D157288: [TargetLowering][NFC] Document overloaded meaning of TypeSoftPromoteHalf. Aug 7 2023, 8:02 AM

Revision Contents

Path (size)
compiler-rt/lib/builtins/CMakeLists.txt (1 line)
compiler-rt/lib/builtins/fp_trunc.h (6 lines)
compiler-rt/lib/builtins/truncsfbf2.c (13 lines)
llvm/include/llvm/CodeGen/ISDOpcodes.h (7 lines)
llvm/include/llvm/IR/RuntimeLibcalls.def (1 line)
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (20 lines)
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp (20 lines)
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (2 lines)
llvm/lib/CodeGen/TargetLoweringBase.cpp (13 lines)
llvm/lib/Target/X86/X86ISelLowering.cpp (20 lines)
llvm/test/CodeGen/X86/bfloat.ll (28 lines)

Diff 437106

compiler-rt/lib/builtins/CMakeLists.txt

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	set(GENERIC_SOURCES
subdf3.c		subdf3.c
subsf3.c		subsf3.c
subvdi3.c		subvdi3.c
subvsi3.c		subvsi3.c
subvti3.c		subvti3.c
trampoline_setup.c		trampoline_setup.c
truncdfhf2.c		truncdfhf2.c
truncdfsf2.c		truncdfsf2.c
		truncsfbf2.c
truncsfhf2.c		truncsfhf2.c
ucmpdi2.c		ucmpdi2.c
ucmpti2.c		ucmpti2.c
udivdi3.c		udivdi3.c
udivmoddi4.c		udivmoddi4.c
udivmodsi4.c		udivmodsi4.c
udivmodti4.c		udivmodti4.c
udivsi3.c		udivsi3.c
▲ Show 20 Lines • Show All 648 Lines • Show Last 20 Lines

compiler-rt/lib/builtins/fp_trunc.h

	Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
	typedef _Float16 dst_t;			typedef _Float16 dst_t;
	#else			#else
	typedef uint16_t dst_t;			typedef uint16_t dst_t;
	#endif			#endif
	typedef uint16_t dst_rep_t;			typedef uint16_t dst_rep_t;
	#define DST_REP_C UINT16_C			#define DST_REP_C UINT16_C
	static const int dstSigBits = 10;			static const int dstSigBits = 10;

				#elif defined DST_BFLOAT
				typedef uint16_t dst_t;
				typedef uint16_t dst_rep_t;
				#define DST_REP_C UINT16_C
				static const int dstSigBits = 7;

	#else			#else
	#error Destination should be single precision or double precision!			#error Destination should be single precision or double precision!
	#endif // end destination precision			#endif // end destination precision

	// End of specialization parameters. Two helper routines for conversion to and			// End of specialization parameters. Two helper routines for conversion to and
	// from the representation of floating-point data as integer values follow.			// from the representation of floating-point data as integer values follow.

	static __inline src_rep_t srcToRep(src_t x) {			static __inline src_rep_t srcToRep(src_t x) {
	Show All 16 Lines

compiler-rt/lib/builtins/truncsfbf2.c

This file was added.

				//===-- lib/truncsfbf2.c - single -> bfloat conversion ------------- C --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#define SRC_SINGLE
				#define DST_BFLOAT
				#include "fp_trunc_impl.inc"

				COMPILER_RT_ABI dst_t __truncsfbf2(float a) { return __truncXfYf2__(a); }

llvm/include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 892 Lines • ▼ Show 20 Lines	enum NodeType {
/// and truncation for half-precision (16 bit) floating numbers. These nodes		/// and truncation for half-precision (16 bit) floating numbers. These nodes
/// form a semi-softened interface for dealing with f16 (as an i16), which		/// form a semi-softened interface for dealing with f16 (as an i16), which
/// is often a storage-only type but has native conversions.		/// is often a storage-only type but has native conversions.
FP16_TO_FP,		FP16_TO_FP,
FP_TO_FP16,		FP_TO_FP16,
STRICT_FP16_TO_FP,		STRICT_FP16_TO_FP,
STRICT_FP_TO_FP16,		STRICT_FP_TO_FP16,

		/// BF16_TO_FP, FP_TO_BF16 - These operators are used to perform promotions
		/// and truncation for bfloat16. These nodes form a semi-softened interface
		/// for dealing with bf16 (as an i16), which is often a storage-only type but
		/// has native conversions.
		BF16_TO_FP,
		FP_TO_BF16,

/// Perform various unary floating-point operations inspired by libm. For		/// Perform various unary floating-point operations inspired by libm. For
/// FPOWI, the result is undefined if if the integer operand doesn't fit into		/// FPOWI, the result is undefined if if the integer operand doesn't fit into
/// sizeof(int).		/// sizeof(int).
FNEG,		FNEG,
FABS,		FABS,
FSQRT,		FSQRT,
FCBRT,		FCBRT,
FSIN,		FSIN,
▲ Show 20 Lines • Show All 589 Lines • Show Last 20 Lines

llvm/include/llvm/IR/RuntimeLibcalls.def

	Show First 20 Lines • Show All 304 Lines • ▼ Show 20 Lines
	HANDLE_LIBCALL(FPEXT_F32_F64, "__extendsfdf2")			HANDLE_LIBCALL(FPEXT_F32_F64, "__extendsfdf2")
	HANDLE_LIBCALL(FPEXT_F16_F64, "__extendhfdf2")			HANDLE_LIBCALL(FPEXT_F16_F64, "__extendhfdf2")
	HANDLE_LIBCALL(FPEXT_F16_F32, "__gnu_h2f_ieee")			HANDLE_LIBCALL(FPEXT_F16_F32, "__gnu_h2f_ieee")
	HANDLE_LIBCALL(FPROUND_F32_F16, "__gnu_f2h_ieee")			HANDLE_LIBCALL(FPROUND_F32_F16, "__gnu_f2h_ieee")
	HANDLE_LIBCALL(FPROUND_F64_F16, "__truncdfhf2")			HANDLE_LIBCALL(FPROUND_F64_F16, "__truncdfhf2")
	HANDLE_LIBCALL(FPROUND_F80_F16, "__truncxfhf2")			HANDLE_LIBCALL(FPROUND_F80_F16, "__truncxfhf2")
	HANDLE_LIBCALL(FPROUND_F128_F16, "__trunctfhf2")			HANDLE_LIBCALL(FPROUND_F128_F16, "__trunctfhf2")
	HANDLE_LIBCALL(FPROUND_PPCF128_F16, "__trunctfhf2")			HANDLE_LIBCALL(FPROUND_PPCF128_F16, "__trunctfhf2")
				HANDLE_LIBCALL(FPROUND_F32_BF16, "__truncsfbf2")
	HANDLE_LIBCALL(FPROUND_F64_F32, "__truncdfsf2")			HANDLE_LIBCALL(FPROUND_F64_F32, "__truncdfsf2")
	HANDLE_LIBCALL(FPROUND_F80_F32, "__truncxfsf2")			HANDLE_LIBCALL(FPROUND_F80_F32, "__truncxfsf2")
	HANDLE_LIBCALL(FPROUND_F128_F32, "__trunctfsf2")			HANDLE_LIBCALL(FPROUND_F128_F32, "__trunctfsf2")
	HANDLE_LIBCALL(FPROUND_PPCF128_F32, "__gcc_qtos")			HANDLE_LIBCALL(FPROUND_PPCF128_F32, "__gcc_qtos")
	HANDLE_LIBCALL(FPROUND_F80_F64, "__truncxfdf2")			HANDLE_LIBCALL(FPROUND_F80_F64, "__truncxfdf2")
	HANDLE_LIBCALL(FPROUND_F128_F64, "__trunctfdf2")			HANDLE_LIBCALL(FPROUND_F128_F64, "__trunctfdf2")
	HANDLE_LIBCALL(FPROUND_PPCF128_F64, "__gcc_qtod")			HANDLE_LIBCALL(FPROUND_PPCF128_F64, "__gcc_qtod")
	HANDLE_LIBCALL(FPROUND_F128_F80, "__trunctfxf2")			HANDLE_LIBCALL(FPROUND_F128_F80, "__trunctfxf2")
	▲ Show 20 Lines • Show All 286 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Show First 20 Lines • Show All 992 Lines • ▼ Show 20 Lines	case ISD::GET_DYNAMIC_AREA_OFFSET:
break;		break;
case ISD::VAARG:		case ISD::VAARG:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
Node->getValueType(0));		Node->getValueType(0));
if (Action != TargetLowering::Promote)		if (Action != TargetLowering::Promote)
Action = TLI.getOperationAction(Node->getOpcode(), MVT::Other);		Action = TLI.getOperationAction(Node->getOpcode(), MVT::Other);
break;		break;
case ISD::FP_TO_FP16:		case ISD::FP_TO_FP16:
		case ISD::FP_TO_BF16:
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
case ISD::LROUND:		case ISD::LROUND:
case ISD::LLROUND:		case ISD::LLROUND:
case ISD::LRINT:		case ISD::LRINT:
case ISD::LLRINT:		case ISD::LLRINT:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
▲ Show 20 Lines • Show All 1,890 Lines • ▼ Show 20 Lines	case ISD::STRICT_FP_EXTEND:
}		}
break;		break;
case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
if ((Tmp1 = EmitStackConvert(Node->getOperand(0),		if ((Tmp1 = EmitStackConvert(Node->getOperand(0),
Node->getOperand(0).getValueType(),		Node->getOperand(0).getValueType(),
Node->getValueType(0), dl)))		Node->getValueType(0), dl)))
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
		case ISD::BF16_TO_FP: {
		// Always expand bf16 to f32 casts, they lower to ext + shift.
		SDValue Op = DAG.getNode(ISD::BITCAST, dl, MVT::i16, Node->getOperand(0));
		Op = DAG.getNode(ISD::ANY_EXTEND, dl, MVT::i32, Op);
		Op = DAG.getNode(
		ISD::SHL, dl, MVT::i32, Op,
		DAG.getConstant(16, dl,
		TLI.getShiftAmountTy(MVT::i32, DAG.getDataLayout())));
		Op = DAG.getNode(ISD::BITCAST, dl, MVT::f32, Op);
		Results.push_back(Op);
		break;
		}
case ISD::SIGN_EXTEND_INREG: {		case ISD::SIGN_EXTEND_INREG: {
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();		EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);

// An in-register sign-extend of a boolean is a negation:		// An in-register sign-extend of a boolean is a negation:
// 'true' (1) sign-extended is -1.		// 'true' (1) sign-extended is -1.
// 'false' (0) sign-extended is 0.		// 'false' (0) sign-extended is 0.
// However, we must mask the high bits of the source operand because the		// However, we must mask the high bits of the source operand because the
▲ Show 20 Lines • Show All 1,296 Lines • ▼ Show 20 Lines	void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
}		}
case ISD::FP_TO_FP16: {		case ISD::FP_TO_FP16: {
RTLIB::Libcall LC =		RTLIB::Libcall LC =
RTLIB::getFPROUND(Node->getOperand(0).getValueType(), MVT::f16);		RTLIB::getFPROUND(Node->getOperand(0).getValueType(), MVT::f16);
assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unable to expand fp_to_fp16");		assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unable to expand fp_to_fp16");
Results.push_back(ExpandLibCall(LC, Node, false));		Results.push_back(ExpandLibCall(LC, Node, false));
break;		break;
}		}
		case ISD::FP_TO_BF16: {
		RTLIB::Libcall LC =
		RTLIB::getFPROUND(Node->getOperand(0).getValueType(), MVT::bf16);
		assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unable to expand fp_to_bf16");
		Results.push_back(ExpandLibCall(LC, Node, false));
		break;
		}
case ISD::STRICT_SINT_TO_FP:		case ISD::STRICT_SINT_TO_FP:
case ISD::STRICT_UINT_TO_FP:		case ISD::STRICT_UINT_TO_FP:
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP: {		case ISD::UINT_TO_FP: {
// TODO - Common the code with DAGTypeLegalizer::SoftenFloatRes_XINT_TO_FP		// TODO - Common the code with DAGTypeLegalizer::SoftenFloatRes_XINT_TO_FP
bool IsStrict = Node->isStrictFPOpcode();		bool IsStrict = Node->isStrictFPOpcode();
bool Signed = Node->getOpcode() == ISD::SINT_TO_FP ||		bool Signed = Node->getOpcode() == ISD::SINT_TO_FP ||
Node->getOpcode() == ISD::STRICT_SINT_TO_FP;		Node->getOpcode() == ISD::STRICT_SINT_TO_FP;
▲ Show 20 Lines • Show All 869 Lines • Show Last 20 Lines
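One detail of the `BF16_TO_FP` expansion in the LegalizeDAG.cpp hunk above is that it uses `ANY_EXTEND`, which leaves bits 16 to 31 of the i32 unspecified. That is harmless because the following `SHL` by 16 pushes those bits out of the word. The C model below makes this explicit; the function names are hypothetical, and `garbage` stands in for whatever the target happens to leave in the upper half.

```c
#include <stdint.h>
#include <string.h>

// Model of ANY_EXTEND i16 -> i32: the low half is the value, the high half
// is unspecified (represented here by an arbitrary `garbage` argument).
static uint32_t any_extend_i16_to_i32(uint16_t v, uint16_t garbage) {
  return ((uint32_t)garbage << 16) | v;
}

// Model of the whole expansion: BITCAST bf16 -> i16 (the caller passes the
// raw bits), ANY_EXTEND to i32, SHL by 16, BITCAST i32 -> f32.
static float bf16_to_f32_model(uint16_t bits, uint16_t garbage) {
  uint32_t wide = any_extend_i16_to_i32(bits, garbage);  // ANY_EXTEND
  uint32_t shifted = wide << 16;  // SHL: unspecified bits fall off the top
  float f;
  memcpy(&f, &shifted, sizeof f);  // BITCAST to f32
  return f;
}
```

Whatever value `garbage` takes, the result for `0x3fc0` is `1.5f`, because only the original 16 bits survive the shift.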

llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

Show First 20 Lines • Show All 828 Lines • ▼ Show 20 Lines	#ifndef NDEBUG
N->dump(&DAG); dbgs() << "\n";		N->dump(&DAG); dbgs() << "\n";
#endif		#endif
llvm_unreachable("Do not know how to soften this operator's operand!");		llvm_unreachable("Do not know how to soften this operator's operand!");

case ISD::BITCAST: Res = SoftenFloatOp_BITCAST(N); break;		case ISD::BITCAST: Res = SoftenFloatOp_BITCAST(N); break;
case ISD::BR_CC: Res = SoftenFloatOp_BR_CC(N); break;		case ISD::BR_CC: Res = SoftenFloatOp_BR_CC(N); break;
case ISD::STRICT_FP_TO_FP16:		case ISD::STRICT_FP_TO_FP16:
case ISD::FP_TO_FP16: // Same as FP_ROUND for softening purposes		case ISD::FP_TO_FP16: // Same as FP_ROUND for softening purposes
		case ISD::FP_TO_BF16:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::FP_ROUND: Res = SoftenFloatOp_FP_ROUND(N); break;		case ISD::FP_ROUND: Res = SoftenFloatOp_FP_ROUND(N); break;
case ISD::STRICT_FP_TO_SINT:		case ISD::STRICT_FP_TO_SINT:
case ISD::STRICT_FP_TO_UINT:		case ISD::STRICT_FP_TO_UINT:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT: Res = SoftenFloatOp_FP_TO_XINT(N); break;		case ISD::FP_TO_UINT: Res = SoftenFloatOp_FP_TO_XINT(N); break;
case ISD::FP_TO_SINT_SAT:		case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:		case ISD::FP_TO_UINT_SAT:
Show All 35 Lines	SDValue DAGTypeLegalizer::SoftenFloatOp_BITCAST(SDNode *N) {
return DAG.getNode(ISD::BITCAST, SDLoc(N), N->getValueType(0), Op0);		return DAG.getNode(ISD::BITCAST, SDLoc(N), N->getValueType(0), Op0);
}		}

SDValue DAGTypeLegalizer::SoftenFloatOp_FP_ROUND(SDNode *N) {		SDValue DAGTypeLegalizer::SoftenFloatOp_FP_ROUND(SDNode *N) {
// We actually deal with the partially-softened FP_TO_FP16 node too, which		// We actually deal with the partially-softened FP_TO_FP16 node too, which
// returns an i16 so doesn't meet the constraints necessary for FP_ROUND.		// returns an i16 so doesn't meet the constraints necessary for FP_ROUND.
assert(N->getOpcode() == ISD::FP_ROUND || N->getOpcode() == ISD::FP_TO_FP16 ||		assert(N->getOpcode() == ISD::FP_ROUND || N->getOpcode() == ISD::FP_TO_FP16 ||
N->getOpcode() == ISD::STRICT_FP_TO_FP16 ||		N->getOpcode() == ISD::STRICT_FP_TO_FP16 ||
		N->getOpcode() == ISD::FP_TO_BF16 ||
N->getOpcode() == ISD::STRICT_FP_ROUND);		N->getOpcode() == ISD::STRICT_FP_ROUND);

bool IsStrict = N->isStrictFPOpcode();		bool IsStrict = N->isStrictFPOpcode();
SDValue Op = N->getOperand(IsStrict ? 1 : 0);		SDValue Op = N->getOperand(IsStrict ? 1 : 0);
EVT SVT = Op.getValueType();		EVT SVT = Op.getValueType();
EVT RVT = N->getValueType(0);		EVT RVT = N->getValueType(0);
EVT FloatRVT = (N->getOpcode() == ISD::FP_TO_FP16 ||		EVT FloatRVT = RVT;
		if (N->getOpcode() == ISD::FP_TO_FP16 ||
N->getOpcode() == ISD::STRICT_FP_TO_FP16)		N->getOpcode() == ISD::STRICT_FP_TO_FP16)
? MVT::f16		FloatRVT = MVT::f16;
: RVT;		else if (N->getOpcode() == ISD::FP_TO_BF16)
		FloatRVT = MVT::bf16;

RTLIB::Libcall LC = RTLIB::getFPROUND(SVT, FloatRVT);		RTLIB::Libcall LC = RTLIB::getFPROUND(SVT, FloatRVT);
assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported FP_ROUND libcall");		assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported FP_ROUND libcall");

SDValue Chain = IsStrict ? N->getOperand(0) : SDValue();		SDValue Chain = IsStrict ? N->getOperand(0) : SDValue();
Op = GetSoftenedFloat(Op);		Op = GetSoftenedFloat(Op);
TargetLowering::MakeLibCallOptions CallOptions;		TargetLowering::MakeLibCallOptions CallOptions;
CallOptions.setTypeListBeforeSoften(SVT, RVT, true);		CallOptions.setTypeListBeforeSoften(SVT, RVT, true);
▲ Show 20 Lines • Show All 1,157 Lines • ▼ Show 20 Lines

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Float Operand Promotion		// Float Operand Promotion
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//

static ISD::NodeType GetPromotionOpcode(EVT OpVT, EVT RetVT) {		static ISD::NodeType GetPromotionOpcode(EVT OpVT, EVT RetVT) {
if (OpVT == MVT::f16) {		if (OpVT == MVT::f16) {
return ISD::FP16_TO_FP;		return ISD::FP16_TO_FP;
} else if (RetVT == MVT::f16) {		} else if (RetVT == MVT::f16) {
return ISD::FP_TO_FP16;		return ISD::FP_TO_FP16;
		} else if (OpVT == MVT::bf16) {
		return ISD::BF16_TO_FP;
		} else if (RetVT == MVT::bf16) {
		return ISD::FP_TO_BF16;
}		}

report_fatal_error("Attempt at an invalid promotion-related conversion");		report_fatal_error("Attempt at an invalid promotion-related conversion");
}		}

bool DAGTypeLegalizer::PromoteFloatOperand(SDNode *N, unsigned OpNo) {		bool DAGTypeLegalizer::PromoteFloatOperand(SDNode *N, unsigned OpNo) {
LLVM_DEBUG(dbgs() << "Promote float operand " << OpNo << ": "; N->dump(&DAG);		LLVM_DEBUG(dbgs() << "Promote float operand " << OpNo << ": "; N->dump(&DAG);
dbgs() << "\n");		dbgs() << "\n");
▲ Show 20 Lines • Show All 955 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Show First 20 Lines • Show All 359 Lines • ▼ Show 20 Lines	#endif
case ISD::FP_TO_SINT_SAT: return "fp_to_sint_sat";		case ISD::FP_TO_SINT_SAT: return "fp_to_sint_sat";
case ISD::FP_TO_UINT_SAT: return "fp_to_uint_sat";		case ISD::FP_TO_UINT_SAT: return "fp_to_uint_sat";
case ISD::BITCAST: return "bitcast";		case ISD::BITCAST: return "bitcast";
case ISD::ADDRSPACECAST: return "addrspacecast";		case ISD::ADDRSPACECAST: return "addrspacecast";
case ISD::FP16_TO_FP: return "fp16_to_fp";		case ISD::FP16_TO_FP: return "fp16_to_fp";
case ISD::STRICT_FP16_TO_FP: return "strict_fp16_to_fp";		case ISD::STRICT_FP16_TO_FP: return "strict_fp16_to_fp";
case ISD::FP_TO_FP16: return "fp_to_fp16";		case ISD::FP_TO_FP16: return "fp_to_fp16";
case ISD::STRICT_FP_TO_FP16: return "strict_fp_to_fp16";		case ISD::STRICT_FP_TO_FP16: return "strict_fp_to_fp16";
		case ISD::BF16_TO_FP: return "bf16_to_fp";
		case ISD::FP_TO_BF16: return "fp_to_bf16";
case ISD::LROUND: return "lround";		case ISD::LROUND: return "lround";
case ISD::STRICT_LROUND: return "strict_lround";		case ISD::STRICT_LROUND: return "strict_lround";
case ISD::LLROUND: return "llround";		case ISD::LLROUND: return "llround";
case ISD::STRICT_LLROUND: return "strict_llround";		case ISD::STRICT_LLROUND: return "strict_llround";
case ISD::LRINT: return "lrint";		case ISD::LRINT: return "lrint";
case ISD::STRICT_LRINT: return "strict_lrint";		case ISD::STRICT_LRINT: return "strict_lrint";
case ISD::LLRINT: return "llrint";		case ISD::LLRINT: return "llrint";
case ISD::STRICT_LLRINT: return "strict_llrint";		case ISD::STRICT_LLRINT: return "strict_llrint";
▲ Show 20 Lines • Show All 689 Lines • Show Last 20 Lines

llvm/lib/CodeGen/TargetLoweringBase.cpp

Show First 20 Lines • Show All 268 Lines • ▼ Show 20 Lines	if (RetVT == MVT::f16) {
if (OpVT == MVT::f64)		if (OpVT == MVT::f64)
return FPROUND_F64_F16;		return FPROUND_F64_F16;
if (OpVT == MVT::f80)		if (OpVT == MVT::f80)
return FPROUND_F80_F16;		return FPROUND_F80_F16;
if (OpVT == MVT::f128)		if (OpVT == MVT::f128)
return FPROUND_F128_F16;		return FPROUND_F128_F16;
if (OpVT == MVT::ppcf128)		if (OpVT == MVT::ppcf128)
return FPROUND_PPCF128_F16;		return FPROUND_PPCF128_F16;
		} else if (RetVT == MVT::bf16) {
		if (OpVT == MVT::f32)
		return FPROUND_F32_BF16;
} else if (RetVT == MVT::f32) {		} else if (RetVT == MVT::f32) {
if (OpVT == MVT::f64)		if (OpVT == MVT::f64)
return FPROUND_F64_F32;		return FPROUND_F64_F32;
if (OpVT == MVT::f80)		if (OpVT == MVT::f80)
return FPROUND_F80_F32;		return FPROUND_F80_F32;
if (OpVT == MVT::f128)		if (OpVT == MVT::f128)
return FPROUND_F128_F32;		return FPROUND_F128_F32;
if (OpVT == MVT::ppcf128)		if (OpVT == MVT::ppcf128)
▲ Show 20 Lines • Show All 1,083 Lines • ▼ Show 20 Lines	if (!isTypeLegal(MVT::f16)) {
} else {		} else {
NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::f32];		NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::f32];
RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::f32];		RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::f32];
TransformToType[MVT::f16] = MVT::f32;		TransformToType[MVT::f16] = MVT::f32;
ValueTypeActions.setTypeAction(MVT::f16, TypePromoteFloat);		ValueTypeActions.setTypeAction(MVT::f16, TypePromoteFloat);
}		}
}		}

		// Decide how to handle bf16. If the target does not have native bf16 support,
		// promote it to f32, because there are no bf16 library calls (except for
		// converting from f32 to bf16).
		if (!isTypeLegal(MVT::bf16)) {
		NumRegistersForVT[MVT::bf16] = NumRegistersForVT[MVT::f32];
		RegisterTypeForVT[MVT::bf16] = RegisterTypeForVT[MVT::f32];
		TransformToType[MVT::bf16] = MVT::f32;
		ValueTypeActions.setTypeAction(MVT::bf16, TypePromoteFloat);
		}

// Loop over all of the vector value types to see which need transformations.		// Loop over all of the vector value types to see which need transformations.
for (unsigned i = MVT::FIRST_VECTOR_VALUETYPE;		for (unsigned i = MVT::FIRST_VECTOR_VALUETYPE;
i <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++i) {		i <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++i) {
MVT VT = (MVT::SimpleValueType) i;		MVT VT = (MVT::SimpleValueType) i;
if (isTypeLegal(VT))		if (isTypeLegal(VT))
continue;		continue;

MVT EltVT = VT.getVectorElementType();		MVT EltVT = VT.getVectorElementType();
▲ Show 20 Lines • Show All 972 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 406 Lines • ▼ Show 20 Lines	setOperationAction(
Op, MVT::f32,		Op, MVT::f32,
(!Subtarget.useSoftFloat() && Subtarget.hasF16C()) ? Custom : Expand);		(!Subtarget.useSoftFloat() && Subtarget.hasF16C()) ? Custom : Expand);
// There's never any support for operations beyond MVT::f32.		// There's never any support for operations beyond MVT::f32.
setOperationAction(Op, MVT::f64, Expand);		setOperationAction(Op, MVT::f64, Expand);
setOperationAction(Op, MVT::f80, Expand);		setOperationAction(Op, MVT::f80, Expand);
setOperationAction(Op, MVT::f128, Expand);		setOperationAction(Op, MVT::f128, Expand);
}		}

setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);		for (MVT VT : {MVT::f32, MVT::f64, MVT::f80, MVT::f128}) {
setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f16, Expand);		setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand);
setLoadExtAction(ISD::EXTLOAD, MVT::f80, MVT::f16, Expand);		setLoadExtAction(ISD::EXTLOAD, VT, MVT::bf16, Expand);
setLoadExtAction(ISD::EXTLOAD, MVT::f128, MVT::f16, Expand);		setTruncStoreAction(VT, MVT::f16, Expand);
setTruncStoreAction(MVT::f32, MVT::f16, Expand);		setTruncStoreAction(VT, MVT::bf16, Expand);
setTruncStoreAction(MVT::f64, MVT::f16, Expand);
setTruncStoreAction(MVT::f80, MVT::f16, Expand);		setOperationAction(ISD::BF16_TO_FP, VT, Expand);
setTruncStoreAction(MVT::f128, MVT::f16, Expand);		setOperationAction(ISD::FP_TO_BF16, VT, Expand);
		}

setOperationAction(ISD::PARITY, MVT::i8, Custom);		setOperationAction(ISD::PARITY, MVT::i8, Custom);
setOperationAction(ISD::PARITY, MVT::i16, Custom);		setOperationAction(ISD::PARITY, MVT::i16, Custom);
setOperationAction(ISD::PARITY, MVT::i32, Custom);		setOperationAction(ISD::PARITY, MVT::i32, Custom);
if (Subtarget.is64Bit())		if (Subtarget.is64Bit())
setOperationAction(ISD::PARITY, MVT::i64, Custom);		setOperationAction(ISD::PARITY, MVT::i64, Custom);
if (Subtarget.hasPOPCNT()) {		if (Subtarget.hasPOPCNT()) {
setOperationPromotedToType(ISD::CTPOP, MVT::i8, MVT::i32);		setOperationPromotedToType(ISD::CTPOP, MVT::i8, MVT::i32);
▲ Show 20 Lines • Show All 480 Lines • ▼ Show 20 Lines	for (MVT InnerVT : MVT::fixedlen_vector_valuetypes()) {
// types, we have to deal with them whether we ask for Expansion or not.		// types, we have to deal with them whether we ask for Expansion or not.
// Setting Expand causes its own optimisation problems though, so leave		// Setting Expand causes its own optimisation problems though, so leave
// them legal.		// them legal.
if (VT.getVectorElementType() == MVT::i1)		if (VT.getVectorElementType() == MVT::i1)
setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);		setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);

// EXTLOAD for MVT::f16 vectors is not legal because f16 vectors are		// EXTLOAD for MVT::f16 vectors is not legal because f16 vectors are
// split/scalarized right now.		// split/scalarized right now.
if (VT.getVectorElementType() == MVT::f16)		if (VT.getVectorElementType() == MVT::f16 ||
		VT.getVectorElementType() == MVT::bf16)
setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);		setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);
}		}
}		}

// FIXME: In order to prevent SSE instructions being expanded to MMX ones		// FIXME: In order to prevent SSE instructions being expanded to MMX ones
// with -msoft-float, disable use of MMX as well.		// with -msoft-float, disable use of MMX as well.
if (!Subtarget.useSoftFloat() && Subtarget.hasMMX()) {		if (!Subtarget.useSoftFloat() && Subtarget.hasMMX()) {
addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);		addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);
▲ Show 20 Lines • Show All 55,173 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/bfloat.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-linux-gnu | FileCheck %s

				define void @add(ptr %pa, ptr %pb, ptr %pc) {
				; CHECK-LABEL: add:
				; CHECK: # %bb.0:
				; CHECK-NEXT: pushq %rbx
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset %rbx, -16
				; CHECK-NEXT: movq %rdx, %rbx
				; CHECK-NEXT: movzwl (%rdi), %eax
				; CHECK-NEXT: shll $16, %eax
				; CHECK-NEXT: movd %eax, %xmm1
				; CHECK-NEXT: movzwl (%rsi), %eax
				; CHECK-NEXT: shll $16, %eax
				; CHECK-NEXT: movd %eax, %xmm0
				; CHECK-NEXT: addss %xmm1, %xmm0
				; CHECK-NEXT: callq __truncsfbf2@PLT
				; CHECK-NEXT: movw %ax, (%rbx)
				; CHECK-NEXT: popq %rbx
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: retq
				%a = load bfloat, ptr %pa
				%b = load bfloat, ptr %pb
				%add = fadd bfloat %a, %b
				store bfloat %add, ptr %pc
				ret void
				}
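A C model of what the `@add` test checks may help when reading the assembly: the bf16 operands are loaded as 16-bit integers (`movzwl`), widened by a shift (`shll $16`, `movd`), added in f32 (`addss`), and rounded back to bf16 by a library call before the 16-bit store (`movw`). The names are hypothetical, and `truncsfbf2_sketch` is a round-to-nearest-even stand-in for `__truncsfbf2`, not its actual source.

```c
#include <stdint.h>
#include <string.h>

// Widening step: zero-extend, shift into the upper half, reinterpret as f32.
static float bf16_bits_to_f32(uint16_t h) {
  uint32_t u = (uint32_t)h << 16;
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical stand-in for __truncsfbf2: round-to-nearest-even on the bit
// pattern, with NaNs quieted so they cannot truncate to infinity.
static uint16_t truncsfbf2_sketch(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t)((u >> 16) | 0x0040u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t)(u >> 16);
}

// Model of @add: bf16 values live in memory as 16-bit integers, the
// arithmetic happens in f32, and the sum is rounded back before the store.
static void add_bf16(const uint16_t *pa, const uint16_t *pb, uint16_t *pc) {
  float a = bf16_bits_to_f32(*pa);
  float b = bf16_bits_to_f32(*pb);
  *pc = truncsfbf2_sketch(a + b);
}
```

For example, 1.5 + 1.5 stores `0x4040` (3.0), while 1.0 + 2^-8 (`0x3f80` + `0x3b80`) is exact in f32 but a tie in bf16, so it rounds back to `0x3f80`.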
