This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Convert double -> __fp16 in one step.
ClosedPublic

Authored by ab on Jul 21 2014, 4:36 AM.

Details

Reviewers

ab
olista01

Commits

rG47ec2c7479de: [CodeGen] Convert double -> __fp16 in one step.
rC232968: [CodeGen] Convert double -> __fp16 in one step.

Summary

Hi,

The attached patch changes Clang so that for types bigger than float, instead of converting to fp16 via the sequence "InTy -> float -> fp16", we perform conversions in just one step. This avoids the double rounding which potentially changes results from a natural IEEE-754 operation.

There are potential problems, but I believe the benefits outweigh them:

It's a change in semantics. I believe it's compatible with the major standards though (OpenCL requires accesses go via a builtin; n1833 for C would *demand* this change, from my reading of it).
It means double -> __fp16 conversion will fail on x86 and v7 ARM CPUs for now. Specifically, we will generate a libcall which isn't actually widespread (or probably implemented anywhere). I think this is preferable to the status quo of producing a possibly incorrect result though.

Longer term, I'd like to improve the codegen here to use real fpext/fptrunc operations and remove many of the special cases for half. Unfortunately the LLVM CodeGen isn't up to this change yet, so I've just extended the use of the @llvm.convert.to.fp16 intrinsic.

So, is it OK to change this?

Cheers.

Tim.

Diff Detail

Event Timeline

t.p.northover updated this revision to Diff 11706. Jul 21 2014, 4:36 AM

t.p.northover retitled this revision from to CodeGen: convert double -> __fp16 in one step.

t.p.northover updated this object.

t.p.northover edited the test plan for this revision.

t.p.northover added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Jul 21 2014, 4:36 AM

Ping?

Cheers.

Tim.

ab commandeered this revision. Feb 11 2015, 4:43 PM

ab added a reviewer: t.p.northover.

Hi all,

Ping for Tim's patch!

I rebased it and added the one-step __fp16->double conversion, which, while not necessary for correctness, is nice to have (if only for symmetry).
We're still missing the libcalls, so this will break the conversions from/to double. However, I agree with Tim that it's better than incorrect results.

-Ahmed

Ping!

Hi Ahmed,

The patch looks good to me. However, I don't think I am knowledgeable enough to give the final approval. So, somebody else should review this patch.

My personal opinion (for what it's worth) is that this is the correct approach. On X86, I am not particularly worried about the extra runtime libcalls which could be generated for double-to-half conversions. Also, (as you already know) the target-independent legalizer now knows how to split a double-to-half conversion into the sequence 'double-to-float' plus 'float-to-half' (only under fast-math).

24 February 2015 at 10:47, Ahmed Bougacha <ahmed.bougacha@gmail.com> wrote:

Ping!

I still think this is the right approach too. A 2-step conversion is
pretty much impossible to justify logically; users who really still
want it can force that behaviour by casting to float first.

Cheers.

Tim.

Ping again!

(Andrea & Tim, thanks for the comments)

-Ahmed

Monday ping!

Ping!

Tentatively adding Oliver as a reviewer =)

-Ahmed

ab mentioned this in D8367: [CodeGen] Properly support the half FP type with non-native operations (-fallow-half-args-and-returns). Mar 16 2015, 3:33 PM

Reviewed as part of http://reviews.llvm.org/D8367

This revision is now accepted and ready to land. Mar 23 2015, 10:58 AM

.. and committed in r232968.

-Ahmed

Revision Contents

Path	Size
lib/CodeGen/CGExprScalar.cpp	27 lines
test/CodeGen/fp16-ops.c	26 lines

Diff 19797

lib/CodeGen/CGExprScalar.cpp

Show First 20 Lines • Show All 739 Lines • ▼ Show 20 Lines	Value *ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,
if (SrcType == DstType) return Src;		if (SrcType == DstType) return Src;

if (DstType->isVoidType()) return nullptr;		if (DstType->isVoidType()) return nullptr;

llvm::Value *OrigSrc = Src;		llvm::Value *OrigSrc = Src;
QualType OrigSrcType = SrcType;		QualType OrigSrcType = SrcType;
llvm::Type *SrcTy = Src->getType();		llvm::Type *SrcTy = Src->getType();

// If casting to/from storage-only half FP, use special intrinsics.		// Handle conversions to bool first, they are special: comparisons against 0.
		if (DstType->isBooleanType())
		return EmitConversionToBool(Src, SrcType);

		llvm::Type *DstTy = ConvertType(DstType);

		// Cast from storage-only half FP using the special intrinsic.
if (SrcType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&		if (SrcType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
!CGF.getContext().getLangOpts().HalfArgsAndReturns) {		!CGF.getContext().getLangOpts().HalfArgsAndReturns) {
		if (DstTy->isFloatingPointTy())
		return Builder.CreateCall(
		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16, DstTy), Src);

		// If this isn't an FP->FP conversion, go through float.
Src = Builder.CreateCall(		Src = Builder.CreateCall(
CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,
CGF.CGM.FloatTy),		CGF.CGM.FloatTy),
Src);		Src);
SrcType = CGF.getContext().FloatTy;		SrcType = CGF.getContext().FloatTy;
SrcTy = CGF.FloatTy;		SrcTy = CGF.FloatTy;
}		}

// Handle conversions to bool first, they are special: comparisons against 0.
if (DstType->isBooleanType())
return EmitConversionToBool(Src, SrcType);

llvm::Type *DstTy = ConvertType(DstType);

// Ignore conversions like int -> uint.		// Ignore conversions like int -> uint.
if (SrcTy == DstTy)		if (SrcTy == DstTy)
return Src;		return Src;

// Handle pointer conversions next: pointers can only be converted to/from		// Handle pointer conversions next: pointers can only be converted to/from
// other pointers and integers. Check for pointer types in terms of LLVM, as		// other pointers and integers. Check for pointer types in terms of LLVM, as
// some native types (like Obj-C id) may map to a pointer type.		// some native types (like Obj-C id) may map to a pointer type.
if (isa<llvm::PointerType>(DstTy)) {		if (isa<llvm::PointerType>(DstTy)) {
Show All 40 Lines	Value *ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,

// An overflowing conversion has undefined behavior if either the source type		// An overflowing conversion has undefined behavior if either the source type
// or the destination type is a floating-point type.		// or the destination type is a floating-point type.
if (CGF.SanOpts.has(SanitizerKind::FloatCastOverflow) &&		if (CGF.SanOpts.has(SanitizerKind::FloatCastOverflow) &&
(OrigSrcType->isFloatingType() || DstType->isFloatingType()))		(OrigSrcType->isFloatingType() || DstType->isFloatingType()))
EmitFloatConversionCheck(OrigSrc, OrigSrcType, Src, SrcType, DstType,		EmitFloatConversionCheck(OrigSrc, OrigSrcType, Src, SrcType, DstType,
DstTy);		DstTy);

// Cast to half via float		// Cast to half using the intrinsic if from FP type, through float otherwise.
if (DstType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&		if (DstType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
!CGF.getContext().getLangOpts().HalfArgsAndReturns)		!CGF.getContext().getLangOpts().HalfArgsAndReturns) {
		if (SrcTy->isFloatingPointTy())
		return Builder.CreateCall(
		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_to_fp16, SrcTy), Src);
DstTy = CGF.FloatTy;		DstTy = CGF.FloatTy;
		}

if (isa<llvm::IntegerType>(SrcTy)) {		if (isa<llvm::IntegerType>(SrcTy)) {
bool InputSigned = SrcType->isSignedIntegerOrEnumerationType();		bool InputSigned = SrcType->isSignedIntegerOrEnumerationType();
if (isa<llvm::IntegerType>(DstTy))		if (isa<llvm::IntegerType>(DstTy))
Res = Builder.CreateIntCast(Src, DstTy, InputSigned, "conv");		Res = Builder.CreateIntCast(Src, DstTy, InputSigned, "conv");
else if (InputSigned)		else if (InputSigned)
Res = Builder.CreateSIToFP(Src, DstTy, "conv");		Res = Builder.CreateSIToFP(Src, DstTy, "conv");
else		else
▲ Show 20 Lines • Show All 2,671 Lines • Show Last 20 Lines

test/CodeGen/fp16-ops.c

// REQUIRES: arm-registered-target		// REQUIRES: arm-registered-target
// RUN: %clang_cc1 -emit-llvm -o - -triple arm-none-linux-gnueabi %s | FileCheck %s		// RUN: %clang_cc1 -emit-llvm -o - -triple arm-none-linux-gnueabi %s | FileCheck %s
typedef unsigned cond_t;		typedef unsigned cond_t;

volatile cond_t test;		volatile cond_t test;
volatile __fp16 h0 = 0.0, h1 = 1.0, h2;		volatile __fp16 h0 = 0.0, h1 = 1.0, h2;
volatile float f0, f1, f2;		volatile float f0, f1, f2;
		volatile double d0;

void foo(void) {		void foo(void) {
// CHECK-LABEL: define void @foo()		// CHECK-LABEL: define void @foo()

// Check unary ops		// Check unary ops

// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK fptoi float		// CHECK fptoi float
Show All 31 Lines	void foo(void) {
// CHECK: fmul float		// CHECK: fmul float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = h0 * h2;		h1 = h0 * h2;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fmul float		// CHECK: fmul float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = h0 * (__fp16) -2.0;		h1 = h0 * (__fp16) -2.0f;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fmul float		// CHECK: fmul float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = h0 * f2;		h1 = h0 * f2;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fmul float		// CHECK: fmul float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = f0 * h2;		h1 = f0 * h2;

// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv float		// CHECK: fdiv float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h0 / h2);		h1 = (h0 / h2);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv float		// CHECK: fdiv float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h0 / (__fp16) -2.0);		h1 = (h0 / (__fp16) -2.0f);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv float		// CHECK: fdiv float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h0 / f2);		h1 = (h0 / f2);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv float		// CHECK: fdiv float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (f0 / h2);		h1 = (f0 / h2);
Show All 21 Lines	void foo(void) {
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fsub float		// CHECK: fsub float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h2 - h0);		h1 = (h2 - h0);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fsub float		// CHECK: fsub float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = ((__fp16)-2.0 - h0);		h1 = ((__fp16)-2.0f - h0);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fsub float		// CHECK: fsub float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h2 - f0);		h1 = (h2 - f0);
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fsub float		// CHECK: fsub float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (f2 - h0);		h1 = (f2 - h0);
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	void foo(void) {
// CHECK: fcmp une		// CHECK: fcmp une
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h1 = (h1 ? h2 : h0);		h1 = (h1 ? h2 : h0);
// Check assignments (inc. compound)		// Check assignments (inc. compound)
h0 = h1;		h0 = h1;
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 = (__fp16)-2.0;		h0 = (__fp16)-2.0f;
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 = f0;		h0 = f0;

// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fadd float		// CHECK: fadd float
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 += h1;		h0 += h1;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fadd		// CHECK: fadd
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 += (__fp16)1.0;		h0 += (__fp16)1.0f;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fadd		// CHECK: fadd
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 += f2;		h0 += f2;

// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fsub		// CHECK: fsub
Show All 33 Lines	void foo(void) {
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv		// CHECK: fdiv
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 /= (__fp16)1.0;		h0 /= (__fp16)1.0;
// CHECK: call float @llvm.convert.from.fp16.f32(		// CHECK: call float @llvm.convert.from.fp16.f32(
// CHECK: fdiv		// CHECK: fdiv
// CHECK: call i16 @llvm.convert.to.fp16.f32(		// CHECK: call i16 @llvm.convert.to.fp16.f32(
h0 /= f2;		h0 /= f2;

		// Check conversions to/from double
		// CHECK: call i16 @llvm.convert.to.fp16.f64(
		h0 = d0;

		// CHECK: [[MID:%.*]] = fptrunc double {{%.*}} to float
		// CHECK: call i16 @llvm.convert.to.fp16.f32(float [[MID]])
		h0 = (float)d0;

		// CHECK: call double @llvm.convert.from.fp16.f64(
		d0 = h0;

		// CHECK: [[MID:%.*]] = call float @llvm.convert.from.fp16.f32(
		// CHECK: fpext float [[MID]] to double
		d0 = (float)h0;
}		}