This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Pass _Float16 as int or float
Closed · Public

Authored by SjoerdMeijer on Jan 19 2018, 2:35 PM.

Details

Reviewers

olista01
t.p.northover
rjmccall
aschwaighofer
samparker

Commits

rGca8f4e745148: [ARM] Pass _Float16 as int or float
rC323185: [ARM] Pass _Float16 as int or float
rL323185: [ARM] Pass _Float16 as int or float

Summary

Pass and return _Float16 as if it were an int or float for ARM, but with the
top 16 bits unspecified, similar to what we already do for __fp16.

We will implement proper half-precision function argument lowering in the ARM backend
soon, but want to use this workaround in the meantime.

Diff Detail

Repository: rL LLVM

Event Timeline

SjoerdMeijer created this revision. Jan 19 2018, 2:35 PM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Jan 19 2018, 2:35 PM

Hi Sjoerd,

Seems sensible to me to treat these two types the same way, though I must admit having different half types confuses me... So a few questions for my understanding:

What issue are you trying to work around?
What would the ideal solution be?
Why do we need a workaround instead of implementing the ideal solution?

cheers!

Inline comment on test/CodeGen/arm-float16-arguments.c, line 4 (on Diff #130694):
Probably worth keeping these in arm-fp16-arguments.c since they're basically the same.

Thanks for reviewing!

We are trying to achieve correct AAPCS parameter passing:

"If the argument is a Half-precision Floating Point Type its size is set to 4 bytes as if it
had been copied to the least significant bits of a 32-bit register and the remaining bits filled
with unspecified values"

and for returning results:

"A Half-precision Floating Point Type is returned in the least significant 16 bits of r0."

Summarising: AAPCS compliance for passing/returning _Float16 values.
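
To make the quoted rule concrete, here is a minimal sketch that models the 32-bit container on the bit level. The helper names `widenHalfBits` and `narrowToHalfBits` are hypothetical and are not part of Clang or the AAPCS; the `junk` parameter merely stands in for the unspecified upper bits.

```cpp
#include <cstdint>

// Hypothetical helper: place a half-precision bit pattern in the least
// significant 16 bits of a 32-bit container. AAPCS leaves the upper 16 bits
// unspecified; `junk` models whatever happens to be there.
constexpr std::uint32_t widenHalfBits(std::uint16_t halfBits,
                                      std::uint16_t junk) {
  return (static_cast<std::uint32_t>(junk) << 16) | halfBits;
}

// Hypothetical helper: a conforming callee may only rely on the low 16 bits.
constexpr std::uint16_t narrowToHalfBits(std::uint32_t container) {
  return static_cast<std::uint16_t>(container & 0xFFFFu);
}

// 0x3C00 is the IEEE 754 binary16 encoding of 1.0. The value survives the
// round trip regardless of the contents of the upper half.
static_assert(narrowToHalfBits(widenHalfBits(0x3C00, 0x0000)) == 0x3C00, "");
static_assert(narrowToHalfBits(widenHalfBits(0x3C00, 0xBEEF)) == 0x3C00, "");
```

The point of the sketch is only that the callee must ignore the top half of the register; it says nothing about which register is used.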

The ideal solution would be to lower this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

to this:

define half @sub(half %a, half %b) local_unnamed_addr {
entry:
  %add = fadd half %a, %b
  ret half %add
}

but with this patch we are generating:

define float @sub(float %a.coerce, float %b.coerce) local_unnamed_addr #0 {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

With this, we pass a float and interpret only its lower 16 bits (the return value is
handled similarly; I've omitted it here).

Thus, we are working around the problem of legalizing f16 arguments and return values:
we now do this in Clang and don't have to do anything at all in the backend.
This is a two-line change, and it lets us make progress with the Armv8.2-A FP16 tablegen
descriptions and start testing and using them; adjusting the calling conventions in the
backend is a bit more involved. I will start working on the ideal solution now, and once
that is in place, we can pass the half types properly and remove this workaround.
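
As a rough illustration of the coerced IR above, the following sketch replays the same steps in plain C++ for the hard-float case: a bit-level reinterpretation of the float (the `bitcast`), followed by a truncation to 16 bits (the `trunc`). All function names are hypothetical, and the sketch assumes a 32-bit `float`; it models the bit manipulation only, not Clang's code generation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models `bitcast float ... to i32`: reinterpret the bits, no numeric conversion.
inline std::uint32_t bitsOfFloat(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Models `bitcast i32 ... to float`.
inline float floatFromBits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Callee side: bitcast the coerced float to i32, then `trunc i32 ... to i16`.
inline std::uint16_t halfBitsFromCoercedArg(float coerced) {
  return static_cast<std::uint16_t>(bitsOfFloat(coerced) & 0xFFFFu);
}

// Return side: `zext i16 ... to i32`, then bitcast the i32 to float.
inline float coerceHalfBitsToFloat(std::uint16_t halfBits) {
  return floatFromBits(static_cast<std::uint32_t>(halfBits));
}

// Hypothetical self-check tying the two directions together.
inline void checkRoundTrip(std::uint16_t halfBits) {
  assert(halfBitsFromCoercedArg(coerceHalfBitsToFloat(halfBits)) == halfBits);
}
```

Calling `checkRoundTrip(0x3C00)` (the binary16 encoding of 1.0) exercises both directions. In the soft-float case the container is an `i32` rather than a `float`, so the two bitcast steps simply disappear.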

Moved the tests to the existing file (and fixed a few typos in the tests).

Thanks for the explanation, LGTM, thanks!

This revision is now accepted and ready to land. Jan 23 2018, 1:36 AM

Closed by commit rL323185: [ARM] Pass _Float16 as int or float (authored by SjoerdMeijer). Jan 23 2018, 2:15 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jan 23 2018, 2:15 AM

SjoerdMeijer mentioned this in D38315: [ARM] Armv8.2-A FP16 code generation (part 1/3). Jan 26 2018, 1:20 AM

SjoerdMeijer mentioned this in rL323512: [ARM] Armv8.2-A FP16 code generation (part 1/3). Jan 26 2018, 1:28 AM

Revision Contents

Path                                           Size
cfe/trunk/include/clang/AST/Type.h             7 lines
cfe/trunk/lib/CodeGen/TargetInfo.cpp           18 lines
cfe/trunk/test/CodeGen/arm-fp16-arguments.c    24 lines

Diff 131011

cfe/trunk/include/clang/AST/Type.h

@@ public:
   /// Floating point categories.
   bool isRealFloatingType() const; // C99 6.2.5p10 (float, double, long double)
   /// isComplexType() does not include complex integers (a GCC extension).
   /// isComplexIntegerType() can be used to test for complex integers.
   bool isComplexType() const; // C99 6.2.5p11 (complex)
   bool isAnyComplexType() const; // C99 6.2.5p11 (complex) + Complex Int.
   bool isFloatingType() const; // C99 6.2.5p11 (real floating + complex)
   bool isHalfType() const; // OpenCL 6.1.1.1, NEON (IEEE 754-2008 half)
+  bool isFloat16Type() const; // C11 extension ISO/IEC TS 18661
   bool isRealType() const; // C99 6.2.5p17 (real floating + integer)
   bool isArithmeticType() const; // C99 6.2.5p18 (integer + floating)
   bool isVoidType() const; // C99 6.2.5p19
   bool isScalarType() const; // C99 6.2.5p21 (arithmetic + pointers)
   bool isAggregateType() const;
   bool isFundamentalType() const;
   bool isCompoundType() const;

@@
 inline bool Type::isHalfType() const {
   if (const BuiltinType *BT = dyn_cast<BuiltinType>(CanonicalType))
     return BT->getKind() == BuiltinType::Half;
   // FIXME: Should we allow complex __fp16? Probably not.
   return false;
 }

+inline bool Type::isFloat16Type() const {
+  if (const BuiltinType *BT = dyn_cast<BuiltinType>(CanonicalType))
+    return BT->getKind() == BuiltinType::Float16;
+  return false;
+}
+
 inline bool Type::isNullPtrType() const {
   if (const BuiltinType *BT = getAs<BuiltinType>())
     return BT->getKind() == BuiltinType::NullPtr;
   return false;
 }

 bool IsEnumDeclComplete(EnumDecl *);
 bool IsEnumDeclScoped(EnumDecl *);

cfe/trunk/lib/CodeGen/TargetInfo.cpp

@@ if (isIllegalVectorType(Ty)) {
     if (Size == 128) {
       llvm::Type *ResType = llvm::VectorType::get(
           llvm::Type::getInt32Ty(getVMContext()), 4);
       return ABIArgInfo::getDirect(ResType);
     }
     return getNaturalAlignIndirect(Ty, /*ByVal=*/false);
   }

-  // __fp16 gets passed as if it were an int or float, but with the top 16 bits
-  // unspecified. This is not done for OpenCL as it handles the half type
-  // natively, and does not need to interwork with AAPCS code.
-  if (Ty->isHalfType() && !getContext().getLangOpts().NativeHalfArgsAndReturns) {
+  // _Float16 and __fp16 get passed as if it were an int or float, but with
+  // the top 16 bits unspecified. This is not done for OpenCL as it handles the
+  // half type natively, and does not need to interwork with AAPCS code.
+  if ((Ty->isFloat16Type() || Ty->isHalfType()) &&
+      !getContext().getLangOpts().NativeHalfArgsAndReturns) {
     llvm::Type *ResType = IsEffectivelyAAPCS_VFP ?
       llvm::Type::getFloatTy(getVMContext()) :
       llvm::Type::getInt32Ty(getVMContext());
     return ABIArgInfo::getDirect(ResType);
   }

   if (!isAggregateTypeForABI(Ty)) {
     // Treat an enum type as its underlying type.

@@ ABIArgInfo ARMABIInfo::classifyReturnType(QualType RetTy,
   if (RetTy->isVoidType())
     return ABIArgInfo::getIgnore();

   // Large vector types should be returned via memory.
   if (RetTy->isVectorType() && getContext().getTypeSize(RetTy) > 128) {
     return getNaturalAlignIndirect(RetTy);
   }

-  // __fp16 gets returned as if it were an int or float, but with the top 16
-  // bits unspecified. This is not done for OpenCL as it handles the half type
-  // natively, and does not need to interwork with AAPCS code.
-  if (RetTy->isHalfType() && !getContext().getLangOpts().NativeHalfArgsAndReturns) {
+  // _Float16 and __fp16 get returned as if it were an int or float, but with
+  // the top 16 bits unspecified. This is not done for OpenCL as it handles the
+  // half type natively, and does not need to interwork with AAPCS code.
+  if ((RetTy->isFloat16Type() || RetTy->isHalfType()) &&
+      !getContext().getLangOpts().NativeHalfArgsAndReturns) {
     llvm::Type *ResType = IsEffectivelyAAPCS_VFP ?
       llvm::Type::getFloatTy(getVMContext()) :
       llvm::Type::getInt32Ty(getVMContext());
     return ABIArgInfo::getDirect(ResType);
   }

   if (!isAggregateTypeForABI(RetTy)) {
     // Treat an enum type as its underlying type.
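
The decision made by the changed condition can be summarised in a small standalone model. This is only a sketch: `Coercion`, `HalfQuery` and `classifyHalfLike` are hypothetical names, not Clang APIs, and each field merely stands in for the corresponding query in the patch.

```cpp
// Hypothetical result type: how a half-like value ends up being passed.
enum class Coercion {
  ToFloat,    // models ABIArgInfo::getDirect(float), the AAPCS-VFP case
  ToInt32,    // models ABIArgInfo::getDirect(i32), the base AAPCS case
  NotCoerced  // this rule does not apply; later classification rules decide
};

// Hypothetical inputs standing in for the queries used in the patch.
struct HalfQuery {
  bool isFloat16;                // models Ty->isFloat16Type()
  bool isHalf;                   // models Ty->isHalfType()
  bool nativeHalfArgsAndReturns; // models LangOpts.NativeHalfArgsAndReturns
  bool effectivelyAAPCS_VFP;     // models IsEffectivelyAAPCS_VFP
};

// Same shape as the patched condition: _Float16 now takes the path that
// previously only __fp16 took.
constexpr Coercion classifyHalfLike(HalfQuery q) {
  return ((q.isFloat16 || q.isHalf) && !q.nativeHalfArgsAndReturns)
             ? (q.effectivelyAAPCS_VFP ? Coercion::ToFloat : Coercion::ToInt32)
             : Coercion::NotCoerced;
}

// _Float16 under a hard-float ABI is widened to float ...
static_assert(classifyHalfLike(HalfQuery{true, false, false, true}) ==
                  Coercion::ToFloat, "");
// ... and under a soft-float ABI to a 32-bit integer.
static_assert(classifyHalfLike(HalfQuery{true, false, false, false}) ==
                  Coercion::ToInt32, "");
```

The same function models both the argument and the return path, since the patch applies an identical condition in both places.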

cfe/trunk/test/CodeGen/arm-fp16-arguments.c

@@
 // NATIVE: define half @t2()
 // CHECK: [[LOAD:%.*]] = load i16, i16* bitcast (half* @g to i16*)
 // CHECK: [[ZEXT:%.*]] = zext i16 [[LOAD]] to i32
 // SOFT: ret i32 [[ZEXT]]
 // HARD: [[BITCAST:%.*]] = bitcast i32 [[ZEXT]] to float
 // HARD: ret float [[BITCAST]]
 // NATIVE: [[LOAD:%.*]] = load half, half* @g
 // NATIVE: ret half [[LOAD]]
+
+_Float16 h;
+
+void t3(_Float16 a) { h = a; }
+// SOFT: define void @t3(i32 [[PARAM:%.*]])
+// SOFT: [[TRUNC:%.*]] = trunc i32 [[PARAM]] to i16
+// HARD: define arm_aapcs_vfpcc void @t3(float [[PARAM:%.*]])
+// HARD: [[BITCAST:%.*]] = bitcast float [[PARAM]] to i32
+// HARD: [[TRUNC:%.*]] = trunc i32 [[BITCAST]] to i16
+// CHECK: store i16 [[TRUNC]], i16* bitcast (half* @h to i16*)
+// NATIVE: define void @t3(half [[PARAM:%.*]])
+// NATIVE: store half [[PARAM]], half* @h
+
+_Float16 t4() { return h; }
+// SOFT: define i32 @t4()
+// HARD: define arm_aapcs_vfpcc float @t4()
+// NATIVE: define half @t4()
+// CHECK: [[LOAD:%.*]] = load i16, i16* bitcast (half* @h to i16*)
+// CHECK: [[ZEXT:%.*]] = zext i16 [[LOAD]] to i32
+// SOFT: ret i32 [[ZEXT]]
+// HARD: [[BITCAST:%.*]] = bitcast i32 [[ZEXT]] to float
+// HARD: ret float [[BITCAST]]
+// NATIVE: [[LOAD:%.*]] = load half, half* @h
+// NATIVE: ret half [[LOAD]]