This is an archive of the discontinued LLVM Phabricator instance.

Differential D4456

Allow __fp16 as a function arg or return type for AArch64
ClosedPublic

Authored by olista01 on Jul 10 2014, 2:54 AM.

Details

Reviewers

olista01

Summary

ACLE 2.0 [1] allows __fp16 to be used as a function argument or return type. This patch enables that for AArch64.

I have not enabled this for 32-bit ARM targets yet, as we expect to release an updated version of the 32-bit AAPCS soon, which changes the handling of __fp16 in a way that is not backwards compatible.

This also fixes an existing bug that prevents clang from accepting homogeneous floating-point aggregates with a base type of __fp16. Such aggregates are valid under AAPCS64, but not under AAPCS-VFP.

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
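
The base-type rule described in the summary can be modelled in a few lines of C++. This is a simplified sketch rather than the code from the patch: BaseKind, isValidHFABase and hfaRuleDemo are hypothetical names, and short vectors, nested aggregates and member counts are deliberately left out.

```cpp
#include <cassert>

// Hypothetical stand-in for the builtin kinds that matter here; the real
// code inspects clang::BuiltinType instead.
enum class BaseKind { Half, Float, Double, LongDouble, Int };

// Returns true if `kind` may be the base type of a homogeneous
// floating-point aggregate under the selected procedure call standard.
constexpr bool isValidHFABase(BaseKind kind, bool isAArch64) {
  if (kind == BaseKind::Int)
    return false;  // not a floating-point type at all
  if (isAArch64)
    return true;   // AAPCS64: any floating-point type, including __fp16
  // AAPCS-VFP: float, double and long double only.
  return kind != BaseKind::Half;
}

// Small self-check showing the one case where the two ABIs differ.
void hfaRuleDemo() {
  assert(isValidHFABase(BaseKind::Half, /*isAArch64=*/true));
  assert(!isValidHFABase(BaseKind::Half, /*isAArch64=*/false));
}
```

The only input on which the two branches disagree is the half-precision kind, which is the case the summary calls out.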

Event Timeline

olista01 updated this revision to Diff 11255. Jul 10 2014, 2:54 AM

olista01 retitled this revision to Allow __fp16 as a function arg or return type for AArch64.

olista01 updated this object.

olista01 edited the test plan for this revision.

olista01 added a subscriber: Unknown Object (MLST).

Herald added subscribers: mroth, mcrosier, aemerson. Jul 10 2014, 2:54 AM

olista01 added a parent revision: D4455: Allow __fp16 as a function arg or return type for AArch64. Jul 10 2014, 2:54 AM

Hi Oliver,

I've done some more thinking about this, and posted an RFC to llvmdev and cfe-dev with the direction I'd like to pursue here longer term.

Regardless of the outcome of that more general proposal, I don't think the AArch64 backend is ready to cope with the code this patch produces yet. Allowing "half" as a function argument immediately introduces more possible operations to the IR we have to deal with: bitcast and load/store for one.

For example, all three of these functions exhibit poor behaviour at the moment:

__fp16 varFloat;

short foo(__fp16 in) {
  // __fp16 bitcast lowered to load/store
  return *(short *)&in;
}

__fp16 bar(short in) {
  // __fp16 bitcast lowered to load/store
  return *(__fp16 *)&in;
}

float baz() {
  // extload would be created in DAG and crash ISel. Crashes clang instead.
  return varFloat;
}

The last one actually crashes in Clang itself, which is worrying, since I was expecting an LLVM backend crash and hadn't spotted anything obviously wrong with this patch. I've not investigated exactly what's wrong there though.

The bitcasts are particularly nasty because, as far as I can see, there *is* no generic way to express a "half <-> i16" bitcast when the latter type is illegal (as it is for us). The best I've come up with this afternoon is EXTRACT_SUBREG or SUBREG_TO_REG, but creating those during ISelLowering is really ugly; if you have any suggestions...

Cheers.

Tim.
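
As an aside to the first two functions above, the same 16-bit reinterpretation can be written with std::memcpy instead of a pointer cast. This is only a sketch of the source-level pattern; it makes no claim about how the AArch64 backend lowers it. The alias half_storage and the helper names are hypothetical, and on targets other than AArch64 a plain 16-bit integer stands in for __fp16 so that the code still compiles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: __fp16 is usable as an argument and return type on AArch64.
// Elsewhere a 16-bit integer is used purely as a stand-in.
#if defined(__aarch64__)
using half_storage = __fp16;
#else
using half_storage = std::uint16_t;
#endif

static_assert(sizeof(half_storage) == sizeof(std::int16_t),
              "half storage must be 16 bits wide");

// Copies the 16 bits of a half value into an integer, without a pointer cast.
std::int16_t halfToBits(half_storage in) {
  std::int16_t out;
  std::memcpy(&out, &in, sizeof out);
  return out;
}

// Copies the 16 bits of an integer into a half value.
half_storage bitsToHalf(std::int16_t in) {
  half_storage out;
  std::memcpy(&out, &in, sizeof out);
  return out;
}

// The two helpers are inverses of each other on the bit pattern.
void roundTripDemo() {
  assert(halfToBits(bitsToHalf(0x3C00)) == 0x3C00);
}
```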

I've also just noticed this highly disturbing example:

#include <arm_neon.h>
#if WTF
short varInt;
__fp16 varFloat;

short foo(__fp16 in) {
  return *(short *)&in;
}

__fp16 bar(short in) {
  return *(__fp16 *)&in;
}

float baz() {
  //  return varFloat;
}
#endif


int16x4_t tim(float16x4_t a) {
  return vreinterpret_s16_f16(a);
  //  return vreinterpret_s16_f16(a);
}

The parameter type for "tim" changes for me depending on whether WTF is defined (and if it's "<4 x half>", AArch64 copes badly).

Cheers.

Tim.

The changing parameter type in your second example was due to clang caching the mapping between clang::Type and llvm::Type. This also hid a bug where HFAs would still be emitted using i16 unless there was a bare __fp16 parameter earlier in the file.
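
The effect of combining a type cache with a context-dependent mapping can be shown with a toy model. This is an illustrative sketch only: ToyTypeCache and its convert method are hypothetical and bear no relation to clang's real CodeGenTypes interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model of a front-end cache from source types to IR types.
class ToyTypeCache {
public:
  // Maps a source-level type name to an IR type name. The first answer for
  // a given source type is cached and reused for every later query, even
  // when `preferNativeHalf` differs on the later call.
  std::string convert(const std::string &sourceType, bool preferNativeHalf) {
    auto it = cache_.find(sourceType);
    if (it != cache_.end())
      return it->second;
    std::string irType = sourceType;
    if (sourceType == "__fp16")
      irType = preferNativeHalf ? "half" : "i16";
    cache_[sourceType] = irType;
    return irType;
  }

private:
  std::map<std::string, std::string> cache_;
};

// Whichever context asks first decides the answer for the rest of the file.
void cacheOrderDemo() {
  ToyTypeCache cache;
  assert(cache.convert("__fp16", /*preferNativeHalf=*/true) == "half");
  assert(cache.convert("__fp16", /*preferNativeHalf=*/false) == "half");
}
```

In this model the result of the second query depends on an unrelated earlier query, which mirrors the order-dependent behaviour reported for the "tim" example.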

Hi Oliver,

I don't think the right way to handle this is by mapping __fp16 to
different types depending on the context. It's a hack in the front-end
to work around some not particularly difficult backend problems.

If we need __fp16 to map to half, we should make the extra effort so
that it can be a blanket change. It's the right direction to go in
anyway.

Cheers.

Tim.

This patch now makes half a valid IR type that can be emitted for AArch64. It can be passed to and returned from functions by value, and we use the normal IR instructions to convert between half and float, but we retain the C semantics of always promoting to float before performing any arithmetic operations on half values.
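
The "promote to float, operate, convert back" semantics can be modelled in portable C++ by treating a half value as its 16-bit IEEE 754 binary16 pattern. This is a software sketch for illustration, not what the patch emits: the helper names are hypothetical, and the narrowing conversion assumes the default round-to-nearest-even floating-point environment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Widens an IEEE 754 binary16 bit pattern to float. Always exact.
float halfBitsToFloat(std::uint16_t h) {
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;
  float value;
  if (exponent == 0)  // zero or subnormal: mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  else if (exponent == 31)  // infinity or NaN
    value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  else  // normal: (1024 + mantissa) * 2^(exponent - 25)
    value = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
  return (h & 0x8000) ? -value : value;
}

// Narrows a float to a binary16 bit pattern, rounding to nearest even.
std::uint16_t floatToHalfBits(float f) {
  const std::uint16_t sign = std::signbit(f) ? 0x8000 : 0;
  if (std::isnan(f))
    return static_cast<std::uint16_t>(sign | 0x7E00);
  const float a = std::fabs(f);
  if (a >= 65520.0f)  // rounds past the largest finite half (65504)
    return static_cast<std::uint16_t>(sign | 0x7C00);
  if (a < std::ldexp(1.0f, -14))  // zero or subnormal: units of 2^-24
    return static_cast<std::uint16_t>(sign | std::lrint(std::ldexp(a, 24)));
  int e;
  const float m = std::frexp(a, &e);            // a = m * 2^e, m in [0.5, 1)
  long sig = std::lrint(std::ldexp(m, 11));     // 11-bit significand
  int biased = e + 14;                          // (e - 1) + bias of 15
  if (sig == 2048) {                            // rounding carried out
    sig = 1024;
    ++biased;
  }
  return static_cast<std::uint16_t>(sign | (biased << 10) | (sig - 1024));
}

// Increment performed in float, as the C semantics for __fp16 require.
std::uint16_t incrementHalf(std::uint16_t h) {
  return floatToHalfBits(halfBitsToFloat(h) + 1.0f);
}
```

For example, incrementHalf applied to the pattern 0x3C00 (1.0) widens to float, adds 1.0f, and narrows the result back to 0x4000 (2.0).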

olista01 added a parent revision: D4879: Teach the AArch64 backend about half-precision floating point. Aug 13 2014, 2:22 AM

This was accepted on the list and committed more than a month ago, but that was never recorded in Phabricator.

This revision is now accepted and ready to land. Sep 29 2014, 9:56 AM

olista01 closed this revision. Sep 29 2014, 9:56 AM

Revision Contents

Path                                      Size
include/clang/Basic/LangOptions.def       1 line
include/clang/Driver/CC1Options.td        2 lines
lib/CodeGen/CGExprConstant.cpp            3 lines
lib/CodeGen/CGExprScalar.cpp              12 lines
lib/CodeGen/CodeGenTypes.cpp              7 lines
lib/CodeGen/TargetInfo.cpp                46 lines
lib/Driver/Tools.cpp                      4 lines
lib/Frontend/CompilerInvocation.cpp       2 lines
lib/Sema/SemaType.cpp                     8 lines
test/CodeGen/arm64-aapcs-arguments.c      11 lines

Diff 12436

include/clang/Basic/LangOptions.def

	LANGOPT(ShortWChar , 1, 0, "unsigned short wchar_t")			LANGOPT(ShortWChar , 1, 0, "unsigned short wchar_t")
	ENUM_LANGOPT(MSPointerToMemberRepresentationMethod, PragmaMSPointersToMembersKind, 2, PPTMK_BestCase, "member-pointer representation method")			ENUM_LANGOPT(MSPointerToMemberRepresentationMethod, PragmaMSPointersToMembersKind, 2, PPTMK_BestCase, "member-pointer representation method")

	LANGOPT(ShortEnums , 1, 0, "short enum types")			LANGOPT(ShortEnums , 1, 0, "short enum types")

	LANGOPT(OpenCL , 1, 0, "OpenCL")			LANGOPT(OpenCL , 1, 0, "OpenCL")
	LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")			LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")
	LANGOPT(NativeHalfType , 1, 0, "Native half type support")			LANGOPT(NativeHalfType , 1, 0, "Native half type support")
				LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
	LANGOPT(CUDA , 1, 0, "CUDA")			LANGOPT(CUDA , 1, 0, "CUDA")
	LANGOPT(OpenMP , 1, 0, "OpenMP support")			LANGOPT(OpenMP , 1, 0, "OpenMP support")

	LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")			LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
	LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")			LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")
	BENIGN_LANGOPT(ElideConstructors , 1, 1, "C++ copy constructor elision")			BENIGN_LANGOPT(ElideConstructors , 1, 1, "C++ copy constructor elision")
	BENIGN_LANGOPT(DumpRecordLayouts , 1, 0, "dumping the layout of IRgen'd records")			BENIGN_LANGOPT(DumpRecordLayouts , 1, 0, "dumping the layout of IRgen'd records")
	BENIGN_LANGOPT(DumpRecordLayoutsSimple , 1, 0, "dumping the layout of IRgen'd records in a simple form")			BENIGN_LANGOPT(DumpRecordLayoutsSimple , 1, 0, "dumping the layout of IRgen'd records in a simple form")

include/clang/Driver/CC1Options.td

	def fsized_deallocation : Flag<["-"], "fsized-deallocation">,			def fsized_deallocation : Flag<["-"], "fsized-deallocation">,
	HelpText<"Enable C++1y sized global deallocation functions">;			HelpText<"Enable C++1y sized global deallocation functions">;
	def fobjc_subscripting_legacy_runtime : Flag<["-"], "fobjc-subscripting-legacy-runtime">,			def fobjc_subscripting_legacy_runtime : Flag<["-"], "fobjc-subscripting-legacy-runtime">,
	HelpText<"Allow Objective-C array and dictionary subscripting in legacy runtime">;			HelpText<"Allow Objective-C array and dictionary subscripting in legacy runtime">;
	def vtordisp_mode_EQ : Joined<["-"], "vtordisp-mode=">,			def vtordisp_mode_EQ : Joined<["-"], "vtordisp-mode=">,
	HelpText<"Control vtordisp placement on win32 targets">;			HelpText<"Control vtordisp placement on win32 targets">;
	def fno_rtti_data : Flag<["-"], "fno-rtti-data">,			def fno_rtti_data : Flag<["-"], "fno-rtti-data">,
	HelpText<"Control emission of RTTI data">;			HelpText<"Control emission of RTTI data">;
				def fallow_half_arguments_and_returns : Flag<["-"], "fallow-half-arguments-and-returns">,
				HelpText<"Allow function arguments and returns of type half">;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Header Search Options			// Header Search Options
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def nostdsysteminc : Flag<["-"], "nostdsysteminc">,			def nostdsysteminc : Flag<["-"], "nostdsysteminc">,
	HelpText<"Disable standard system #include directories">;			HelpText<"Disable standard system #include directories">;
	def fdisable_module_hash : Flag<["-"], "fdisable-module-hash">,			def fdisable_module_hash : Flag<["-"], "fdisable-module-hash">,

lib/CodeGen/CGExprConstant.cpp

[...] case APValue::ComplexInt: {
llvm::StructType *STy = llvm::StructType::get(Complex[0]->getType(),		llvm::StructType *STy = llvm::StructType::get(Complex[0]->getType(),
Complex[1]->getType(),		Complex[1]->getType(),
NULL);		NULL);
return llvm::ConstantStruct::get(STy, Complex);		return llvm::ConstantStruct::get(STy, Complex);
}		}
case APValue::Float: {		case APValue::Float: {
const llvm::APFloat &Init = Value.getFloat();		const llvm::APFloat &Init = Value.getFloat();
if (&Init.getSemantics() == &llvm::APFloat::IEEEhalf &&		if (&Init.getSemantics() == &llvm::APFloat::IEEEhalf &&
!Context.getLangOpts().NativeHalfType)		!Context.getLangOpts().NativeHalfType &&
		!Context.getLangOpts().HalfArgsAndReturns)
return llvm::ConstantInt::get(VMContext, Init.bitcastToAPInt());		return llvm::ConstantInt::get(VMContext, Init.bitcastToAPInt());
else		else
return llvm::ConstantFP::get(VMContext, Init);		return llvm::ConstantFP::get(VMContext, Init);
}		}
case APValue::ComplexFloat: {		case APValue::ComplexFloat: {
llvm::Constant *Complex[2];		llvm::Constant *Complex[2];

Complex[0] = llvm::ConstantFP::get(VMContext,		Complex[0] = llvm::ConstantFP::get(VMContext,

lib/CodeGen/CGExprScalar.cpp

[...] Value *ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,

if (DstType->isVoidType()) return nullptr;		if (DstType->isVoidType()) return nullptr;

llvm::Value *OrigSrc = Src;		llvm::Value *OrigSrc = Src;
QualType OrigSrcType = SrcType;		QualType OrigSrcType = SrcType;
llvm::Type *SrcTy = Src->getType();		llvm::Type *SrcTy = Src->getType();

// If casting to/from storage-only half FP, use special intrinsics.		// If casting to/from storage-only half FP, use special intrinsics.
if (SrcType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType) {		if (SrcType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
		!CGF.getContext().getLangOpts().HalfArgsAndReturns) {
Src = Builder.CreateCall(		Src = Builder.CreateCall(
CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,
CGF.CGM.FloatTy),		CGF.CGM.FloatTy),
Src);		Src);
SrcType = CGF.getContext().FloatTy;		SrcType = CGF.getContext().FloatTy;
SrcTy = CGF.FloatTy;		SrcTy = CGF.FloatTy;
}		}

[...] Value *ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,
// An overflowing conversion has undefined behavior if either the source type		// An overflowing conversion has undefined behavior if either the source type
// or the destination type is a floating-point type.		// or the destination type is a floating-point type.
if (CGF.SanOpts->FloatCastOverflow &&		if (CGF.SanOpts->FloatCastOverflow &&
(OrigSrcType->isFloatingType() || DstType->isFloatingType()))		(OrigSrcType->isFloatingType() || DstType->isFloatingType()))
EmitFloatConversionCheck(OrigSrc, OrigSrcType, Src, SrcType, DstType,		EmitFloatConversionCheck(OrigSrc, OrigSrcType, Src, SrcType, DstType,
DstTy);		DstTy);

// Cast to half via float		// Cast to half via float
if (DstType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType)		if (DstType->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
		!CGF.getContext().getLangOpts().HalfArgsAndReturns)
DstTy = CGF.FloatTy;		DstTy = CGF.FloatTy;

if (isa<llvm::IntegerType>(SrcTy)) {		if (isa<llvm::IntegerType>(SrcTy)) {
bool InputSigned = SrcType->isSignedIntegerOrEnumerationType();		bool InputSigned = SrcType->isSignedIntegerOrEnumerationType();
if (isa<llvm::IntegerType>(DstTy))		if (isa<llvm::IntegerType>(DstTy))
Res = Builder.CreateIntCast(Src, DstTy, InputSigned, "conv");		Res = Builder.CreateIntCast(Src, DstTy, InputSigned, "conv");
else if (InputSigned)		else if (InputSigned)
Res = Builder.CreateSIToFP(Src, DstTy, "conv");		Res = Builder.CreateSIToFP(Src, DstTy, "conv");
[...] if (type->hasIntegerRepresentation()) {
isInc ? "inc" : "dec");		isInc ? "inc" : "dec");
}		}

// Floating point.		// Floating point.
} else if (type->isRealFloatingType()) {		} else if (type->isRealFloatingType()) {
// Add the inc/dec to the real part.		// Add the inc/dec to the real part.
llvm::Value *amt;		llvm::Value *amt;

if (type->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType) {		if (type->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
		!CGF.getContext().getLangOpts().HalfArgsAndReturns) {
// Another special case: half FP increment should be done via float		// Another special case: half FP increment should be done via float
value = Builder.CreateCall(		value = Builder.CreateCall(
CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16,
CGF.CGM.FloatTy),		CGF.CGM.FloatTy),
input);		input);
}		}

if (value->getType()->isFloatTy())		if (value->getType()->isFloatTy())
amt = llvm::ConstantFP::get(VMContext,		amt = llvm::ConstantFP::get(VMContext,
llvm::APFloat(static_cast<float>(amount)));		llvm::APFloat(static_cast<float>(amount)));
else if (value->getType()->isDoubleTy())		else if (value->getType()->isDoubleTy())
amt = llvm::ConstantFP::get(VMContext,		amt = llvm::ConstantFP::get(VMContext,
llvm::APFloat(static_cast<double>(amount)));		llvm::APFloat(static_cast<double>(amount)));
else {		else {
llvm::APFloat F(static_cast<float>(amount));		llvm::APFloat F(static_cast<float>(amount));
bool ignored;		bool ignored;
F.convert(CGF.getTarget().getLongDoubleFormat(),		F.convert(CGF.getTarget().getLongDoubleFormat(),
llvm::APFloat::rmTowardZero, &ignored);		llvm::APFloat::rmTowardZero, &ignored);
amt = llvm::ConstantFP::get(VMContext, F);		amt = llvm::ConstantFP::get(VMContext, F);
}		}
value = Builder.CreateFAdd(value, amt, isInc ? "inc" : "dec");		value = Builder.CreateFAdd(value, amt, isInc ? "inc" : "dec");

if (type->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType)		if (type->isHalfType() && !CGF.getContext().getLangOpts().NativeHalfType &&
		!CGF.getContext().getLangOpts().HalfArgsAndReturns)
value = Builder.CreateCall(		value = Builder.CreateCall(
CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_to_fp16,		CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_to_fp16,
CGF.CGM.FloatTy),		CGF.CGM.FloatTy),
value);		value);

// Objective-C pointer types.		// Objective-C pointer types.
} else {		} else {
const ObjCObjectPointerType *OPT = type->castAs<ObjCObjectPointerType>();		const ObjCObjectPointerType *OPT = type->castAs<ObjCObjectPointerType>();

lib/CodeGen/CodeGenTypes.cpp

[...] case Type::Builtin: {
case BuiltinType::Char16:		case BuiltinType::Char16:
case BuiltinType::Char32:		case BuiltinType::Char32:
ResultType = llvm::IntegerType::get(getLLVMContext(),		ResultType = llvm::IntegerType::get(getLLVMContext(),
static_cast<unsigned>(Context.getTypeSize(T)));		static_cast<unsigned>(Context.getTypeSize(T)));
break;		break;

case BuiltinType::Half:		case BuiltinType::Half:
// Half FP can either be storage-only (lowered to i16) or native.		// Half FP can either be storage-only (lowered to i16) or native.
ResultType = getTypeForFormat(getLLVMContext(),		ResultType =
Context.getFloatTypeSemantics(T),		getTypeForFormat(getLLVMContext(), Context.getFloatTypeSemantics(T),
Context.getLangOpts().NativeHalfType);		Context.getLangOpts().NativeHalfType ||
		Context.getLangOpts().HalfArgsAndReturns);
break;		break;
case BuiltinType::Float:		case BuiltinType::Float:
case BuiltinType::Double:		case BuiltinType::Double:
case BuiltinType::LongDouble:		case BuiltinType::LongDouble:
ResultType = getTypeForFormat(getLLVMContext(),		ResultType = getTypeForFormat(getLLVMContext(),
Context.getFloatTypeSemantics(T),		Context.getFloatTypeSemantics(T),
/* UseNativeHalf = */ false);		/* UseNativeHalf = */ false);
break;		break;

lib/CodeGen/TargetInfo.cpp

[...] public:
}		}

int getDwarfEHStackPointer(CodeGen::CodeGenModule &M) const { return 31; }		int getDwarfEHStackPointer(CodeGen::CodeGenModule &M) const { return 31; }

virtual bool doesReturnSlotInterfereWithArgs() const { return false; }		virtual bool doesReturnSlotInterfereWithArgs() const { return false; }
};		};
}		}

static bool isHomogeneousAggregate(QualType Ty, const Type *&Base,		static bool isARMHomogeneousAggregate(QualType Ty, const Type *&Base,
ASTContext &Context,		ASTContext &Context,
		bool isAArch64,
uint64_t *HAMembers = nullptr);		uint64_t *HAMembers = nullptr);

ABIArgInfo AArch64ABIInfo::classifyArgumentType(QualType Ty,		ABIArgInfo AArch64ABIInfo::classifyArgumentType(QualType Ty,
unsigned &AllocatedVFP,		unsigned &AllocatedVFP,
bool &IsHA,		bool &IsHA,
unsigned &AllocatedGPR,		unsigned &AllocatedGPR,
bool &IsSmallAggr,		bool &IsSmallAggr,
bool IsNamedArg) const {		bool IsNamedArg) const {
[...] if (isEmptyRecord(getContext(), Ty, true)) {

++AllocatedGPR;		++AllocatedGPR;
return ABIArgInfo::getDirect(llvm::Type::getInt8Ty(getVMContext()));		return ABIArgInfo::getDirect(llvm::Type::getInt8Ty(getVMContext()));
}		}

// Homogeneous Floating-point Aggregates (HFAs) need to be expanded.		// Homogeneous Floating-point Aggregates (HFAs) need to be expanded.
const Type *Base = nullptr;		const Type *Base = nullptr;
uint64_t Members = 0;		uint64_t Members = 0;
if (isHomogeneousAggregate(Ty, Base, getContext(), &Members)) {		if (isARMHomogeneousAggregate(Ty, Base, getContext(), true, &Members)) {
IsHA = true;		IsHA = true;
if (!IsNamedArg && isDarwinPCS()) {		if (!IsNamedArg && isDarwinPCS()) {
// With the Darwin ABI, variadic arguments are always passed on the stack		// With the Darwin ABI, variadic arguments are always passed on the stack
// and should not be expanded. Treat variadic HFAs as arrays of doubles.		// and should not be expanded. Treat variadic HFAs as arrays of doubles.
uint64_t Size = getContext().getTypeSize(Ty);		uint64_t Size = getContext().getTypeSize(Ty);
llvm::Type *BaseTy = llvm::Type::getDoubleTy(getVMContext());		llvm::Type *BaseTy = llvm::Type::getDoubleTy(getVMContext());
return ABIArgInfo::getDirect(llvm::ArrayType::get(BaseTy, Size / 64));		return ABIArgInfo::getDirect(llvm::ArrayType::get(BaseTy, Size / 64));
}		}
[...] return (RetTy->isPromotableIntegerType() && isDarwinPCS()
? ABIArgInfo::getExtend()		? ABIArgInfo::getExtend()
: ABIArgInfo::getDirect());		: ABIArgInfo::getDirect());
}		}

if (isEmptyRecord(getContext(), RetTy, true))		if (isEmptyRecord(getContext(), RetTy, true))
return ABIArgInfo::getIgnore();		return ABIArgInfo::getIgnore();

const Type *Base = nullptr;		const Type *Base = nullptr;
if (isHomogeneousAggregate(RetTy, Base, getContext()))		if (isARMHomogeneousAggregate(RetTy, Base, getContext(), true))
// Homogeneous Floating-point Aggregates (HFAs) are returned directly.		// Homogeneous Floating-point Aggregates (HFAs) are returned directly.
return ABIArgInfo::getDirect();		return ABIArgInfo::getDirect();

// Aggregates <= 16 bytes are returned directly in registers or on the stack.		// Aggregates <= 16 bytes are returned directly in registers or on the stack.
uint64_t Size = getContext().getTypeSize(RetTy);		uint64_t Size = getContext().getTypeSize(RetTy);
if (Size <= 128) {		if (Size <= 128) {
Size = 64 * ((Size + 63) / 64); // round up to multiple of 8 bytes		Size = 64 * ((Size + 63) / 64); // round up to multiple of 8 bytes
return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(), Size));		return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(), Size));
[...] static llvm::Value *EmitAArch64VAArg(llvm::Value *VAListAddr, QualType Ty,
if (IsIndirect) {		if (IsIndirect) {
// If it's been passed indirectly (actually a struct), whatever we find from		// If it's been passed indirectly (actually a struct), whatever we find from
// stored registers or on the stack will actually be a struct **.		// stored registers or on the stack will actually be a struct **.
MemTy = llvm::PointerType::getUnqual(MemTy);		MemTy = llvm::PointerType::getUnqual(MemTy);
}		}

const Type *Base = nullptr;		const Type *Base = nullptr;
uint64_t NumMembers;		uint64_t NumMembers;
bool IsHFA = isHomogeneousAggregate(Ty, Base, Ctx, &NumMembers);		bool IsHFA = isARMHomogeneousAggregate(Ty, Base, Ctx, true, &NumMembers);
if (IsHFA && NumMembers > 1) {		if (IsHFA && NumMembers > 1) {
// Homogeneous aggregates passed in registers will have their elements split		// Homogeneous aggregates passed in registers will have their elements split
// and stored 16-bytes apart regardless of size (they're notionally in qN,		// and stored 16-bytes apart regardless of size (they're notionally in qN,
// qN+1, ...). We reload and store into a temporary local variable		// qN+1, ...). We reload and store into a temporary local variable
// contiguously.		// contiguously.
assert(!IsIndirect && "Homogeneous aggregates should be passed directly");		assert(!IsIndirect && "Homogeneous aggregates should be passed directly");
llvm::Type *BaseTy = CGF.ConvertType(QualType(Base, 0));		llvm::Type *BaseTy = CGF.ConvertType(QualType(Base, 0));
llvm::Type *HFATy = llvm::ArrayType::get(BaseTy, NumMembers);		llvm::Type *HFATy = llvm::ArrayType::get(BaseTy, NumMembers);
[...] llvm::Value *AArch64ABIInfo::EmitDarwinVAArg(llvm::Value *VAListAddr, QualType Ty,
// other cases.		// other cases.
if (!isAggregateTypeForABI(Ty) && !isIllegalVectorType(Ty))		if (!isAggregateTypeForABI(Ty) && !isIllegalVectorType(Ty))
return nullptr;		return nullptr;

uint64_t Size = CGF.getContext().getTypeSize(Ty) / 8;		uint64_t Size = CGF.getContext().getTypeSize(Ty) / 8;
uint64_t Align = CGF.getContext().getTypeAlign(Ty) / 8;		uint64_t Align = CGF.getContext().getTypeAlign(Ty) / 8;

const Type *Base = nullptr;		const Type *Base = nullptr;
bool isHA = isHomogeneousAggregate(Ty, Base, getContext());		bool isHA = isARMHomogeneousAggregate(Ty, Base, getContext(), true);

bool isIndirect = false;		bool isIndirect = false;
// Arguments bigger than 16 bytes which aren't homogeneous aggregates should		// Arguments bigger than 16 bytes which aren't homogeneous aggregates should
// be passed indirectly.		// be passed indirectly.
if (Size > 16 && !isHA) {		if (Size > 16 && !isHA) {
isIndirect = true;		isIndirect = true;
Size = 8;		Size = 8;
Align = 8;		Align = 8;
[...] void ARMABIInfo::setRuntimeCC() {

// Don't muddy up the IR with a ton of explicit annotations if		// Don't muddy up the IR with a ton of explicit annotations if
// they'd just match what LLVM will infer from the triple.		// they'd just match what LLVM will infer from the triple.
llvm::CallingConv::ID abiCC = getABIDefaultCC();		llvm::CallingConv::ID abiCC = getABIDefaultCC();
if (abiCC != getLLVMDefaultCC())		if (abiCC != getLLVMDefaultCC())
RuntimeCC = abiCC;		RuntimeCC = abiCC;
}		}

/// isHomogeneousAggregate - Return true if a type is an AAPCS-VFP homogeneous		/// isARMHomogeneousAggregate - Return true if a type is an AAPCS-VFP homogeneous
/// aggregate. If HAMembers is non-null, the number of base elements		/// aggregate. If HAMembers is non-null, the number of base elements
/// contained in the type is returned through it; this is used for the		/// contained in the type is returned through it; this is used for the
/// recursive calls that check aggregate component types.		/// recursive calls that check aggregate component types.
static bool isHomogeneousAggregate(QualType Ty, const Type *&Base,		static bool isARMHomogeneousAggregate(QualType Ty, const Type *&Base,
ASTContext &Context, uint64_t *HAMembers) {		ASTContext &Context, bool isAArch64,
		uint64_t *HAMembers) {
uint64_t Members = 0;		uint64_t Members = 0;
if (const ConstantArrayType *AT = Context.getAsConstantArrayType(Ty)) {		if (const ConstantArrayType *AT = Context.getAsConstantArrayType(Ty)) {
if (!isHomogeneousAggregate(AT->getElementType(), Base, Context, &Members))		if (!isARMHomogeneousAggregate(AT->getElementType(), Base, Context, isAArch64, &Members))
return false;		return false;
Members *= AT->getSize().getZExtValue();		Members *= AT->getSize().getZExtValue();
} else if (const RecordType *RT = Ty->getAs<RecordType>()) {		} else if (const RecordType *RT = Ty->getAs<RecordType>()) {
const RecordDecl *RD = RT->getDecl();		const RecordDecl *RD = RT->getDecl();
if (RD->hasFlexibleArrayMember())		if (RD->hasFlexibleArrayMember())
return false;		return false;

Members = 0;		Members = 0;
for (const auto *FD : RD->fields()) {		for (const auto *FD : RD->fields()) {
uint64_t FldMembers;		uint64_t FldMembers;
if (!isHomogeneousAggregate(FD->getType(), Base, Context, &FldMembers))		if (!isARMHomogeneousAggregate(FD->getType(), Base, Context, isAArch64, &FldMembers))
return false;		return false;

Members = (RD->isUnion() ?		Members = (RD->isUnion() ?
std::max(Members, FldMembers) : Members + FldMembers);		std::max(Members, FldMembers) : Members + FldMembers);
}		}
} else {		} else {
Members = 1;		Members = 1;
if (const ComplexType *CT = Ty->getAs<ComplexType>()) {		if (const ComplexType *CT = Ty->getAs<ComplexType>()) {
Members = 2;		Members = 2;
Ty = CT->getElementType();		Ty = CT->getElementType();
}		}

// Homogeneous aggregates for AAPCS-VFP must have base types of float,		// Homogeneous aggregates for AAPCS-VFP must have base types of float,
// double, or 64-bit or 128-bit vectors.		// double, or 64-bit or 128-bit vectors. "long double" has the same machine
		// type as double, so it is also allowed as a base type.
		// Homogeneous aggregates for AAPCS64 must have base types of a floating
		// point type or a short-vector type. This is the same as the 32-bit ABI,
		// but with the difference that any floating-point type is allowed,
		// including __fp16.
if (const BuiltinType *BT = Ty->getAs<BuiltinType>()) {		if (const BuiltinType *BT = Ty->getAs<BuiltinType>()) {
		if (isAArch64) {
		if (!BT->isFloatingPoint())
		return false;
		} else {
if (BT->getKind() != BuiltinType::Float &&		if (BT->getKind() != BuiltinType::Float &&
BT->getKind() != BuiltinType::Double &&		BT->getKind() != BuiltinType::Double &&
BT->getKind() != BuiltinType::LongDouble)		BT->getKind() != BuiltinType::LongDouble)
return false;		return false;
		}
} else if (const VectorType *VT = Ty->getAs<VectorType>()) {		} else if (const VectorType *VT = Ty->getAs<VectorType>()) {
unsigned VecSize = Context.getTypeSize(VT);		unsigned VecSize = Context.getTypeSize(VT);
if (VecSize != 64 && VecSize != 128)		if (VecSize != 64 && VecSize != 128)
return false;		return false;
} else {		} else {
return false;		return false;
}		}

[...] ABIArgInfo ARMABIInfo::classifyArgumentType(QualType Ty, bool isVariadic,
if (isEmptyRecord(getContext(), Ty, true))		if (isEmptyRecord(getContext(), Ty, true))
return ABIArgInfo::getIgnore();		return ABIArgInfo::getIgnore();

if (getABIKind() == ARMABIInfo::AAPCS_VFP && !isVariadic) {		if (getABIKind() == ARMABIInfo::AAPCS_VFP && !isVariadic) {
// Homogeneous Aggregates need to be expanded when we can fit the aggregate		// Homogeneous Aggregates need to be expanded when we can fit the aggregate
// into VFP registers.		// into VFP registers.
const Type *Base = nullptr;		const Type *Base = nullptr;
uint64_t Members = 0;		uint64_t Members = 0;
if (isHomogeneousAggregate(Ty, Base, getContext(), &Members)) {		if (isARMHomogeneousAggregate(Ty, Base, getContext(), false, &Members)) {
assert(Base && "Base class should be set for homogeneous aggregate");		assert(Base && "Base class should be set for homogeneous aggregate");
// Base can be a floating-point or a vector.		// Base can be a floating-point or a vector.
if (Base->isVectorType()) {		if (Base->isVectorType()) {
// ElementSize is in number of floats.		// ElementSize is in number of floats.
unsigned ElementSize = getContext().getTypeSize(Base) == 64 ? 2 : 4;		unsigned ElementSize = getContext().getTypeSize(Base) == 64 ? 2 : 4;
markAllocatedVFPs(ElementSize,		markAllocatedVFPs(ElementSize,
Members * ElementSize);		Members * ElementSize);
} else if (Base->isSpecificBuiltinType(BuiltinType::Float))		} else if (Base->isSpecificBuiltinType(BuiltinType::Float))
[...] ABIArgInfo ARMABIInfo::classifyReturnType(QualType RetTy,
// Otherwise this is an AAPCS variant.		// Otherwise this is an AAPCS variant.

if (isEmptyRecord(getContext(), RetTy, true))		if (isEmptyRecord(getContext(), RetTy, true))
return ABIArgInfo::getIgnore();		return ABIArgInfo::getIgnore();

// Check for homogeneous aggregates with AAPCS-VFP.		// Check for homogeneous aggregates with AAPCS-VFP.
if (getABIKind() == AAPCS_VFP && !isVariadic) {		if (getABIKind() == AAPCS_VFP && !isVariadic) {
const Type *Base = nullptr;		const Type *Base = nullptr;
if (isHomogeneousAggregate(RetTy, Base, getContext())) {		if (isARMHomogeneousAggregate(RetTy, Base, getContext(), false)) {
assert(Base && "Base class should be set for homogeneous aggregate");		assert(Base && "Base class should be set for homogeneous aggregate");
// Homogeneous Aggregates are returned directly.		// Homogeneous Aggregates are returned directly.
return ABIArgInfo::getDirect();		return ABIArgInfo::getDirect();
}		}
}		}

// Aggregates <= 4 bytes are returned in r0; other aggregates		// Aggregates <= 4 bytes are returned in r0; other aggregates
// are returned indirectly.		// are returned indirectly.

lib/Driver/Tools.cpp

[...] if (!Args.hasFlag(options::OPT_mno_stackrealign, options::OPT_mstackrealign,
CmdArgs.push_back(Args.MakeArgString("-mstackrealign"));		CmdArgs.push_back(Args.MakeArgString("-mstackrealign"));
}		}

if (Args.hasArg(options::OPT_mstack_alignment)) {		if (Args.hasArg(options::OPT_mstack_alignment)) {
StringRef alignment = Args.getLastArgValue(options::OPT_mstack_alignment);		StringRef alignment = Args.getLastArgValue(options::OPT_mstack_alignment);
CmdArgs.push_back(Args.MakeArgString("-mstack-alignment=" + alignment));		CmdArgs.push_back(Args.MakeArgString("-mstack-alignment=" + alignment));
}		}

		if (getToolChain().getTriple().getArch() == llvm::Triple::aarch64 ||
		getToolChain().getTriple().getArch() == llvm::Triple::aarch64_be)
		CmdArgs.push_back("-fallow-half-arguments-and-returns");

if (Arg *A = Args.getLastArg(options::OPT_mrestrict_it,		if (Arg *A = Args.getLastArg(options::OPT_mrestrict_it,
options::OPT_mno_restrict_it)) {		options::OPT_mno_restrict_it)) {
if (A->getOption().matches(options::OPT_mrestrict_it)) {		if (A->getOption().matches(options::OPT_mrestrict_it)) {
CmdArgs.push_back("-backend-option");		CmdArgs.push_back("-backend-option");
CmdArgs.push_back("-arm-restrict-it");		CmdArgs.push_back("-arm-restrict-it");
} else {		} else {
CmdArgs.push_back("-backend-option");		CmdArgs.push_back("-backend-option");
CmdArgs.push_back("-arm-no-restrict-it");		CmdArgs.push_back("-arm-no-restrict-it");
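The driver hunk above adds the cc1 flag only for the two AArch64 architectures (little- and big-endian). As a rough, standalone illustration of that condition, the sketch below models it without any clang headers. The `Arch` enum and the `halfArgFlags` helper are hypothetical stand-ins invented for this example; they are not part of the clang driver, which uses `llvm::Triple` and `ArgStringList` instead.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Triple::ArchType; only a few values are modelled.
enum class Arch { arm, aarch64, aarch64_be, x86_64 };

// Models the condition in the Tools.cpp hunk: both AArch64 endiannesses
// receive the flag, every other architecture receives nothing.
std::vector<std::string> halfArgFlags(Arch arch) {
  std::vector<std::string> cmdArgs;
  if (arch == Arch::aarch64 || arch == Arch::aarch64_be)
    cmdArgs.push_back("-fallow-half-arguments-and-returns");
  return cmdArgs;
}
```

Under this model, 32-bit ARM gets no flag, which matches the summary's statement that the feature is not yet enabled for 32-bit ARM targets.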

lib/Frontend/CompilerInvocation.cpp

#include "clang/Frontend/LangStandards.def"
Opts.ParseUnknownAnytype = Args.hasArg(OPT_funknown_anytype);		Opts.ParseUnknownAnytype = Args.hasArg(OPT_funknown_anytype);
Opts.DebuggerSupport = Args.hasArg(OPT_fdebugger_support);		Opts.DebuggerSupport = Args.hasArg(OPT_fdebugger_support);
Opts.DebuggerCastResultToId = Args.hasArg(OPT_fdebugger_cast_result_to_id);		Opts.DebuggerCastResultToId = Args.hasArg(OPT_fdebugger_cast_result_to_id);
Opts.DebuggerObjCLiteral = Args.hasArg(OPT_fdebugger_objc_literal);		Opts.DebuggerObjCLiteral = Args.hasArg(OPT_fdebugger_objc_literal);
Opts.ApplePragmaPack = Args.hasArg(OPT_fapple_pragma_pack);		Opts.ApplePragmaPack = Args.hasArg(OPT_fapple_pragma_pack);
Opts.CurrentModule = Args.getLastArgValue(OPT_fmodule_name);		Opts.CurrentModule = Args.getLastArgValue(OPT_fmodule_name);
Opts.ImplementationOfModule =		Opts.ImplementationOfModule =
Args.getLastArgValue(OPT_fmodule_implementation_of);		Args.getLastArgValue(OPT_fmodule_implementation_of);
		Opts.NativeHalfType = Opts.NativeHalfType;
		Opts.HalfArgsAndReturns = Args.hasArg(OPT_fallow_half_arguments_and_returns);

if (!Opts.CurrentModule.empty() && !Opts.ImplementationOfModule.empty() &&		if (!Opts.CurrentModule.empty() && !Opts.ImplementationOfModule.empty() &&
Opts.CurrentModule != Opts.ImplementationOfModule) {		Opts.CurrentModule != Opts.ImplementationOfModule) {
Diags.Report(diag::err_conflicting_module_names)		Diags.Report(diag::err_conflicting_module_names)
<< Opts.CurrentModule << Opts.ImplementationOfModule;		<< Opts.CurrentModule << Opts.ImplementationOfModule;
}		}

if (Arg *A = Args.getLastArg(OPT_faddress_space_map_mangling_EQ)) {		if (Arg *A = Args.getLastArg(OPT_faddress_space_map_mangling_EQ)) {

lib/Sema/SemaType.cpp

bool Sema::CheckFunctionReturnType(QualType T, SourceLocation Loc) {		bool Sema::CheckFunctionReturnType(QualType T, SourceLocation Loc) {
if (T->isArrayType() || T->isFunctionType()) {		if (T->isArrayType() || T->isFunctionType()) {
Diag(Loc, diag::err_func_returning_array_function)		Diag(Loc, diag::err_func_returning_array_function)
<< T->isFunctionType() << T;		<< T->isFunctionType() << T;
return true;		return true;
}		}

// Functions cannot return half FP.		// Functions cannot return half FP.
if (T->isHalfType()) {		if (T->isHalfType() && !getLangOpts().HalfArgsAndReturns) {
Diag(Loc, diag::err_parameters_retval_cannot_have_fp16_type) << 1 <<		Diag(Loc, diag::err_parameters_retval_cannot_have_fp16_type) << 1 <<
FixItHint::CreateInsertion(Loc, "*");		FixItHint::CreateInsertion(Loc, "*");
return true;		return true;
}		}

// Methods cannot return interface types. All ObjC objects are		// Methods cannot return interface types. All ObjC objects are
// passed by reference.		// passed by reference.
if (T->isObjCObjectType()) {		if (T->isObjCObjectType()) {
QualType Sema::BuildFunctionType(QualType T,
Invalid |= CheckFunctionReturnType(T, Loc);		Invalid |= CheckFunctionReturnType(T, Loc);

for (unsigned Idx = 0, Cnt = ParamTypes.size(); Idx < Cnt; ++Idx) {		for (unsigned Idx = 0, Cnt = ParamTypes.size(); Idx < Cnt; ++Idx) {
// FIXME: Loc is too inprecise here, should use proper locations for args.		// FIXME: Loc is too inprecise here, should use proper locations for args.
QualType ParamType = Context.getAdjustedParameterType(ParamTypes[Idx]);		QualType ParamType = Context.getAdjustedParameterType(ParamTypes[Idx]);
if (ParamType->isVoidType()) {		if (ParamType->isVoidType()) {
Diag(Loc, diag::err_param_with_void_type);		Diag(Loc, diag::err_param_with_void_type);
Invalid = true;		Invalid = true;
} else if (ParamType->isHalfType()) {		} else if (ParamType->isHalfType() && !getLangOpts().HalfArgsAndReturns) {
// Disallow half FP arguments.		// Disallow half FP arguments.
Diag(Loc, diag::err_parameters_retval_cannot_have_fp16_type) << 0 <<		Diag(Loc, diag::err_parameters_retval_cannot_have_fp16_type) << 0 <<
FixItHint::CreateInsertion(Loc, "*");		FixItHint::CreateInsertion(Loc, "*");
Invalid = true;		Invalid = true;
}		}

ParamTypes[Idx] = ParamType;		ParamTypes[Idx] = ParamType;
}		}
case DeclaratorChunk::Function: {
// Do not allow returning half FP value.		// Do not allow returning half FP value.
// FIXME: This really should be in BuildFunctionType.		// FIXME: This really should be in BuildFunctionType.
if (T->isHalfType()) {		if (T->isHalfType()) {
if (S.getLangOpts().OpenCL) {		if (S.getLangOpts().OpenCL) {
if (!S.getOpenCLOptions().cl_khr_fp16) {		if (!S.getOpenCLOptions().cl_khr_fp16) {
S.Diag(D.getIdentifierLoc(), diag::err_opencl_half_return) << T;		S.Diag(D.getIdentifierLoc(), diag::err_opencl_half_return) << T;
D.setInvalidType(true);		D.setInvalidType(true);
}		}
} else {		} else if (!S.getLangOpts().HalfArgsAndReturns) {
S.Diag(D.getIdentifierLoc(),		S.Diag(D.getIdentifierLoc(),
diag::err_parameters_retval_cannot_have_fp16_type) << 1;		diag::err_parameters_retval_cannot_have_fp16_type) << 1;
D.setInvalidType(true);		D.setInvalidType(true);
}		}
}		}

// Methods cannot return interface types. All ObjC objects are		// Methods cannot return interface types. All ObjC objects are
// passed by reference.		// passed by reference.
case DeclaratorChunk::Function: {
// FIXME: This really should be in BuildFunctionType.		// FIXME: This really should be in BuildFunctionType.
if (S.getLangOpts().OpenCL) {		if (S.getLangOpts().OpenCL) {
if (!S.getOpenCLOptions().cl_khr_fp16) {		if (!S.getOpenCLOptions().cl_khr_fp16) {
S.Diag(Param->getLocation(),		S.Diag(Param->getLocation(),
diag::err_opencl_half_param) << ParamTy;		diag::err_opencl_half_param) << ParamTy;
D.setInvalidType();		D.setInvalidType();
Param->setInvalidDecl();		Param->setInvalidDecl();
}		}
} else {		} else if (!S.getLangOpts().HalfArgsAndReturns) {
S.Diag(Param->getLocation(),		S.Diag(Param->getLocation(),
diag::err_parameters_retval_cannot_have_fp16_type) << 0;		diag::err_parameters_retval_cannot_have_fp16_type) << 0;
D.setInvalidType();		D.setInvalidType();
}		}
} else if (!FTI.hasPrototype) {		} else if (!FTI.hasPrototype) {
if (ParamTy->isPromotableIntegerType()) {		if (ParamTy->isPromotableIntegerType()) {
ParamTy = Context.getPromotedIntegerType(ParamTy);		ParamTy = Context.getPromotedIntegerType(ParamTy);
Param->setKNRPromoted(true);		Param->setKNRPromoted(true);
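The SemaType.cpp hunks all apply the same gating: OpenCL keeps its own `cl_khr_fp16` check, and for every other language the existing `__fp16` diagnostic is now skipped when `HalfArgsAndReturns` is set. The sketch below is a simplified model of that decision, not clang code. `LangOptsModel`, `HalfDiag`, and `checkHalfInSignature` are hypothetical names, and folding `cl_khr_fp16` into the same struct is an assumption made for brevity (in the patch it comes from `getOpenCLOptions()`).

```cpp
// Hypothetical, simplified model of the options consulted by the patch.
struct LangOptsModel {
  bool OpenCL = false;              // compiling OpenCL
  bool cl_khr_fp16 = false;         // OpenCL extension enabled (assumed to live here)
  bool HalfArgsAndReturns = false;  // set by -fallow-half-arguments-and-returns
};

// Which diagnostic, if any, a half-typed parameter or return type would trigger.
enum class HalfDiag { None, OpenCLHalf, Fp16NotAllowed };

HalfDiag checkHalfInSignature(const LangOptsModel &opts) {
  if (opts.OpenCL)
    // The OpenCL path is unchanged by the patch: it depends only on cl_khr_fp16.
    return opts.cl_khr_fp16 ? HalfDiag::None : HalfDiag::OpenCLHalf;
  if (!opts.HalfArgsAndReturns)
    // Pre-patch behaviour for all other languages.
    return HalfDiag::Fp16NotAllowed;
  return HalfDiag::None;  // new: accepted when the option is set
}
```

One consequence visible in this model is that the new option has no effect on OpenCL, since the `else if` in the patch is only reached on the non-OpenCL path.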

test/CodeGen/arm64-aapcs-arguments.c

	// RUN: %clang_cc1 -triple arm64-linux-gnu -target-feature +neon -target-abi aapcs -ffreestanding -emit-llvm -w -o - %s | FileCheck %s			// RUN: %clang_cc1 -triple aarch64-linux-gnu -target-feature +neon -target-abi aapcs -ffreestanding -fallow-half-arguments-and-returns -emit-llvm -w -o - %s | FileCheck %s

	// AAPCS clause C.8 says: If the argument has an alignment of 16 then the NGRN			// AAPCS clause C.8 says: If the argument has an alignment of 16 then the NGRN
	// is rounded up to the next even number.			// is rounded up to the next even number.

	// CHECK: void @test1(i32 %x0, i128 %x2_x3, i128 %x4_x5, i128 %x6_x7, i128 %sp.coerce)			// CHECK: void @test1(i32 %x0, i128 %x2_x3, i128 %x4_x5, i128 %x6_x7, i128 %sp.coerce)
	typedef union { __int128 a; } Small;			typedef union { __int128 a; } Small;
	void test1(int x0, __int128 x2_x3, __int128 x4_x5, __int128 x6_x7, Small sp) {			void test1(int x0, __int128 x2_x3, __int128 x4_x5, __int128 x6_x7, Small sp) {
	}			}
	Show All 25 Lines
	}			}

	// It's the job of the argument consumer to perform the required sign & zero			// It's the job of the argument consumer to perform the required sign & zero
	// extensions under AAPCS. There shouldn't be			// extensions under AAPCS. There shouldn't be

	// CHECK: define i8 @test5(i8 %a, i16 %b)			// CHECK: define i8 @test5(i8 %a, i16 %b)
	unsigned char test5(unsigned char a, signed short b) {			unsigned char test5(unsigned char a, signed short b) {
	}			}

				// __fp16 can be used as a function argument or return type (ACLE 2.0)
				// CHECK: define half @test_half(half %{{.*}})
				__fp16 test_half(__fp16 A) { }

				// __fp16 is a base type for homogeneous floating-point aggregates for AArch64 (but not 32-bit ARM).
				// CHECK: define %struct.HFA_half @test_half_hfa(half %{{.*}}, half %{{.*}}, half %{{.*}}, half %{{.*}})
				struct HFA_half { __fp16 a[4]; };
				struct HFA_half test_half_hfa(struct HFA_half A) { }

Revision Contents

Diff 12436

include/clang/Basic/LangOptions.def

include/clang/Driver/CC1Options.td

lib/CodeGen/CGExprConstant.cpp

lib/CodeGen/CGExprScalar.cpp

lib/CodeGen/CodeGenTypes.cpp

lib/CodeGen/TargetInfo.cpp

lib/Driver/Tools.cpp

lib/Frontend/CompilerInvocation.cpp

lib/Sema/SemaType.cpp

test/CodeGen/arm64-aapcs-arguments.c
