This is an archive of the discontinued LLVM Phabricator instance.


Differential D49987

[ARM] Make FP16 vectors legal
Abandoned · Public

Authored by miyuki on Jul 30 2018, 7:49 AM.


Details

Reviewers

olista01
eli.friedman
javed.absar

Summary

On targets that do not support FP16 natively, LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting each element to
FP32. This causes problems for the following code:

void foo(int, ...);

typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;
void bar(float16x4_t x) {
  foo(42, x);
}

According to the AAPCS (Appendix A.2), float16x4_t is a containerized
vector fundamental type, so 'foo' expects the four 16-bit FP values to
be packed into two 32-bit registers. Instead, 'bar' promotes them to
four single-precision values.
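To make the register expectation concrete, here is a minimal C++ sketch (hypothetical names, little-endian lane order assumed) of how four raw FP16 bit patterns end up in a pair of 32-bit core registers when a float16x4_t is passed as a 64-bit containerized vector under the base (soft-float) AAPCS. No numeric conversion takes place; promotion would instead produce four 32-bit single-precision values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the two core registers that carry a 64-bit
// containerized vector such as float16x4_t (register numbers not modeled).
struct CoreRegPair {
  uint32_t lo;  // first register: low 32 bits of the container (lanes 0, 1)
  uint32_t hi;  // second register: high 32 bits of the container (lanes 2, 3)
};

// Pack four raw binary16 bit patterns (no conversion to f32) into a 64-bit
// container, lane 0 in the least significant bits, then split it in two.
CoreRegPair passFloat16x4(const std::array<uint16_t, 4>& lanes) {
  uint64_t container = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    container |= static_cast<uint64_t>(lanes[i]) << (16 * i);
  return {static_cast<uint32_t>(container),
          static_cast<uint32_t>(container >> 32)};
}
```

For example, the lanes 1.0, 2.0, 3.0, 4.0 (0x3C00, 0x4000, 0x4200, 0x4400) travel as the words 0x40003C00 and 0x44004200.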

This patch makes FP16 vectors legal in the backend, so that they can
be marshalled correctly when passed as parameters. On subtargets without
full FP16 support, all operations on FP16 vectors (except for loads and
stores) get expanded. The change required several adjustments in
SelectionDAG and in the ARM FP16 tests.

Diff Detail

Event Timeline

miyuki created this revision. Jul 30 2018, 7:49 AM

Herald added a reviewer: javed.absar. Jul 30 2018, 7:50 AM

Herald added subscribers: chrib, kristof.beyls.

Please make sure there's test coverage for illegal half operations (fadd <4 x half>, etc.).

I tried compiling an fadd <4 x half>, and it does not actually work: the type gets legalized into v4i16 and we get an unknown libcall. The problem is that we need to soften v4f16 to v4i16 when passing it as a function parameter, but at the same time expand to f32 when performing arithmetic on it. Do you have any suggestions on how to implement this correctly? Do any other targets face a similar problem?
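The two treatments can be sketched in plain C++ (an illustrative model, not LLVM code; all names are hypothetical): passing the vector only needs its raw 16-bit lanes, whereas arithmetic such as fadd <4 x half> without native FP16 support widens each lane to f32, operates there, and rounds back to f16. This roughly mirrors the vcvtb.f32.f16 / vadd.f32 / vcvtb.f16.f32 sequences in the ARM tests. The helpers assume IEEE binary16 and the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Decode a binary16 bit pattern to float. Exact: every half value is a float.
float halfToFloat(uint16_t h) {
  const int exp = (h >> 10) & 0x1F;
  const int mant = h & 0x3FF;
  float mag;
  if (exp == 0)
    mag = std::ldexp(static_cast<float>(mant), -24);              // zero/subnormal
  else if (exp == 31)
    mag = mant ? std::numeric_limits<float>::quiet_NaN()
               : std::numeric_limits<float>::infinity();
  else
    mag = std::ldexp(static_cast<float>(mant + 1024), exp - 25);  // normal
  return (h & 0x8000) ? -mag : mag;
}

// Round a float to the nearest binary16 value, ties to even.
uint16_t floatToHalf(float f) {
  uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const uint32_t sign = (x >> 16) & 0x8000u;
  const uint32_t absx = x & 0x7FFFFFFFu;
  if (absx > 0x7F800000u)              // NaN -> quiet NaN
    return static_cast<uint16_t>(sign | 0x7E00u);
  if (absx >= 0x477FF000u)             // >= 65520.0f or inf -> inf
    return static_cast<uint16_t>(sign | 0x7C00u);
  if (absx < 0x38800000u) {            // below 2^-14: half subnormal or zero
    float a;
    std::memcpy(&a, &absx, sizeof a);
    // Count units of 2^-24; nearbyint rounds ties to even by default.
    // A result of 1024 correctly becomes the smallest normal half (0x0400).
    const float m = std::nearbyint(a * 16777216.0f);
    return static_cast<uint16_t>(sign | static_cast<uint32_t>(m));
  }
  // Normal: rebias the exponent (127 -> 15), keep the top 10 mantissa bits,
  // round the 13 dropped bits to nearest even. A mantissa carry correctly
  // bumps the exponent.
  uint32_t h = (((absx >> 23) - 112u) << 10) | ((absx >> 13) & 0x3FFu);
  const uint32_t rem = absx & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
    ++h;
  return static_cast<uint16_t>(sign | h);
}

// Element-wise fadd on <4 x half> held as four 16-bit lanes (lane 0 in the
// low bits): widen each lane to f32, add, round back to f16.
uint64_t faddV4F16(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (unsigned lane = 0; lane < 4; ++lane) {
    const unsigned shift = 16 * lane;
    const float sum = halfToFloat(static_cast<uint16_t>(a >> shift)) +
                      halfToFloat(static_cast<uint16_t>(b >> shift));
    result |= static_cast<uint64_t>(floatToHalf(sum)) << shift;
  }
  return result;
}
```

In this model the calling-convention path would touch only the packed `uint64_t`, while `faddV4F16` stands in for the expanded arithmetic.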

Can we make <4 x half> a legal type on all subtargets with NEON, and just mark all the operations "expand" for subtargets which don't support math on it?
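A toy model of that suggestion, assuming simplified stand-ins for the ISD opcodes, MVTs and legalize actions (these enums and the `ActionTable` class are hypothetical, not LLVM's API): the type is always registered, and only the arithmetic is marked Expand when the subtarget lacks full FP16 support, so loads and stores remain legal.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

// Hypothetical, heavily simplified stand-ins; not the real LLVM enums.
enum class Op { Load, Store, FAdd, FMul, FSqrt };
enum class VT { v4f16, v8f16 };
enum class Action { Legal, Expand };

class ActionTable {
  std::map<std::pair<Op, VT>, Action> actions_;

public:
  void set(Op op, VT vt, Action a) { actions_[{op, vt}] = a; }
  // Operations on a legal type default to Legal unless overridden.
  Action get(Op op, VT vt) const {
    auto it = actions_.find({op, vt});
    return it == actions_.end() ? Action::Legal : it->second;
  }
};

// Keep the FP16 vector types legal on every NEON subtarget, but expand the
// arithmetic when the subtarget has no FP16 math.
ActionTable configure(bool hasFullFP16) {
  ActionTable t;
  if (!hasFullFP16)
    for (VT vt : {VT::v4f16, VT::v8f16})
      for (Op op : {Op::FAdd, Op::FMul, Op::FSqrt})
        t.set(op, vt, Action::Expand);
  return t;
}
```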

This approach worked, but it required handling two more operations in PromoteFloatOperand, via PromoteFloatOp_BUILD_VECTOR and PromoteFloatOp_INSERT_VECTOR_ELT. In my implementation these don't actually promote anything; rather, they do FP softening (the f16 operands are bitcast to integers, the integer vector is built or updated, and the result is bitcast back). I'm not sure whether there is a better solution.
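A minimal model of that softening, assuming a <4 x half> is held as a 64-bit integer with lane 0 in the low bits (hypothetical names throughout): building or updating the vector only moves 16-bit bit patterns, so the work can be done entirely on the integer (v4i16) form before the result is reinterpreted as v4f16.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a v4i16 value: four 16-bit lanes, lane 0 lowest.
using V4I16 = uint64_t;

// BUILD_VECTOR on the integer form: each f16 operand contributes its raw bits.
V4I16 buildVector(const std::array<uint16_t, 4>& elts) {
  V4I16 v = 0;
  for (unsigned i = 0; i < 4; ++i)
    v |= static_cast<V4I16>(elts[i]) << (16 * i);
  return v;
}

// INSERT_VECTOR_ELT on the integer form: replace one lane's bit pattern.
V4I16 insertVectorElt(V4I16 vec, uint16_t elt, unsigned idx) {
  assert(idx < 4 && "lane index out of range");
  const unsigned shift = 16 * idx;
  vec &= ~(static_cast<V4I16>(0xFFFF) << shift);  // clear the target lane
  return vec | (static_cast<V4I16>(elt) << shift);
}
```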

SjoerdMeijer added a subscriber: SjoerdMeijer. Aug 2 2018, 3:30 AM

This needs tests for the hard-float ABI, in addition to soft-float.

Does half also need to be legal to get passed/returned correctly?

It looks like clang already has special handling for half types in some cases; how does that interact with this patch?

LukeGeeson mentioned this in D50252: [ARM] Added FP16 VREV Vector Instrinsic CodeGen support. Aug 3 2018, 6:30 AM

I have added a hardfp test. Clang has some logic to handle scalar FP16 values and performs softening when needed (e.g. tools/clang/test/CodeGen/arm-fp16-arguments.c), whereas NEON vectors are lowered into LLVM vector types. So we need no additional changes either in Clang or in the LLVM scalar FP16 handling.

We should handle f16 and vectors of f16 in a consistent manner, for the sake of maintaining the code in the future. (Either handle both in the backend, or handle both in clang.)

Alternative (Clang) patch: https://reviews.llvm.org/D50507

Abandoning in favor of https://reviews.llvm.org/D50507

Revision Contents

Path                                                      Size
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp           27 lines
lib/CodeGen/SelectionDAG/LegalizeTypes.h                   2 lines
lib/Target/ARM/ARMISelLowering.h                           5 lines
lib/Target/ARM/ARMISelLowering.cpp                       117 lines
test/CodeGen/ARM/fp16-promote.ll                          24 lines
test/CodeGen/ARM/fp16-v3.ll                               12 lines
test/CodeGen/ARM/vfp16-calling-conv.ll                    33 lines
test/Transforms/LoopVectorize/ARM/interleaved_cost.ll      4 lines

Diff 158541

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

@@ ... @@ switch (N->getOpcode()) {
     case ISD::BITCAST: R = PromoteFloatOp_BITCAST(N, OpNo); break;
     case ISD::FCOPYSIGN: R = PromoteFloatOp_FCOPYSIGN(N, OpNo); break;
     case ISD::FP_TO_SINT:
     case ISD::FP_TO_UINT: R = PromoteFloatOp_FP_TO_XINT(N, OpNo); break;
     case ISD::FP_EXTEND: R = PromoteFloatOp_FP_EXTEND(N, OpNo); break;
     case ISD::SELECT_CC: R = PromoteFloatOp_SELECT_CC(N, OpNo); break;
     case ISD::SETCC: R = PromoteFloatOp_SETCC(N, OpNo); break;
     case ISD::STORE: R = PromoteFloatOp_STORE(N, OpNo); break;
+    case ISD::BUILD_VECTOR: R = PromoteFloatOp_BUILD_VECTOR(N, OpNo); break;
+    case ISD::INSERT_VECTOR_ELT:
+      R = PromoteFloatOp_INSERT_VECTOR_ELT(N, OpNo);
+      break;
   }

   if (R.getNode())
     ReplaceValueWith(SDValue(N, 0), R);
   return false;
 }

 SDValue DAGTypeLegalizer::PromoteFloatOp_BITCAST(SDNode *N, unsigned OpNo) {
@@ ... @@ SDValue DAGTypeLegalizer::PromoteFloatOp_STORE(SDNode *N, unsigned OpNo) {
   SDValue NewVal;
   NewVal = DAG.getNode(GetPromotionOpcode(Promoted.getValueType(), VT), DL,
                        IVT, Promoted);

   return DAG.getStore(ST->getChain(), DL, NewVal, ST->getBasePtr(),
                       ST->getMemOperand());
 }

+SDValue DAGTypeLegalizer::PromoteFloatOp_BUILD_VECTOR(SDNode *N,
+                                                      unsigned OpNo) {
+  SmallVector<SDValue, 8> ConvertedValues;
+  llvm::transform(
+      N->op_values(), std::back_inserter(ConvertedValues),
+      [this](const SDValue &Val) { return BitConvertToInteger(Val); });
+
+  SDValue IntRes = DAG.getNode(
+      ISD::BUILD_VECTOR, SDLoc(N),
+      N->getValueType(0).changeVectorElementTypeToInteger(), ConvertedValues);
+  return DAG.getNode(ISD::BITCAST, SDLoc(N), N->getValueType(0), IntRes);
+}
+
+SDValue DAGTypeLegalizer::PromoteFloatOp_INSERT_VECTOR_ELT(SDNode *N,
+                                                           unsigned OpNo) {
+  SDValue IntVec = BitConvertVectorToIntegerVector(N->getOperand(0));
+  SDValue IntElem = BitConvertToInteger(N->getOperand(1));
+  SDValue IntRes =
+      DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(N), IntVec.getValueType(),
+                  IntVec, IntElem, N->getOperand(2));
+  return DAG.getNode(ISD::BITCAST, SDLoc(IntVec), N->getValueType(0), IntRes);
+}
+
 //===----------------------------------------------------------------------===//
 // Float Result Promotion
 //===----------------------------------------------------------------------===//

 void DAGTypeLegalizer::PromoteFloatResult(SDNode *N, unsigned ResNo) {
   SDValue R = SDValue();

   switch (N->getOpcode()) {

lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ ... @@ private:
   bool PromoteFloatOperand(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_BITCAST(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_FCOPYSIGN(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_FP_EXTEND(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_FP_TO_XINT(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_STORE(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_SELECT_CC(SDNode *N, unsigned OpNo);
   SDValue PromoteFloatOp_SETCC(SDNode *N, unsigned OpNo);
+  SDValue PromoteFloatOp_BUILD_VECTOR(SDNode *N, unsigned OpNo);
+  SDValue PromoteFloatOp_INSERT_VECTOR_ELT(SDNode *N, unsigned OpNo);

   //===--------------------------------------------------------------------===//
   // Scalarization Support: LegalizeVectorTypes.cpp
   //===--------------------------------------------------------------------===//

   /// Given a processed one-element vector Op which was scalarized to its
   /// element type, this returns the element. For example, if Op is a v1i32,
   /// Op = < i32 val >, this method returns val, an i32.

lib/Target/ARM/ARMISelLowering.h

@@ ... @@ private:
     // check.
     bool InsertFencesForAtomic;

     bool HasStandaloneRem = true;

     void addTypeForNEON(MVT VT, MVT PromotedLdStVT, MVT PromotedBitwiseVT);
     void addDRTypeForNEON(MVT VT);
     void addQRTypeForNEON(MVT VT);
+    /// Expand all operations (except loads, stores and basic arithmetic)
+    /// for a given FP type
+    void setFPFunctionsExpand(MVT VT);
+    /// Expand all operations (except loads and stores) for a given FP type
+    void setFPOperationsExpand(MVT VT);
     std::pair<SDValue, SDValue> getARMXALUOOp(SDValue Op, SelectionDAG &DAG, SDValue &ARMcc) const;

     using RegsToPassVector = SmallVector<std::pair<unsigned, SDValue>, 8>;

     void PassF64ArgInRegs(const SDLoc &dl, SelectionDAG &DAG, SDValue Chain,
                           SDValue &Arg, RegsToPassVector &RegsToPass,
                           CCValAssign &VA, CCValAssign &NextVA,
                           SDValue &StackPtr,

lib/Target/ARM/ARMISelLowering.cpp


@@ ... @@ void ARMTargetLowering::addDRTypeForNEON(MVT VT) {
   addTypeForNEON(VT, MVT::f64, MVT::v2i32);
 }

 void ARMTargetLowering::addQRTypeForNEON(MVT VT) {
   addRegisterClass(VT, &ARM::DPairRegClass);
   addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
 }

+void ARMTargetLowering::setFPFunctionsExpand(MVT VT) {
+  for (ISD::NodeType Op : { ISD::FSQRT, ISD::FSIN, ISD::FCOS,
+                            ISD::FPOW, ISD::FLOG, ISD::FLOG2,
+                            ISD::FLOG10, ISD::FEXP, ISD::FEXP2,
+                            ISD::FCEIL, ISD::FTRUNC, ISD::FRINT,
+                            ISD::FNEARBYINT, ISD::FFLOOR })
+    setOperationAction(Op, VT, Expand);
+}
+
+void ARMTargetLowering::setFPOperationsExpand(MVT VT) {
+  for (ISD::NodeType Op : { ISD::FADD, ISD::FSUB, ISD::FMUL,
+                            ISD::FMA, ISD::FDIV, ISD::FREM,
+                            ISD::FCOPYSIGN, ISD::FGETSIGN, ISD::SETCC,
+                            ISD::FNEG, ISD::FABS })
+    setOperationAction(Op, VT, Expand);
+  setFPFunctionsExpand(VT);
+}
+
 ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,
                                      const ARMSubtarget &STI)
     : TargetLowering(TM), Subtarget(&STI) {
   RegInfo = Subtarget->getRegisterInfo();
   Itins = Subtarget->getInstrItineraryData();

   setBooleanContents(ZeroOrOneBooleanContent);
   setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
@@ ... @@ if (Subtarget->hasNEON()) {

     addQRTypeForNEON(MVT::v4f32);
     addQRTypeForNEON(MVT::v2f64);
     addQRTypeForNEON(MVT::v16i8);
     addQRTypeForNEON(MVT::v8i16);
     addQRTypeForNEON(MVT::v4i32);
     addQRTypeForNEON(MVT::v2i64);

-    if (Subtarget->hasFullFP16()) {
-      addQRTypeForNEON(MVT::v8f16);
-      addDRTypeForNEON(MVT::v4f16);
+    // Even if the target does not support FP16 operations we want to keep
+    // <4 x half> and <8 x half> legal, because they can still be used as
+    // storage types and need to be handled correctly when passed as function
+    // parameters (the calling convention requires to treat them as
+    // containerized vectors)
+    addQRTypeForNEON(MVT::v8f16);
+    addDRTypeForNEON(MVT::v4f16);
+    if (!Subtarget->hasFullFP16()) {
+      setFPOperationsExpand(MVT::v8f16);
+      setFPOperationsExpand(MVT::v4f16);
     }

     // v2f64 is legal so that QR subregs can be extracted as f64 elements, but
     // neither Neon nor VFP support any arithmetic operations on it.
     // The same with v4f32. But keep in mind that vadd, vsub, vmul are natively
     // supported for v4f32.
-    setOperationAction(ISD::FADD, MVT::v2f64, Expand);
-    setOperationAction(ISD::FSUB, MVT::v2f64, Expand);
-    setOperationAction(ISD::FMUL, MVT::v2f64, Expand);
-    // FIXME: Code duplication: FDIV and FREM are expanded always, see
-    // ARMTargetLowering::addTypeForNEON method for details.
-    setOperationAction(ISD::FDIV, MVT::v2f64, Expand);
-    setOperationAction(ISD::FREM, MVT::v2f64, Expand);
-    // FIXME: Create unittest.
+    // FIXME: Create unittest for FCOPYSIGN.
     // In another words, find a way when "copysign" appears in DAG with vector
     // operands.
-    setOperationAction(ISD::FCOPYSIGN, MVT::v2f64, Expand);
     // FIXME: Code duplication: SETCC has custom operation action, see
     // ARMTargetLowering::addTypeForNEON method for details.
-    setOperationAction(ISD::SETCC, MVT::v2f64, Expand);
     // FIXME: Create unittest for FNEG and for FABS.
-    setOperationAction(ISD::FNEG, MVT::v2f64, Expand);
-    setOperationAction(ISD::FABS, MVT::v2f64, Expand);
-    setOperationAction(ISD::FSQRT, MVT::v2f64, Expand);
-    setOperationAction(ISD::FSIN, MVT::v2f64, Expand);
-    setOperationAction(ISD::FCOS, MVT::v2f64, Expand);
-    setOperationAction(ISD::FPOW, MVT::v2f64, Expand);
-    setOperationAction(ISD::FLOG, MVT::v2f64, Expand);
-    setOperationAction(ISD::FLOG2, MVT::v2f64, Expand);
-    setOperationAction(ISD::FLOG10, MVT::v2f64, Expand);
-    setOperationAction(ISD::FEXP, MVT::v2f64, Expand);
-    setOperationAction(ISD::FEXP2, MVT::v2f64, Expand);
     // FIXME: Create unittest for FCEIL, FTRUNC, FRINT, FNEARBYINT, FFLOOR.
-    setOperationAction(ISD::FCEIL, MVT::v2f64, Expand);
-    setOperationAction(ISD::FTRUNC, MVT::v2f64, Expand);
-    setOperationAction(ISD::FRINT, MVT::v2f64, Expand);
-    setOperationAction(ISD::FNEARBYINT, MVT::v2f64, Expand);
-    setOperationAction(ISD::FFLOOR, MVT::v2f64, Expand);
-    setOperationAction(ISD::FMA, MVT::v2f64, Expand);
-
-    setOperationAction(ISD::FSQRT, MVT::v4f32, Expand);
-    setOperationAction(ISD::FSIN, MVT::v4f32, Expand);
-    setOperationAction(ISD::FCOS, MVT::v4f32, Expand);
-    setOperationAction(ISD::FPOW, MVT::v4f32, Expand);
-    setOperationAction(ISD::FLOG, MVT::v4f32, Expand);
-    setOperationAction(ISD::FLOG2, MVT::v4f32, Expand);
-    setOperationAction(ISD::FLOG10, MVT::v4f32, Expand);
-    setOperationAction(ISD::FEXP, MVT::v4f32, Expand);
-    setOperationAction(ISD::FEXP2, MVT::v4f32, Expand);
-    setOperationAction(ISD::FCEIL, MVT::v4f32, Expand);
-    setOperationAction(ISD::FTRUNC, MVT::v4f32, Expand);
-    setOperationAction(ISD::FRINT, MVT::v4f32, Expand);
-    setOperationAction(ISD::FNEARBYINT, MVT::v4f32, Expand);
-    setOperationAction(ISD::FFLOOR, MVT::v4f32, Expand);
+    setFPOperationsExpand(MVT::v2f64);
+
+    setFPFunctionsExpand(MVT::v4f32);

     // Mark v2f32 intrinsics.
-    setOperationAction(ISD::FSQRT, MVT::v2f32, Expand);
-    setOperationAction(ISD::FSIN, MVT::v2f32, Expand);
-    setOperationAction(ISD::FCOS, MVT::v2f32, Expand);
-    setOperationAction(ISD::FPOW, MVT::v2f32, Expand);
-    setOperationAction(ISD::FLOG, MVT::v2f32, Expand);
-    setOperationAction(ISD::FLOG2, MVT::v2f32, Expand);
-    setOperationAction(ISD::FLOG10, MVT::v2f32, Expand);
-    setOperationAction(ISD::FEXP, MVT::v2f32, Expand);
-    setOperationAction(ISD::FEXP2, MVT::v2f32, Expand);
-    setOperationAction(ISD::FCEIL, MVT::v2f32, Expand);
-    setOperationAction(ISD::FTRUNC, MVT::v2f32, Expand);
-    setOperationAction(ISD::FRINT, MVT::v2f32, Expand);
-    setOperationAction(ISD::FNEARBYINT, MVT::v2f32, Expand);
-    setOperationAction(ISD::FFLOOR, MVT::v2f32, Expand);
+    setFPFunctionsExpand(MVT::v2f32);

     // Neon does not support some operations on v1i64 and v2i64 types.
     setOperationAction(ISD::MUL, MVT::v1i64, Expand);
     // Custom handling for some quad-vector types to detect VMULL.
     setOperationAction(ISD::MUL, MVT::v8i16, Custom);
     setOperationAction(ISD::MUL, MVT::v4i32, Custom);
     setOperationAction(ISD::MUL, MVT::v2i64, Custom);
     // Custom handling for some vector types to avoid expensive expansions
@@ ... @@ if (Subtarget->hasNEON()) {
     }
   }

   if (Subtarget->isFPOnlySP()) {
     // When targeting a floating-point unit with only single-precision
     // operations, f64 is legal for the few double-precision instructions which
     // are present However, no double-precision operations other than moves,
     // loads and stores are provided by the hardware.
-    setOperationAction(ISD::FADD, MVT::f64, Expand);
-    setOperationAction(ISD::FSUB, MVT::f64, Expand);
-    setOperationAction(ISD::FMUL, MVT::f64, Expand);
-    setOperationAction(ISD::FMA, MVT::f64, Expand);
-    setOperationAction(ISD::FDIV, MVT::f64, Expand);
-    setOperationAction(ISD::FREM, MVT::f64, Expand);
-    setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
-    setOperationAction(ISD::FGETSIGN, MVT::f64, Expand);
-    setOperationAction(ISD::FNEG, MVT::f64, Expand);
-    setOperationAction(ISD::FABS, MVT::f64, Expand);
-    setOperationAction(ISD::FSQRT, MVT::f64, Expand);
-    setOperationAction(ISD::FSIN, MVT::f64, Expand);
-    setOperationAction(ISD::FCOS, MVT::f64, Expand);
-    setOperationAction(ISD::FPOW, MVT::f64, Expand);
-    setOperationAction(ISD::FLOG, MVT::f64, Expand);
-    setOperationAction(ISD::FLOG2, MVT::f64, Expand);
-    setOperationAction(ISD::FLOG10, MVT::f64, Expand);
-    setOperationAction(ISD::FEXP, MVT::f64, Expand);
-    setOperationAction(ISD::FEXP2, MVT::f64, Expand);
-    setOperationAction(ISD::FCEIL, MVT::f64, Expand);
-    setOperationAction(ISD::FTRUNC, MVT::f64, Expand);
-    setOperationAction(ISD::FRINT, MVT::f64, Expand);
-    setOperationAction(ISD::FNEARBYINT, MVT::f64, Expand);
-    setOperationAction(ISD::FFLOOR, MVT::f64, Expand);
+    setFPOperationsExpand(MVT::f64);
     setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::UINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_UINT, MVT::i32, Custom);
     setOperationAction(ISD::FP_TO_SINT, MVT::f64, Custom);
     setOperationAction(ISD::FP_TO_UINT, MVT::f64, Custom);
     setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);
     setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);

test/CodeGen/ARM/fp16-promote.ll

@@ ... @@

 ; f16 vectors are not legal in the backend. Vector elements are not assigned
 ; to the register, but are stored in the stack instead. Hence insertelement
 ; and extractelement have these extra loads and stores.

 ; CHECK-ALL-LABEL: test_insertelement:
 ; CHECK-ALL: sub sp, sp, #8

 ; CHECK-VFP: and
 ; CHECK-VFP: mov
-; CHECK-VFP: ldrd
+; CHECK-VFP: vldr
 ; CHECK-VFP: orr
 ; CHECK-VFP: ldrh
-; CHECK-VFP: stm
+; CHECK-VFP: vstr
 ; CHECK-VFP: strh
-; CHECK-VFP: ldm
-; CHECK-VFP: stm
+; CHECK-VFP: vldr
+; CHECK-VFP: vstr

 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP-DAG: strh
 ; CHECK-NOVFP-DAG: strh
 ; CHECK-NOVFP-DAG: mov
@@ ... @@ define void @test_insertelement(half* %p, <4 x half>* %q, i32 %i) #0 {
   %a = load half, half* %p, align 2
   %b = load <4 x half>, <4 x half>* %q, align 8
   %c = insertelement <4 x half> %b, half %a, i32 %i
   store <4 x half> %c, <4 x half>* %q
   ret void
 }

 ; CHECK-ALL-LABEL: test_extractelement:
-; CHECK-VFP: push {{{.*}}, lr}
 ; CHECK-VFP: sub sp, sp, #8
-; CHECK-VFP: ldrd
+; CHECK-VFP: vldr
+; CHECK-VFP: and
 ; CHECK-VFP: mov
 ; CHECK-VFP: orr
+; CHECK-VFP: vstr
 ; CHECK-VFP: ldrh
 ; CHECK-VFP: strh
 ; CHECK-VFP: add sp, sp, #8
-; CHECK-VFP: pop {{{.*}}, pc}
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh

test/CodeGen/ARM/fp16-v3.ll

 ; RUN: llc -mattr=+fp16 < %s | FileCheck %s

 target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
 target triple = "armv7a--none-eabi"

 ; CHECK-LABEL: test_vec3:
 ; CHECK-DAG: vmov.f32 [[SREG1:s[0-9]+]], #1.200000e+01
 ; CHECK-DAG: vcvt.f32.s32 [[SREG2:s[0-9]+]],
 ; CHECK-DAG: vcvtb.f16.f32 [[SREG3:s[0-9]+]], [[SREG2]]
 ; CHECK-DAG: vcvtb.f32.f16 [[SREG4:s[0-9]+]], [[SREG3]]
 ; CHECK: vadd.f32 [[SREG5:s[0-9]+]], [[SREG4]], [[SREG1]]
 ; CHECK-NEXT: vcvtb.f16.f32 [[SREG6:s[0-9]+]], [[SREG5]]
 ; CHECK-NEXT: vmov [[RREG1:r[0-9]+]], [[SREG6]]
-; CHECK-DAG: uxth [[RREG2:r[0-9]+]], [[RREG1]]
-; CHECK-DAG: pkhbt [[RREG3:r[0-9]+]], [[RREG1]], [[RREG1]], lsl #16
 ; CHECK-DAG: strh [[RREG1]], [r0, #4]
-; CHECK-DAG: vmov [[DREG:d[0-9]+]], [[RREG3]], [[RREG2]]
+; CHECK-DAG: vdup.16 [[DREG:d[0-9]+]], [[RREG1]]
 ; CHECK-DAG: vst1.32 {[[DREG]][0]}, [r0:32]
 ; CHECK-NEXT: bx lr
 define void @test_vec3(<3 x half>* %arr, i32 %i) #0 {
   %H = sitofp i32 %i to half
   %S = fadd half %H, 0xH4A00
   %1 = insertelement <3 x half> undef, half %S, i32 0
   %2 = insertelement <3 x half> %1, half %S, i32 1
   %3 = insertelement <3 x half> %2, half %S, i32 2
   store <3 x half> %3, <3 x half>* %arr, align 8
   ret void
 }

 ; CHECK-LABEL: test_bitcast:
-; CHECK: vcvtb.f16.f32
-; CHECK: vcvtb.f16.f32
-; CHECK: vcvtb.f16.f32
-; CHECK: pkhbt
-; CHECK: uxth
+; CHECK-DAG: vst1.16
+; CHECK-DAG: vst1.32
+; CHECK: bx lr
 define void @test_bitcast(<3 x half> %inp, <3 x i16>* %arr) #0 {
   %bc = bitcast <3 x half> %inp to <3 x i16>
   store <3 x i16> %bc, <3 x i16>* %arr, align 8
   ret void
 }

 attributes #0 = { nounwind }
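The updated test_vec3 expectations split the 6-byte <3 x half> store into a 32-bit store of lanes 0 and 1 (vst1.32 of the first D-register lane after vdup.16) and a 16-bit store of lane 2 at byte offset 4 (strh). A byte-level C++ sketch of that split, assuming a little-endian target and hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Write the low `nbytes` bytes of `value` to `dst` in little-endian order,
// standing in for a single store instruction of that width.
void storeLE(unsigned char* dst, uint32_t value, unsigned nbytes) {
  for (unsigned i = 0; i < nbytes; ++i)
    dst[i] = static_cast<unsigned char>(value >> (8 * i));
}

// Store a <3 x half> (raw binary16 bit patterns) as one 32-bit store of
// lanes 0-1 followed by one 16-bit store of lane 2 at byte offset 4, so
// exactly 6 bytes are written.
void storeV3Half(unsigned char* dst, const std::array<uint16_t, 3>& lanes) {
  const uint32_t lanes01 = static_cast<uint32_t>(lanes[0]) |
                           (static_cast<uint32_t>(lanes[1]) << 16);
  storeLE(dst, lanes01, 4);       // analogous to vst1.32 {dN[0]}, [r0:32]
  storeLE(dst + 4, lanes[2], 2);  // analogous to strh rN, [r0, #4]
}
```

Bytes 6 and 7 of an 8-byte slot are left untouched, which is why the sequence does not need a full 64-bit store.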

test/CodeGen/ARM/vfp16-calling-conv.ll

This file was added.

+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "armv7-none--eabi"
+
+@v = local_unnamed_addr global <4 x half> zeroinitializer, align 8
+
+declare void @callee(<4 x half>) #0
+
+; CHECK-LABEL: test_soften:
+; CHECK: vldr [[DREG:d[0-9]+]], {{\[r[0-9]+]}}
+; CHECK-NEXT: vmov r0, r1, [[DREG]]
+; CHECK-NEXT: b callee
+define void @test_soften() #0 {
+entry:
+  %0 = load <4 x half>, <4 x half>* @v, align 8
+  tail call void (<4 x half>) @callee(<4 x half> %0)
+  ret void
+}
+
+; CHECK-LABEL: test_illegal_op:
+; CHECK: vadd.f32
+; CHECK: vadd.f32
+; CHECK: vadd.f32
+; CHECK: vadd.f32
+; CHECK: b callee
+define void @test_illegal_op(<4 x half> %a, <4 x half> %b) #0 {
+  %c = fadd <4 x half> %a, %b
+  tail call void (<4 x half>) @callee(<4 x half> %c)
+  ret void
+}
+
+attributes #0 = { nounwind }

test/Transforms/LoopVectorize/ARM/interleaved_cost.ll

@@ ... @@
 }

 %half.2 = type {half, half}
 define void @half_factor_2(%half.2* %data, i64 %n) {
 entry:
   br label %for.body

 ; VF_4-LABEL: Checking a loop in "half_factor_2"
-; VF_4: Found an estimated cost of 40 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
+; VF_4: Found an estimated cost of 33 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
 ; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2
 ; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
 ; VF_4-NEXT: Found an estimated cost of 32 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
 ; VF_8-LABEL: Checking a loop in "half_factor_2"
-; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
+; VF_8: Found an estimated cost of 66 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
 ; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
 ; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
 ; VF_8-NEXT: Found an estimated cost of 64 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
 for.body:
   %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
   %tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0
   %tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1
   %tmp2 = load half, half* %tmp0, align 2
