This is an archive of the discontinued LLVM Phabricator instance.

Differential D49987

[ARM] Make FP16 vectors legal
Abandoned · Public

Authored by miyuki on Jul 30 2018, 7:49 AM.


Details

Reviewers

olista01
eli.friedman
javed.absar

Summary

On targets that do not support FP16 natively, LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting to FP32. This
causes problems for the following code:

void foo(int, ...);

typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;
void bar(float16x4_t x) {
  foo(42, x);
}

According to the AAPCS (appendix A.2), float16x4_t is a containerized
vector fundamental type, so 'foo' expects that the 4 16-bit FP values
are packed into 2 32-bit registers, but instead 'bar' promotes them to
4 single-precision values.
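
To make the expected layout concrete, here is a minimal C++ sketch of packing four 16-bit lanes into two 32-bit words. The function name `packHalfLanes` and the choice of putting the lower-numbered lane in the low half of each word are assumptions made for illustration; this is not code from the patch or text from the AAPCS.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack four raw FP16 bit patterns into two 32-bit words,
// the way a containerized 64-bit vector would occupy two core registers.
// Assumption of this sketch: lane 0 goes in the low 16 bits of word 0.
std::array<std::uint32_t, 2>
packHalfLanes(const std::array<std::uint16_t, 4> &lanes) {
  return {static_cast<std::uint32_t>(lanes[0]) |
              (static_cast<std::uint32_t>(lanes[1]) << 16),
          static_cast<std::uint32_t>(lanes[2]) |
              (static_cast<std::uint32_t>(lanes[3]) << 16)};
}
```

With scalarization and promotion to FP32, the same four values would instead occupy four 32-bit slots, which is the mismatch described above.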

This patch makes FP16 vectors legal in the backend, so that they can
be marshalled correctly when passed as parameters. All operations
(except for loads and stores) on FP16 vectors get expanded. The change
required several adjustments in SelectionDAG and in ARM FP16 tests.

Diff Detail

Event Timeline

miyuki created this revision. Jul 30 2018, 7:49 AM

Herald added a reviewer: javed.absar. Jul 30 2018, 7:50 AM

Herald added subscribers: chrib, kristof.beyls.

Please make sure there's test coverage for illegal half operations (fadd <4 x half> etc.)

I tried compiling a fadd <4 x half> and it actually does not work: the type gets legalized into v4i16 and we get an unknown libcall. So the problem is that we need to soften v4f16 to v4i16 when passing it as a function parameter, but at the same time expand to f32 when performing arithmetic on it. Do you have any suggestions on how to implement this correctly? Do any other targets face a similar problem?
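
As a rough picture of what "expand to f32 when performing arithmetic" means for each lane, the following C++ sketch converts raw FP16 bit patterns to float and adds them lane by lane. The names `halfBitsToFloat` and `addPromoted` are hypothetical, and the result is deliberately left in f32; rounding back to FP16 is omitted. It is a model of the idea, not of what the legalizer emits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float.
float halfBitsToFloat(std::uint16_t h) {
  std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t mant = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0 && mant == 0) {
    bits = sign; // signed zero
  } else if (exp == 0) {
    // Subnormal half: shift the mantissa up until the implicit bit appears.
    std::uint32_t shift = 0;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      ++shift;
    }
    mant &= 0x3FFu;
    bits = sign | ((113u - shift) << 23) | (mant << 13);
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13); // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13); // rebias 15 -> 127
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical helper: lane-wise add carried out in f32 after promotion.
std::array<float, 4> addPromoted(const std::array<std::uint16_t, 4> &a,
                                 const std::array<std::uint16_t, 4> &b) {
  std::array<float, 4> r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = halfBitsToFloat(a[i]) + halfBitsToFloat(b[i]);
  return r;
}
```

The same 16-bit patterns are treated as opaque integers when the vector is only being passed along, which is why the two requirements pull in different directions.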

Can we make <4 x half> a legal type on all subtargets with NEON, and just mark all the operations "expand" for subtargets which don't support math on it?
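
One way to picture this suggestion is a per-(operation, type) action table in which the type itself is legal but arithmetic on it is marked for expansion. The C++ below is a toy model only: `ToyLowering`, `Op`, `VecType`, `Action` and `makeToyNeonWithoutFP16Math` are hypothetical names invented for this sketch and are not LLVM's TargetLowering interface.

```cpp
#include <map>
#include <utility>

// Toy stand-ins for operations, vector types and legalization actions.
enum class Op { Load, Store, FAdd, FMul };
enum class VecType { v4f16, v4f32 };
enum class Action { Legal, Expand };

class ToyLowering {
  std::map<std::pair<Op, VecType>, Action> actions_;

public:
  void setAction(Op op, VecType vt, Action a) { actions_[{op, vt}] = a; }

  // Anything not explicitly marked is treated as Legal in this toy model.
  Action getAction(Op op, VecType vt) const {
    auto it = actions_.find({op, vt});
    return it == actions_.end() ? Action::Legal : it->second;
  }
};

// A subtarget that can hold and move v4f16 but cannot do math on it:
// loads and stores stay legal, arithmetic is marked Expand.
ToyLowering makeToyNeonWithoutFP16Math() {
  ToyLowering tl;
  tl.setAction(Op::FAdd, VecType::v4f16, Action::Expand);
  tl.setAction(Op::FMul, VecType::v4f16, Action::Expand);
  return tl;
}
```

Under this model the calling-convention code only ever sees a legal 64-bit vector type, while arithmetic is rewritten separately.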

This approach worked, but required handling two more operations in PromoteFloatOperand: PromoteFloatOp_BUILD_VECTOR and PromoteFloatOp_INSERT_VECTOR_ELT, which in my implementation don't actually promote anything but rather do some FP softening. Not sure if there is a better solution.
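
For readers unfamiliar with the term, "softening" here means reinterpreting each floating-point element as an integer of the same width, without changing any bits. The sketch below shows that idea on a build-vector style input. It uses float and uint32_t because standard C++ has no portable half type, and the names `bitConvertToInteger` and `softenBuildVector` are illustrative stand-ins, not the SelectionDAG helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <vector>

// Reinterpret the bits of a float as a same-width integer (no conversion).
std::uint32_t bitConvertToInteger(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Build the integer vector that carries the same bit patterns as the
// floating-point elements, one output lane per input lane.
std::vector<std::uint32_t> softenBuildVector(const std::vector<float> &elems) {
  std::vector<std::uint32_t> out;
  out.reserve(elems.size());
  std::transform(elems.begin(), elems.end(), std::back_inserter(out),
                 bitConvertToInteger);
  return out;
}
```

For example, the elements 1.0f and -2.0f become the integers 0x3F800000 and 0xC0000000; no rounding or widening takes place.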

SjoerdMeijer added a subscriber: SjoerdMeijer. Aug 2 2018, 3:30 AM

Needs tests for hardfloat ABI, in addition to soft-float.

Does half also need to be legal to get passed/returned correctly?

It looks like clang already has special handling for half types in some cases; how does that interact with this patch?

LukeGeeson mentioned this in D50252: [ARM] Added FP16 VREV Vector Instrinsic CodeGen support. Aug 3 2018, 6:30 AM

I have added a hardfp test. Clang has some logic to handle scalar FP16 values: it performs softening when needed (e.g. tools/clang/test/CodeGen/arm-fp16-arguments.c), whereas NEON vectors are lowered into LLVM vector types. So we need no additional changes in either Clang or the LLVM scalar FP16 handling.

We should handle f16 and vectors of f16 in a consistent manner, for the sake of maintaining the code in the future. (Either handle both in the backend, or handle both in clang.)

Alternative (Clang) patch: https://reviews.llvm.org/D50507

Abandoning in favor of https://reviews.llvm.org/D50507

Revision Contents

Path                                                     Size
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp          23 lines
lib/CodeGen/SelectionDAG/LegalizeTypes.h                 2 lines
lib/CodeGen/TargetLoweringBase.cpp                       16 lines
lib/Target/ARM/ARMISelLowering.h                         3 lines
lib/Target/ARM/ARMISelLowering.cpp                       7 lines
test/CodeGen/ARM/fp16-promote.ll                         24 lines
test/CodeGen/ARM/fp16-soften.ll                          21 lines
test/CodeGen/ARM/fp16-v3.ll                              10 lines
test/Transforms/LoopVectorize/ARM/interleaved_cost.ll    4 lines

Diff 157962

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

@@ lines omitted @@ case ISD::CopyToReg:
     assert(isLegalInHWReg(N->getValueType(ResNo)) &&
            "Unsupported SoftenFloatRes opcode!");
     // Only when isLegalInHWReg, we can skip check of the operands.
     R = SDValue(N, ResNo);
     break;
   case ISD::MERGE_VALUES:R = SoftenFloatRes_MERGE_VALUES(N, ResNo); break;
   case ISD::BITCAST: R = SoftenFloatRes_BITCAST(N, ResNo); break;
   case ISD::BUILD_PAIR: R = SoftenFloatRes_BUILD_PAIR(N); break;
+  case ISD::BUILD_VECTOR:
+    R = SoftenFloatRes_BUILD_VECTOR(N); break;
   case ISD::ConstantFP: R = SoftenFloatRes_ConstantFP(N, ResNo); break;
   case ISD::EXTRACT_VECTOR_ELT:
     R = SoftenFloatRes_EXTRACT_VECTOR_ELT(N, ResNo); break;
+  case ISD::INSERT_VECTOR_ELT:
+    R = SoftenFloatRes_INSERT_VECTOR_ELT(N); break;
   case ISD::FABS: R = SoftenFloatRes_FABS(N, ResNo); break;
   case ISD::FMINNUM: R = SoftenFloatRes_FMINNUM(N); break;
   case ISD::FMAXNUM: R = SoftenFloatRes_FMAXNUM(N); break;
   case ISD::FADD: R = SoftenFloatRes_FADD(N); break;
   case ISD::FCEIL: R = SoftenFloatRes_FCEIL(N); break;
   case ISD::FCOPYSIGN: R = SoftenFloatRes_FCOPYSIGN(N, ResNo); break;
   case ISD::FCOS: R = SoftenFloatRes_FCOS(N); break;
   case ISD::FDIV: R = SoftenFloatRes_FDIV(N); break;
@@ lines omitted @@ SDValue DAGTypeLegalizer::SoftenFloatRes_BUILD_PAIR(SDNode *N) {
   // Convert the inputs to integers, and build a new pair out of them.
   return DAG.getNode(ISD::BUILD_PAIR, SDLoc(N),
                      TLI.getTypeToTransformTo(*DAG.getContext(),
                                               N->getValueType(0)),
                      BitConvertToInteger(N->getOperand(0)),
                      BitConvertToInteger(N->getOperand(1)));
 }

+SDValue DAGTypeLegalizer::SoftenFloatRes_BUILD_VECTOR(SDNode *N) {
+  SmallVector<SDValue, 8> ConvertedValues;
+  llvm::transform(
+      N->op_values(), std::back_inserter(ConvertedValues),
+      [this](const SDValue &Val) { return BitConvertToInteger(Val); });
+
+  return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N),
+                     TLI.getTypeToTransformTo(*DAG.getContext(),
+                                              N->getValueType(0)),
+                     ConvertedValues);
+}
+
 SDValue DAGTypeLegalizer::SoftenFloatRes_ConstantFP(SDNode *N, unsigned ResNo) {
   // When LegalInHWReg, we can load better from the constant pool.
   if (isLegalInHWReg(N->getValueType(ResNo)))
     return SDValue(N, ResNo);
   ConstantFPSDNode *CN = cast<ConstantFPSDNode>(N);
   // In ppcf128, the high 64 bits are always first in memory regardless
   // of Endianness. LLVM's APFloat representation is not Endian sensitive,
   // and so always converts into a 128-bit APInt in a non-Endian-sensitive
@@ 21 lines omitted @@ SDValue DAGTypeLegalizer::SoftenFloatRes_EXTRACT_VECTOR_ELT(SDNode *N, unsigned ResNo) {
   if (isLegalInHWReg(N->getValueType(ResNo)))
     return SDValue(N, ResNo);
   SDValue NewOp = BitConvertVectorToIntegerVector(N->getOperand(0));
   return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(N),
                      NewOp.getValueType().getVectorElementType(),
                      NewOp, N->getOperand(1));
 }

+SDValue DAGTypeLegalizer::SoftenFloatRes_INSERT_VECTOR_ELT(SDNode *N) {
+  SDValue NewVec = BitConvertVectorToIntegerVector(N->getOperand(0));
+  SDValue NewElem = BitConvertToInteger(N->getOperand(1));
+  return DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(N), NewVec.getValueType(),
+                     NewVec, NewElem, N->getOperand(2));
+}
+
 SDValue DAGTypeLegalizer::SoftenFloatRes_FABS(SDNode *N, unsigned ResNo) {
   // When LegalInHWReg, FABS can be implemented as native bitwise operations.
   if (isLegalInHWReg(N->getValueType(ResNo)))
     return SDValue(N, ResNo);
   EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
   unsigned Size = NVT.getSizeInBits();

   // Mask = ~(1 << (Size-1))
@@ lines omitted @@

lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ lines omitted @@ private:
   }
   void SetSoftenedFloat(SDValue Op, SDValue Result);

   // Convert Float Results to Integer for Non-HW-supported Operations.
   bool SoftenFloatResult(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_BITCAST(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_BUILD_PAIR(SDNode *N);
+  SDValue SoftenFloatRes_BUILD_VECTOR(SDNode *N);
   SDValue SoftenFloatRes_ConstantFP(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_EXTRACT_VECTOR_ELT(SDNode *N, unsigned ResNo);
+  SDValue SoftenFloatRes_INSERT_VECTOR_ELT(SDNode *N);
   SDValue SoftenFloatRes_FABS(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_FMINNUM(SDNode *N);
   SDValue SoftenFloatRes_FMAXNUM(SDNode *N);
   SDValue SoftenFloatRes_FADD(SDNode *N);
   SDValue SoftenFloatRes_FCEIL(SDNode *N);
   SDValue SoftenFloatRes_FCOPYSIGN(SDNode *N, unsigned ResNo);
   SDValue SoftenFloatRes_FCOS(SDNode *N);
   SDValue SoftenFloatRes_FDIV(SDNode *N);
@@ lines omitted @@

lib/CodeGen/TargetLoweringBase.cpp

@@ lines omitted @@ for (unsigned i = MVT::FIRST_VECTOR_VALUETYPE;
     if (isTypeLegal(VT))
       continue;

     MVT EltVT = VT.getVectorElementType();
     unsigned NElts = VT.getVectorNumElements();
     bool IsLegalWiderType = false;
     LegalizeTypeAction PreferredAction = getPreferredVectorAction(VT);
     switch (PreferredAction) {
+    case TypeSoftenFloat: {
+      MVT SoftEltVT = MVT::getIntegerVT(EltVT.getSizeInBits());
+      MVT SoftVT = MVT::getVectorVT(SoftEltVT, NElts);
+      if (isTypeLegal(SoftVT)) {
+        unsigned ToInd = (unsigned)SoftVT.SimpleTy;
+        assert(ToInd < i && "FP types precede integer types in MVT?");
+        TransformToType[i] = SoftVT;
+        RegisterTypeForVT[i] = RegisterTypeForVT[ToInd];
+        NumRegistersForVT[i] = NumRegistersForVT[ToInd];
+        ValueTypeActions.setTypeAction(VT, TypeSoftenFloat);
+        break;
+      }
+
+      LLVM_FALLTHROUGH;
+    }
+
     case TypePromoteInteger:
       // Try to promote the elements of integer vectors. If no legal
       // promotion was found, fall through to the widen-vector method.
       for (unsigned nVT = i + 1; nVT <= MVT::LAST_INTEGER_VECTOR_VALUETYPE; ++nVT) {
         MVT SVT = (MVT::SimpleValueType) nVT;
         // Promote vectors of integers to vectors with the same number
         // of elements, with a wider element type.
         if (SVT.getScalarSizeInBits() > EltVT.getSizeInBits() &&
@@ lines omitted @@

lib/Target/ARM/ARMISelLowering.h

@@ lines omitted @@ public:
     bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                  unsigned Index) const override;

     /// Returns true if an argument of type Ty needs to be passed in a
     /// contiguous block of registers in calling convention CallConv.
     bool functionArgumentNeedsConsecutiveRegisters(
         Type *Ty, CallingConv::ID CallConv, bool isVarArg) const override;

+    TargetLoweringBase::LegalizeTypeAction
+    getPreferredVectorAction(EVT VT) const override;
+
     /// If a physical register, this returns the register that receives the
     /// exception address on entry to an EH pad.
     unsigned
     getExceptionPointerRegister(const Constant *PersonalityFn) const override;

     /// If a physical register, this returns the register that receives the
     /// exception typeid on entry to a landing pad.
     unsigned
@@ lines omitted @@

lib/Target/ARM/ARMISelLowering.cpp

@@ lines omitted @@ bool ARMTargetLowering::functionArgumentNeedsConsecutiveRegisters(
   uint64_t Members = 0;
   bool IsHA = isHomogeneousAggregate(Ty, Base, Members);
   LLVM_DEBUG(dbgs() << "isHA: " << IsHA << " "; Ty->dump());

   bool IsIntArray = Ty->isArrayTy() && Ty->getArrayElementType()->isIntegerTy();
   return IsHA || IsIntArray;
 }

+TargetLoweringBase::LegalizeTypeAction
+ARMTargetLowering::getPreferredVectorAction(EVT VT) const {
+  if (VT.isFloatingPoint() && VT.getScalarSizeInBits() == 16)
+    return TargetLoweringBase::LegalizeTypeAction::TypeSoftenFloat;
+  return TargetLoweringBase::getPreferredVectorAction(VT);
+}
+
 unsigned ARMTargetLowering::getExceptionPointerRegister(
     const Constant *PersonalityFn) const {
   // Platforms which do not use SjLj EH may return values in these registers
   // via the personality function.
   return Subtarget->useSjLjEH() ? ARM::NoRegister : ARM::R0;
 }

 unsigned ARMTargetLowering::getExceptionSelectorRegister(
@@ lines omitted @@

test/CodeGen/ARM/fp16-promote.ll

@@ lines omitted @@

 ; f16 vectors are not legal in the backend. Vector elements are not assigned
 ; to the register, but are stored in the stack instead. Hence insertelement
 ; and extractelement have these extra loads and stores.

 ; CHECK-ALL-LABEL: test_insertelement:
 ; CHECK-ALL: sub sp, sp, #8

 ; CHECK-VFP: and
 ; CHECK-VFP: mov
-; CHECK-VFP: ldrd
+; CHECK-VFP: vldr
 ; CHECK-VFP: orr
 ; CHECK-VFP: ldrh
-; CHECK-VFP: stm
+; CHECK-VFP: vstr
 ; CHECK-VFP: strh
-; CHECK-VFP: ldm
-; CHECK-VFP: stm
+; CHECK-VFP: vldr
+; CHECK-VFP: vstr

 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP-DAG: strh
 ; CHECK-NOVFP-DAG: strh
 ; CHECK-NOVFP-DAG: mov
@@ 15 lines omitted @@ define void @test_insertelement(half* %p, <4 x half>* %q, i32 %i) #0 {
   %a = load half, half* %p, align 2
   %b = load <4 x half>, <4 x half>* %q, align 8
   %c = insertelement <4 x half> %b, half %a, i32 %i
   store <4 x half> %c, <4 x half>* %q
   ret void
 }

 ; CHECK-ALL-LABEL: test_extractelement:
-; CHECK-VFP: push {{{.*}}, lr}
 ; CHECK-VFP: sub sp, sp, #8
-; CHECK-VFP: ldrd
+; CHECK-VFP: vldr
+; CHECK-VFP: and
 ; CHECK-VFP: mov
 ; CHECK-VFP: orr
+; CHECK-VFP: vstr
 ; CHECK-VFP: ldrh
 ; CHECK-VFP: strh
 ; CHECK-VFP: add sp, sp, #8
-; CHECK-VFP: pop {{{.*}}, pc}
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
 ; CHECK-NOVFP: ldrh
 ; CHECK-NOVFP: strh
@@ lines omitted @@

test/CodeGen/ARM/fp16-soften.ll

This file was added.

+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "armv7-none--eabi"
+
+@v = local_unnamed_addr global <4 x half> zeroinitializer, align 8
+
+declare void @callee(<4 x half>) #0
+
+; CHECK-LABEL: test_soften:
+; CHECK: vldr [[DREG:d[0-9]+]], {{\[r[0-9]+]}}
+; CHECK-NEXT: vmov r0, r1, [[DREG]]
+; CHECK-NEXT: b callee
+define void @test_soften() #0 {
+entry:
+  %0 = load <4 x half>, <4 x half>* @v, align 8
+  tail call void (<4 x half>) @callee(<4 x half> %0)
+  ret void
+}
+
+attributes #0 = { nounwind }

test/CodeGen/ARM/fp16-v3.ll

 ; RUN: llc -mattr=+fp16 < %s | FileCheck %s

 target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
 target triple = "armv7a--none-eabi"

 ; CHECK-LABEL: test_vec3:
 ; CHECK-DAG: vmov.f32 [[SREG1:s[0-9]+]], #1.200000e+01
 ; CHECK-DAG: vcvt.f32.s32 [[SREG2:s[0-9]+]],
 ; CHECK-DAG: vcvtb.f16.f32 [[SREG3:s[0-9]+]], [[SREG2]]
 ; CHECK-DAG: vcvtb.f32.f16 [[SREG4:s[0-9]+]], [[SREG3]]
 ; CHECK: vadd.f32 [[SREG5:s[0-9]+]], [[SREG4]], [[SREG1]]
 ; CHECK-NEXT: vcvtb.f16.f32 [[SREG6:s[0-9]+]], [[SREG5]]
 ; CHECK-NEXT: vmov [[RREG1:r[0-9]+]], [[SREG6]]
-; CHECK-DAG: uxth [[RREG2:r[0-9]+]], [[RREG1]]
-; CHECK-DAG: pkhbt [[RREG3:r[0-9]+]], [[RREG1]], [[RREG1]], lsl #16
 ; CHECK-DAG: strh [[RREG1]], [r0, #4]
-; CHECK-DAG: vmov [[DREG:d[0-9]+]], [[RREG3]], [[RREG2]]
+; CHECK-DAG: vdup.16 [[DREG:d[0-9]+]], [[RREG1]]
 ; CHECK-DAG: vst1.32 {[[DREG]][0]}, [r0:32]
 ; CHECK-NEXT: bx lr
 define void @test_vec3(<3 x half>* %arr, i32 %i) #0 {
   %H = sitofp i32 %i to half
   %S = fadd half %H, 0xH4A00
   %1 = insertelement <3 x half> undef, half %S, i32 0
   %2 = insertelement <3 x half> %1, half %S, i32 1
   %3 = insertelement <3 x half> %2, half %S, i32 2
   store <3 x half> %3, <3 x half>* %arr, align 8
   ret void
 }

 ; CHECK-LABEL: test_bitcast:
 ; CHECK: vcvtb.f16.f32
 ; CHECK: vcvtb.f16.f32
+; CHECK: vmov.16
 ; CHECK: vcvtb.f16.f32
-; CHECK: pkhbt
-; CHECK: uxth
+; CHECK: vmov.16
+; CHECK: vst1.32
+; CHECK: strh
 define void @test_bitcast(<3 x half> %inp, <3 x i16>* %arr) #0 {
   %bc = bitcast <3 x half> %inp to <3 x i16>
   store <3 x i16> %bc, <3 x i16>* %arr, align 8
   ret void
 }

 attributes #0 = { nounwind }

test/Transforms/LoopVectorize/ARM/interleaved_cost.ll

@@ lines omitted @@
 }

 %half.2 = type {half, half}
 define void @half_factor_2(%half.2* %data, i64 %n) {
 entry:
   br label %for.body

 ; VF_4-LABEL: Checking a loop in "half_factor_2"
-; VF_4: Found an estimated cost of 40 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
+; VF_4: Found an estimated cost of 33 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
 ; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2
 ; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
 ; VF_4-NEXT: Found an estimated cost of 32 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
 ; VF_8-LABEL: Checking a loop in "half_factor_2"
-; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
+; VF_8: Found an estimated cost of 66 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
 ; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
 ; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
 ; VF_8-NEXT: Found an estimated cost of 64 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
 for.body:
   %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
   %tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0
   %tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1
   %tmp2 = load half, half* %tmp0, align 2
@@ 10 lines omitted @@
