This is an archive of the discontinued LLVM Phabricator instance.

Differential D63938

[ARM] Stop using scalar FP instructions in integer-only MVE mode.
ClosedPublic

Authored by simon_tatham on Jun 28 2019, 9:34 AM.

Details

Reviewers

dmgreen
ostannard

Commits

rG7b63a9533c7e: [ARM] Stop using scalar FP instructions in integer-only MVE mode.
rL364909: [ARM] Stop using scalar FP instructions in integer-only MVE mode.

Summary

If you compile with -mattr=+mve (enabling integer MVE instructions
but not floating-point ones), then the scalar FP registers exist
and it's legal to move things in and out of them, load and store them,
but it's not legal to do arithmetic on them.

In D60708, the calls to addRegisterClass in ARMISelLowering that
enable use of the scalar FP registers became conditionalised on
Subtarget->hasFPRegs() instead of Subtarget->hasVFP2Base(), so
that loads, stores and moves of those registers would work. But I
didn't realise that that would also enable all the operations on those
types by default.

Now, if the target doesn't have basic VFP, we follow up those
addRegisterClass calls by turning back off all the nontrivial
operations you can perform on f32 and f64. That causes several
knock-on failures, which are fixed by allowing the VMOVDcc and
VMOVScc instructions to be selected even if all you have is
HasFPRegs, and adjusting several checks for 'is this a double in a
single-precision-only world?' to the more general 'is this any FP type
we can't do arithmetic on?'. Between those, the whole of the
float-ops.ll and fp16-instructions.ll tests can now run in
MVE-without-FP mode and generate correct-looking code.
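
To make the mechanism in the paragraph above concrete, here is a minimal, self-contained sketch. It is a toy model, not LLVM code: ToyOp, ToyAction, ToyActionTable and buildF32Table are hypothetical names invented for illustration, and the opcode list is cut down to five entries. It assumes, as the summary describes, that registering a type leaves every operation legal by default, and that the fix is to expand everything and then re-enable bitcast, load and store.

```cpp
#include <array>
#include <cassert>

// Toy stand-ins for ISD opcodes and LegalizeAction; NOT the real LLVM enums.
enum ToyOp { TOY_BITCAST, TOY_LOAD, TOY_STORE, TOY_FADD, TOY_FMUL, TOY_OP_END };
enum ToyAction { ToyLegal, ToyExpand };

// Hypothetical per-type action table, loosely modelled on TargetLowering.
struct ToyActionTable {
  // Value-initialised, so every operation starts out as ToyLegal (0).
  // This models the default state right after addRegisterClass.
  std::array<ToyAction, TOY_OP_END> Actions{};

  void setOperationAction(int Opc, ToyAction A) {
    assert(Opc >= 0 && Opc < TOY_OP_END && "opcode out of range");
    Actions[Opc] = A;
  }

  // Same shape as the patched setAllExpand: expand everything, then
  // re-enable the trivial operations that only need the registers to exist.
  void setAllExpand() {
    for (int Opc = 0; Opc < TOY_OP_END; ++Opc)
      setOperationAction(Opc, ToyExpand);
    setOperationAction(TOY_BITCAST, ToyLegal);
    setOperationAction(TOY_LOAD, ToyLegal);
    setOperationAction(TOY_STORE, ToyLegal);
  }

  bool isLegal(int Opc) const { return Actions[Opc] == ToyLegal; }
};

// Build the f32 table for a target that has FP registers; HasVFP2Base says
// whether it can also do scalar f32 arithmetic in them.
ToyActionTable buildF32Table(bool HasVFP2Base) {
  ToyActionTable T;
  if (!HasVFP2Base)
    T.setAllExpand();
  return T;
}
```

With HasVFP2Base set to false (the integer-only MVE case), buildF32Table leaves load, store and bitcast legal while fadd and fmul are expanded; with it set to true, the table is left untouched.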

One odd side effect is that I had to relax the check lines in
float-ops.ll so that they permit test functions like add_f to be generated
as tailcalls to software FP library functions, instead of ordinary
calls. Doing that is entirely legal, but the mystery is why this is
the first RUN line that's needed the relaxation: on the usual kind of
non-FP target, no tailcalls ever seem to be generated. Going by the
llc messages, I think SoftenFloatResult must be perturbing the code
generation in some way, but that's as much as I can guess.
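
The relaxed check lines use a FileCheck regex block of the form {{b|bl}}, which accepts either a tail call (b) or an ordinary call (bl). The sketch below approximates that match with std::regex. It is only an illustration: callsLibFunc is a hypothetical helper, and the word-boundary anchors make it slightly stricter than the plain substring search FileCheck performs.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does an assembly line contain a call (bl) or a tail
// call (b) to the given library function? Any run of spaces or tabs is
// accepted between the mnemonic and the symbol.
bool callsLibFunc(const std::string &AsmLine, const std::string &Func) {
  // Func is assumed to contain only identifier characters, so it can be
  // spliced into the pattern without escaping.
  const std::regex Pattern(std::string("\\b(b|bl)[ \\t]+") + Func + "\\b");
  return std::regex_search(AsmLine, Pattern);
}
```

Under this pattern both "bl __aeabi_fadd" and "b __aeabi_fadd" are accepted, while a hard-float line such as "vadd.f32 s0, s0, s1" is not.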

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 34144
Build 34143: arc lint + arc unit

Event Timeline

simon_tatham created this revision. Jun 28 2019, 9:34 AM

Herald added a project: Restricted Project. Jun 28 2019, 9:34 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B34069: Diff 207076. Jun 28 2019, 9:38 AM

I still can't say I'm totally convinced by this way of using addRegisterClass for everything and then trying to stop the compiler from ever using them. When would we want to use a floating point load, for example? But I'm not sure enough about what a sensible alternative would be, so let's at least try it and see. This certainly fixes things. If it does keep causing headaches we can always change it later!

Can you add a test that does some MVE expansion of floating point operations, for floats and halves? Just a simple "fadd" using +mve should be fine to show things are working.

llvm/lib/Target/ARM/ARMISelLowering.cpp
598–606	Should this be behind hasFPRegs64 instead?
609	How come we don't have to do the same for fullfp16 to work?
llvm/test/CodeGen/ARM/fp16-instructions.ll
5	These should be thumbv8.1m.main-none-eabi
llvm/test/CodeGen/Thumb2/float-ops.ll
268	This is worse than before by the looks of it? We move things into fp registers just to move them out again.

simon_tatham marked 3 inline comments as done. Jul 1 2019, 9:31 AM

simon_tatham added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
598–606	I'm not sure it should, actually. `FPRegs64` is a confusing feature, and quite likely should have been named something different (which is my fault, if so). As far as I can see the only architectural instruction actually conditional on that feature is the scalar FP move instruction that copies a d-register (e.g. `vmov.f64 d0,d1`). It doesn't affect loads and stores of d-regs. So I think it shouldn't affect this decision. On the other hand, I'm pretty sure you're right that I should be separately deciding whether to `setAllExpand` f32 and f64 based on different feature queries.
609	I think because the fp16 legality boundary happens in a different place. With the MVE-integer-only level of support for f32 or f64, you have registers of the right size which you can load and store, but you can't do arithmetic on them. But with the not-full fp16 support, you don't even have a set of registers the right size – there are no loads and stores of HPRs.
llvm/test/CodeGen/Thumb2/float-ops.ll
268	Perhaps that's true, but I'd rather fix the correctness first and get a set of regression tests passing, and then we can worry about recovering the performance with those tests in place to prevent breaking correctness again. Especially since in the general case it's not really clear how you should choose which kind of register to use for a value. Keeping it in an FP register is obviously wasteful in this case, but in another case where register pressure is high, might it save a spill or two? I think getting the right answers in cases larger than this simple one might not be trivial.

Revised patch fixes those two target triples in the tests, and makes the setAllExpand calls for f32 and f64 conditional on different things. Also, to make that less cumbersome, I've moved a few re-enabling setOperationAction calls into setAllExpand itself which otherwise had to be run after every single call.
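
As a rough illustration of why f32 and f64 need separate feature queries, the following toy model mirrors the decision structure of the isUnsupportedFloatingType helper added in this patch. ToyVT and ToySubtarget are hypothetical stand-ins for llvm::EVT and ARMSubtarget; only the three feature names are taken from the diff.

```cpp
// Toy stand-ins; the real code uses llvm::EVT/MVT and ARMSubtarget.
enum class ToyVT { f16, f32, f64, i32 };

struct ToySubtarget {
  bool HasVFP2Base;  // scalar f32 arithmetic available
  bool HasFP64;      // scalar f64 arithmetic available
  bool HasFullFP16;  // scalar f16 arithmetic available
};

// Each FP type is checked against its own feature, so a target can support
// arithmetic on f32 while still treating f64 as unsupported.
bool isUnsupportedFloatingType(const ToySubtarget &ST, ToyVT VT) {
  switch (VT) {
  case ToyVT::f32:
    return !ST.HasVFP2Base;
  case ToyVT::f64:
    return !ST.HasFP64;
  case ToyVT::f16:
    return !ST.HasFullFP16;
  case ToyVT::i32:
    break;
  }
  return false; // non-FP types are never "unsupported FP"
}
```

With all three flags false (integer-only MVE), f32 and f64 are both reported as unsupported; with only HasVFP2Base set (a single-precision-only FPU), f32 is supported while f64 still is not.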

This version of the patch is intended to apply _before_ D63937 rather than after it, because this way round it turned out easier to get the tests to pass in the intermediate state.

Harbormaster completed remote builds in B34144: Diff 207361. Jul 1 2019, 9:49 AM

simon_tatham mentioned this in D63937: [ARM] MVE: allow soft-float ABI to pass vector types. Jul 1 2019, 9:53 AM

LGTM with one nit.

llvm/lib/Target/ARM/ARMISelLowering.cpp
599	This can still be the same if? To drop a level of indentation.
llvm/test/CodeGen/Thumb2/float-ops.ll
268	My worry is that this will mean every floating point load becomes a vldr, which just ends up being moved into a gpr. This probably isn't a huge deal for performance, as you will likely always be calling a __aeabi_fadd type function, but the codesize would increase quite a bit. I couldn't think of a reason when you _would_ want to load using a vldr (at least it would be fairly uncommon). I imagine that almost every operation would actually be done on a gpr for floats.

This revision is now accepted and ready to land. Jul 2 2019, 3:49 AM

dmgreen added inline comments. Jul 2 2019, 3:50 AM

llvm/test/CodeGen/Thumb2/float-ops.ll
268	Forgot to say. I agree that working is better than not working. Something we may have to adjust in the future though.

Closed by commit rL364909: [ARM] Stop using scalar FP instructions in integer-only MVE mode. (authored by statham). Jul 2 2019, 4:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMISelLowering.h	2 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	52 lines
llvm/lib/Target/ARM/ARMInstrVFP.td	4 lines
llvm/test/CodeGen/ARM/fp16-instructions.ll	2 lines
llvm/test/CodeGen/Thumb2/float-ops.ll	43 lines

Diff 207361

llvm/lib/Target/ARM/ARMISelLowering.h

Show First 20 Lines • Show All 788 Lines • ▼ Show 20 Lines	SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SDLoc &dl, SelectionDAG &DAG) const override;		const SDLoc &dl, SelectionDAG &DAG) const override;

bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;		bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;

bool mayBeEmittedAsTailCall(const CallInst *CI) const override;		bool mayBeEmittedAsTailCall(const CallInst *CI) const override;

bool shouldConsiderGEPOffsetSplit() const override { return true; }		bool shouldConsiderGEPOffsetSplit() const override { return true; }

		bool isUnsupportedFloatingType(EVT VT) const;

SDValue getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal, SDValue TrueVal,		SDValue getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal, SDValue TrueVal,
SDValue ARMcc, SDValue CCR, SDValue Cmp,		SDValue ARMcc, SDValue CCR, SDValue Cmp,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,		SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
SDValue &ARMcc, SelectionDAG &DAG, const SDLoc &dl) const;		SDValue &ARMcc, SelectionDAG &DAG, const SDLoc &dl) const;
SDValue getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,		SDValue getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,
const SDLoc &dl, bool InvalidOnQNaN) const;		const SDLoc &dl, bool InvalidOnQNaN) const;
SDValue duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const;		SDValue duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const;
Show All 38 Lines

llvm/lib/Target/ARM/ARMISelLowering.cpp

Show First 20 Lines • Show All 218 Lines • ▼ Show 20 Lines
void ARMTargetLowering::addQRTypeForNEON(MVT VT) {		void ARMTargetLowering::addQRTypeForNEON(MVT VT) {
addRegisterClass(VT, &ARM::DPairRegClass);		addRegisterClass(VT, &ARM::DPairRegClass);
addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);		addTypeForNEON(VT, MVT::v2f64, MVT::v4i32);
}		}

void ARMTargetLowering::setAllExpand(MVT VT) {		void ARMTargetLowering::setAllExpand(MVT VT) {
for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc)		for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc)
setOperationAction(Opc, VT, Expand);		setOperationAction(Opc, VT, Expand);

		// We support these really simple operations even on types where all
		// the actual arithmetic has to be broken down into simpler
		// operations or turned into library calls.
		setOperationAction(ISD::BITCAST, VT, Legal);
		setOperationAction(ISD::LOAD, VT, Legal);
		setOperationAction(ISD::STORE, VT, Legal);
}		}

void ARMTargetLowering::addAllExtLoads(const MVT From, const MVT To,		void ARMTargetLowering::addAllExtLoads(const MVT From, const MVT To,
LegalizeAction Action) {		LegalizeAction Action) {
setLoadExtAction(ISD::EXTLOAD, From, To, Action);		setLoadExtAction(ISD::EXTLOAD, From, To, Action);
setLoadExtAction(ISD::ZEXTLOAD, From, To, Action);		setLoadExtAction(ISD::ZEXTLOAD, From, To, Action);
setLoadExtAction(ISD::SEXTLOAD, From, To, Action);		setLoadExtAction(ISD::SEXTLOAD, From, To, Action);
}		}
Show All 22 Lines	if (!HasMVEFP)
setAllExpand(VT);		setAllExpand(VT);

// These are legal or custom whether we have MVE.fp or not		// These are legal or custom whether we have MVE.fp or not
setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);		setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::BUILD_VECTOR, VT, Custom);		setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Legal);		setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Legal);
setOperationAction(ISD::BITCAST, VT, Legal);
setOperationAction(ISD::LOAD, VT, Legal);
setOperationAction(ISD::STORE, VT, Legal);

if (HasMVEFP) {		if (HasMVEFP) {
// No native support for these.		// No native support for these.
setOperationAction(ISD::FDIV, VT, Expand);		setOperationAction(ISD::FDIV, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);		setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FSQRT, VT, Expand);		setOperationAction(ISD::FSQRT, VT, Expand);
setOperationAction(ISD::FSIN, VT, Expand);		setOperationAction(ISD::FSIN, VT, Expand);
setOperationAction(ISD::FCOS, VT, Expand);		setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);		setOperationAction(ISD::FPOW, VT, Expand);
setOperationAction(ISD::FLOG, VT, Expand);		setOperationAction(ISD::FLOG, VT, Expand);
setOperationAction(ISD::FLOG2, VT, Expand);		setOperationAction(ISD::FLOG2, VT, Expand);
setOperationAction(ISD::FLOG10, VT, Expand);		setOperationAction(ISD::FLOG10, VT, Expand);
setOperationAction(ISD::FEXP, VT, Expand);		setOperationAction(ISD::FEXP, VT, Expand);
setOperationAction(ISD::FEXP2, VT, Expand);		setOperationAction(ISD::FEXP2, VT, Expand);
}		}
}		}

// We 'support' these types up to bitcast/load/store level, regardless of		// We 'support' these types up to bitcast/load/store level, regardless of
// MVE integer-only / float support. Only doing FP data processing on the FP		// MVE integer-only / float support. Only doing FP data processing on the FP
// vector types is inhibited at integer-only level.		// vector types is inhibited at integer-only level.
const MVT LongTypes[] = { MVT::v2i64, MVT::v2f64 };		const MVT LongTypes[] = { MVT::v2i64, MVT::v2f64 };
for (auto VT : LongTypes) {		for (auto VT : LongTypes) {
addRegisterClass(VT, &ARM::QPRRegClass);		addRegisterClass(VT, &ARM::QPRRegClass);
setAllExpand(VT);		setAllExpand(VT);
setOperationAction(ISD::BITCAST, VT, Legal);
setOperationAction(ISD::LOAD, VT, Legal);
setOperationAction(ISD::STORE, VT, Legal);
}		}

// It is legal to extload from v4i8 to v4i16 or v4i32.		// It is legal to extload from v4i8 to v4i16 or v4i32.
addAllExtLoads(MVT::v8i16, MVT::v8i8, Legal);		addAllExtLoads(MVT::v8i16, MVT::v8i8, Legal);
addAllExtLoads(MVT::v4i32, MVT::v4i16, Legal);		addAllExtLoads(MVT::v4i32, MVT::v4i16, Legal);
addAllExtLoads(MVT::v4i32, MVT::v4i8, Legal);		addAllExtLoads(MVT::v4i32, MVT::v4i8, Legal);

// Some truncating stores are legal too.		// Some truncating stores are legal too.
▲ Show 20 Lines • Show All 286 Lines • ▼ Show 20 Lines	if (Subtarget->isTargetAEABI()) {
}		}
}		}

if (Subtarget->isThumb1Only())		if (Subtarget->isThumb1Only())
addRegisterClass(MVT::i32, &ARM::tGPRRegClass);		addRegisterClass(MVT::i32, &ARM::tGPRRegClass);
else		else
addRegisterClass(MVT::i32, &ARM::GPRRegClass);		addRegisterClass(MVT::i32, &ARM::GPRRegClass);

if (!Subtarget->useSoftFloat() && Subtarget->hasFPRegs() &&		if (!Subtarget->useSoftFloat() && !Subtarget->isThumb1Only()) {
!Subtarget->isThumb1Only()) {		if (Subtarget->hasFPRegs()) {
addRegisterClass(MVT::f32, &ARM::SPRRegClass);		addRegisterClass(MVT::f32, &ARM::SPRRegClass);
addRegisterClass(MVT::f64, &ARM::DPRRegClass);		addRegisterClass(MVT::f64, &ARM::DPRRegClass);
		if (!Subtarget->hasVFP2Base())
		setAllExpand(MVT::f32);
		if (!Subtarget->hasFP64())
		setAllExpand(MVT::f64);
		}
}		}

if (Subtarget->hasFullFP16()) {		if (Subtarget->hasFullFP16()) {
addRegisterClass(MVT::f16, &ARM::HPRRegClass);		addRegisterClass(MVT::f16, &ARM::HPRRegClass);
setOperationAction(ISD::BITCAST, MVT::i16, Custom);		setOperationAction(ISD::BITCAST, MVT::i16, Custom);
setOperationAction(ISD::BITCAST, MVT::i32, Custom);		setOperationAction(ISD::BITCAST, MVT::i32, Custom);
setOperationAction(ISD::BITCAST, MVT::f16, Custom);		setOperationAction(ISD::BITCAST, MVT::f16, Custom);

setOperationAction(ISD::FMINNUM, MVT::f16, Legal);		setOperationAction(ISD::FMINNUM, MVT::f16, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f16, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f16, Legal);
}		}
▲ Show 20 Lines • Show All 3,926 Lines • ▼ Show 20 Lines	static bool isLowerSaturatingConditional(const SDValue &Op, SDValue &V,
if (isLowerSaturate(LHS, RHS, TrueVal, FalseVal, CC, *K)) {		if (isLowerSaturate(LHS, RHS, TrueVal, FalseVal, CC, *K)) {
SatK = *K;		SatK = *K;
return true;		return true;
}		}

return false;		return false;
}		}

		bool ARMTargetLowering::isUnsupportedFloatingType(EVT VT) const {
		if (VT == MVT::f32)
		return !Subtarget->hasVFP2Base();
		if (VT == MVT::f64)
		return !Subtarget->hasFP64();
		if (VT == MVT::f16)
		return !Subtarget->hasFullFP16();
		return false;
		}

SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc dl(Op);		SDLoc dl(Op);

// Try to convert two saturating conditional selects into a single SSAT		// Try to convert two saturating conditional selects into a single SSAT
SDValue SatValue;		SDValue SatValue;
uint64_t SatConstant;		uint64_t SatConstant;
bool SatUSat;		bool SatUSat;
Show All 27 Lines	SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
}		}

SDValue LHS = Op.getOperand(0);		SDValue LHS = Op.getOperand(0);
SDValue RHS = Op.getOperand(1);		SDValue RHS = Op.getOperand(1);
ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();
SDValue TrueVal = Op.getOperand(2);		SDValue TrueVal = Op.getOperand(2);
SDValue FalseVal = Op.getOperand(3);		SDValue FalseVal = Op.getOperand(3);

if (!Subtarget->hasFP64() && LHS.getValueType() == MVT::f64) {		if (isUnsupportedFloatingType(LHS.getValueType())) {
DAG.getTargetLoweringInfo().softenSetCCOperands(DAG, MVT::f64, LHS, RHS, CC,		DAG.getTargetLoweringInfo().softenSetCCOperands(
dl);		DAG, LHS.getValueType(), LHS, RHS, CC, dl);

// If softenSetCCOperands only returned one value, we should compare it to		// If softenSetCCOperands only returned one value, we should compare it to
// zero.		// zero.
if (!RHS.getNode()) {		if (!RHS.getNode()) {
RHS = DAG.getConstant(0, dl, LHS.getValueType());		RHS = DAG.getConstant(0, dl, LHS.getValueType());
CC = ISD::SETNE;		CC = ISD::SETNE;
}		}
}		}
▲ Show 20 Lines • Show All 222 Lines • ▼ Show 20 Lines
SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
SDValue Chain = Op.getOperand(0);		SDValue Chain = Op.getOperand(0);
ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
SDValue LHS = Op.getOperand(2);		SDValue LHS = Op.getOperand(2);
SDValue RHS = Op.getOperand(3);		SDValue RHS = Op.getOperand(3);
SDValue Dest = Op.getOperand(4);		SDValue Dest = Op.getOperand(4);
SDLoc dl(Op);		SDLoc dl(Op);

if (!Subtarget->hasFP64() && LHS.getValueType() == MVT::f64) {		if (isUnsupportedFloatingType(LHS.getValueType())) {
DAG.getTargetLoweringInfo().softenSetCCOperands(DAG, MVT::f64, LHS, RHS, CC,		DAG.getTargetLoweringInfo().softenSetCCOperands(
dl);		DAG, LHS.getValueType(), LHS, RHS, CC, dl);

// If softenSetCCOperands only returned one value, we should compare it to		// If softenSetCCOperands only returned one value, we should compare it to
// zero.		// zero.
if (!RHS.getNode()) {		if (!RHS.getNode()) {
RHS = DAG.getConstant(0, dl, LHS.getValueType());		RHS = DAG.getConstant(0, dl, LHS.getValueType());
CC = ISD::SETNE;		CC = ISD::SETNE;
}		}
}		}
▲ Show 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	static SDValue LowerVectorFP_TO_INT(SDValue Op, SelectionDAG &DAG) {
Op = DAG.getNode(Op.getOpcode(), dl, NewTy, Op.getOperand(0));		Op = DAG.getNode(Op.getOpcode(), dl, NewTy, Op.getOperand(0));
return DAG.getNode(ISD::TRUNCATE, dl, VT, Op);		return DAG.getNode(ISD::TRUNCATE, dl, VT, Op);
}		}

SDValue ARMTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
if (VT.isVector())		if (VT.isVector())
return LowerVectorFP_TO_INT(Op, DAG);		return LowerVectorFP_TO_INT(Op, DAG);
if (!Subtarget->hasFP64() && Op.getOperand(0).getValueType() == MVT::f64) {		if (isUnsupportedFloatingType(Op.getOperand(0).getValueType())) {
RTLIB::Libcall LC;		RTLIB::Libcall LC;
if (Op.getOpcode() == ISD::FP_TO_SINT)		if (Op.getOpcode() == ISD::FP_TO_SINT)
LC = RTLIB::getFPTOSINT(Op.getOperand(0).getValueType(),		LC = RTLIB::getFPTOSINT(Op.getOperand(0).getValueType(),
Op.getValueType());		Op.getValueType());
else		else
LC = RTLIB::getFPTOUINT(Op.getOperand(0).getValueType(),		LC = RTLIB::getFPTOUINT(Op.getOperand(0).getValueType(),
Op.getValueType());		Op.getValueType());
return makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(0),		return makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(0),
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	static SDValue LowerVectorINT_TO_FP(SDValue Op, SelectionDAG &DAG) {
Op = DAG.getNode(CastOpc, dl, DestVecType, Op.getOperand(0));		Op = DAG.getNode(CastOpc, dl, DestVecType, Op.getOperand(0));
return DAG.getNode(Opc, dl, VT, Op);		return DAG.getNode(Opc, dl, VT, Op);
}		}

SDValue ARMTargetLowering::LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
if (VT.isVector())		if (VT.isVector())
return LowerVectorINT_TO_FP(Op, DAG);		return LowerVectorINT_TO_FP(Op, DAG);
if (!Subtarget->hasFP64() && Op.getValueType() == MVT::f64) {		if (isUnsupportedFloatingType(VT)) {
RTLIB::Libcall LC;		RTLIB::Libcall LC;
if (Op.getOpcode() == ISD::SINT_TO_FP)		if (Op.getOpcode() == ISD::SINT_TO_FP)
LC = RTLIB::getSINTTOFP(Op.getOperand(0).getValueType(),		LC = RTLIB::getSINTTOFP(Op.getOperand(0).getValueType(),
Op.getValueType());		Op.getValueType());
else		else
LC = RTLIB::getUINTTOFP(Op.getOperand(0).getValueType(),		LC = RTLIB::getUINTTOFP(Op.getOperand(0).getValueType(),
Op.getValueType());		Op.getValueType());
return makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(0),		return makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(0),
▲ Show 20 Lines • Show All 10,666 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrVFP.td

	Show First 20 Lines • Show All 2,263 Lines • ▼ Show 20 Lines
	// FP Conditional moves.			// FP Conditional moves.
	//			//

	let hasSideEffects = 0 in {			let hasSideEffects = 0 in {
	def VMOVDcc : PseudoInst<(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm, cmovpred:$p),			def VMOVDcc : PseudoInst<(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm, cmovpred:$p),
	IIC_fpUNA64,			IIC_fpUNA64,
	[(set (f64 DPR:$Dd),			[(set (f64 DPR:$Dd),
	(ARMcmov DPR:$Dn, DPR:$Dm, cmovpred:$p))]>,			(ARMcmov DPR:$Dn, DPR:$Dm, cmovpred:$p))]>,
	RegConstraint<"$Dn = $Dd">, Requires<[HasVFP2,HasDPVFP]>;			RegConstraint<"$Dn = $Dd">, Requires<[HasFPRegs64]>;

	def VMOVScc : PseudoInst<(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm, cmovpred:$p),			def VMOVScc : PseudoInst<(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm, cmovpred:$p),
	IIC_fpUNA32,			IIC_fpUNA32,
	[(set (f32 SPR:$Sd),			[(set (f32 SPR:$Sd),
	(ARMcmov SPR:$Sn, SPR:$Sm, cmovpred:$p))]>,			(ARMcmov SPR:$Sn, SPR:$Sm, cmovpred:$p))]>,
	RegConstraint<"$Sn = $Sd">, Requires<[HasVFP2]>;			RegConstraint<"$Sn = $Sd">, Requires<[HasFPRegs]>;
	} // hasSideEffects			} // hasSideEffects

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Move from VFP System Register to ARM core register.			// Move from VFP System Register to ARM core register.
	//			//

	class MovFromVFP<bits<4> opc19_16, dag oops, dag iops, string opc, string asm,			class MovFromVFP<bits<4> opc19_16, dag oops, dag iops, string opc, string asm,
	list<dag> pattern>:			list<dag> pattern>:
	▲ Show 20 Lines • Show All 443 Lines • Show Last 20 Lines

llvm/test/CodeGen/ARM/fp16-instructions.ll

	; SOFT:			; SOFT:
	; RUN: llc < %s -mtriple=arm-none-eabi -float-abi=soft | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT			; RUN: llc < %s -mtriple=arm-none-eabi -float-abi=soft | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT
	; RUN: llc < %s -mtriple=thumb-none-eabi -float-abi=soft | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT			; RUN: llc < %s -mtriple=thumb-none-eabi -float-abi=soft | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT
				; RUN: llc < %s -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT
				; RUN: llc < %s -mtriple=thumbv8.1m.main-none-eabi -float-abi=soft -mattr=+mve | FileCheck %s --check-prefixes=CHECK,CHECK-SOFT

	; SOFTFP:			; SOFTFP:
	; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp3 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-VFP3			; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp3 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-VFP3
	; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp4 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FP16,CHECK-SOFTFP-FP16-A32			; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp4 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FP16,CHECK-SOFTFP-FP16-A32
	; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+fullfp16,+fp64 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FULLFP16			; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+fullfp16,+fp64 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FULLFP16

	; RUN: llc < %s -mtriple=thumbv7-none-eabi -mattr=+vfp3 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-VFP3			; RUN: llc < %s -mtriple=thumbv7-none-eabi -mattr=+vfp3 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-VFP3
	; RUN: llc < %s -mtriple=thumbv7-none-eabi -mattr=+vfp4 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FP16,CHECK-SOFTFP-FP16-T32			; RUN: llc < %s -mtriple=thumbv7-none-eabi -mattr=+vfp4 | FileCheck %s --check-prefixes=CHECK,CHECK-SOFTFP-FP16,CHECK-SOFTFP-FP16-T32
	▲ Show 20 Lines • Show All 1,036 Lines • Show Last 20 Lines

llvm/test/CodeGen/Thumb2/float-ops.ll

	; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=NONE			; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=NONE -check-prefix=NOREGS
	; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP -check-prefix=VFP4-ALL			; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP -check-prefix=VFP4-ALL
	; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m7 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP -check-prefix=FP-ARMv8			; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m7 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP -check-prefix=FP-ARMv8
	; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a8 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP -check-prefix=VFP4-ALL -check-prefix=VFP4-DP			; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a8 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP -check-prefix=VFP4-ALL -check-prefix=VFP4-DP
				; RUN: llc < %s -mtriple=thumbv8.1m.main-none-eabihf -mattr=+mve | FileCheck %s -check-prefix=CHECK -check-prefix=NONE -check-prefix=ONLYREGS

	define float @add_f(float %a, float %b) {			define float @add_f(float %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: add_f:			; CHECK-LABEL: add_f:
	; NONE: bl __aeabi_fadd			; NONE: {{b|bl}} __aeabi_fadd
	; HARD: vadd.f32 s0, s0, s1			; HARD: vadd.f32 s0, s0, s1
	%0 = fadd float %a, %b			%0 = fadd float %a, %b
	ret float %0			ret float %0
	}			}

	define double @add_d(double %a, double %b) {			define double @add_d(double %a, double %b) {
	entry:			entry:
	; CHECK-LABEL: add_d:			; CHECK-LABEL: add_d:
	; NONE: bl __aeabi_dadd			; NONE: {{b|bl}} __aeabi_dadd
	; SP: bl __aeabi_dadd			; SP: {{b|bl}} __aeabi_dadd
	; DP: vadd.f64 d0, d0, d1			; DP: vadd.f64 d0, d0, d1
	%0 = fadd double %a, %b			%0 = fadd double %a, %b
	ret double %0			ret double %0
	}			}

	define float @sub_f(float %a, float %b) {			define float @sub_f(float %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: sub_f:			; CHECK-LABEL: sub_f:
	; NONE: bl __aeabi_fsub			; NONE: {{b|bl}} __aeabi_fsub
	; HARD: vsub.f32 s			; HARD: vsub.f32 s
	%0 = fsub float %a, %b			%0 = fsub float %a, %b
	ret float %0			ret float %0
	}			}

	define double @sub_d(double %a, double %b) {			define double @sub_d(double %a, double %b) {
	entry:			entry:
	; CHECK-LABEL: sub_d:			; CHECK-LABEL: sub_d:
	; NONE: bl __aeabi_dsub			; NONE: {{b|bl}} __aeabi_dsub
	; SP: bl __aeabi_dsub			; SP: {{b|bl}} __aeabi_dsub
	; DP: vsub.f64 d0, d0, d1			; DP: vsub.f64 d0, d0, d1
	%0 = fsub double %a, %b			%0 = fsub double %a, %b
	ret double %0			ret double %0
	}			}

	define float @mul_f(float %a, float %b) {			define float @mul_f(float %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: mul_f:			; CHECK-LABEL: mul_f:
	; NONE: bl __aeabi_fmul			; NONE: {{b|bl}} __aeabi_fmul
	; HARD: vmul.f32 s			; HARD: vmul.f32 s
	%0 = fmul float %a, %b			%0 = fmul float %a, %b
	ret float %0			ret float %0
	}			}

	define double @mul_d(double %a, double %b) {			define double @mul_d(double %a, double %b) {
	entry:			entry:
	; CHECK-LABEL: mul_d:			; CHECK-LABEL: mul_d:
	; NONE: bl __aeabi_dmul			; NONE: {{b|bl}} __aeabi_dmul
	; SP: bl __aeabi_dmul			; SP: {{b|bl}} __aeabi_dmul
	; DP: vmul.f64 d0, d0, d1			; DP: vmul.f64 d0, d0, d1
	%0 = fmul double %a, %b			%0 = fmul double %a, %b
	ret double %0			ret double %0
	}			}

	define float @div_f(float %a, float %b) {			define float @div_f(float %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: div_f:			; CHECK-LABEL: div_f:
	; NONE: bl __aeabi_fdiv			; NONE: {{b|bl}} __aeabi_fdiv
	; HARD: vdiv.f32 s			; HARD: vdiv.f32 s
	%0 = fdiv float %a, %b			%0 = fdiv float %a, %b
	ret float %0			ret float %0
	}			}

	define double @div_d(double %a, double %b) {			define double @div_d(double %a, double %b) {
	entry:			entry:
	; CHECK-LABEL: div_d:			; CHECK-LABEL: div_d:
	; NONE: bl __aeabi_ddiv			; NONE: {{b|bl}} __aeabi_ddiv
	; SP: bl __aeabi_ddiv			; SP: {{b|bl}} __aeabi_ddiv
	; DP: vdiv.f64 d0, d0, d1			; DP: vdiv.f64 d0, d0, d1
	%0 = fdiv double %a, %b			%0 = fdiv double %a, %b
	ret double %0			ret double %0
	}			}

	define float @rem_f(float %a, float %b) {			define float @rem_f(float %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: rem_f:			; CHECK-LABEL: rem_f:
	[19 lines hidden]
	; HARD: vldr s0, [r0]			; HARD: vldr s0, [r0]
	%0 = load float, float* %a, align 4			%0 = load float, float* %a, align 4
	ret float %0			ret float %0
	}			}

	define double @load_d(double* %a) {			define double @load_d(double* %a) {
	entry:			entry:
	; CHECK-LABEL: load_d:			; CHECK-LABEL: load_d:
	; NONE: ldm r0, {r0, r1}			; NOREGS: ldm r0, {r0, r1}
				; ONLYREGS: vldr d0, [r0]
	; HARD: vldr d0, [r0]			; HARD: vldr d0, [r0]
	%0 = load double, double* %a, align 8			%0 = load double, double* %a, align 8
	ret double %0			ret double %0
	}			}

	define void @store_f(float* %a, float %b) {			define void @store_f(float* %a, float %b) {
	entry:			entry:
	; CHECK-LABEL: store_f:			; CHECK-LABEL: store_f:
	; NONE: str r1, [r0]			; NONE: str r1, [r0]
	; HARD: vstr s0, [r0]			; HARD: vstr s0, [r0]
	store float %b, float* %a, align 4			store float %b, float* %a, align 4
	ret void			ret void
	}			}

	define void @store_d(double* %a, double %b) {			define void @store_d(double* %a, double %b) {
	entry:			entry:
	; CHECK-LABEL: store_d:			; CHECK-LABEL: store_d:
	; NONE: strd r2, r3, [r0]			; NOREGS: strd r2, r3, [r0]
				; ONLYREGS: vstr d0, [r0]
	; HARD: vstr d0, [r0]			; HARD: vstr d0, [r0]
	store double %b, double* %a, align 8			store double %b, double* %a, align 8
	ret void			ret void
	}			}

	define double @f_to_d(float %a) {			define double @f_to_d(float %a) {
	; CHECK-LABEL: f_to_d:			; CHECK-LABEL: f_to_d:
	; NONE: bl __aeabi_f2d			; NONE: bl __aeabi_f2d
	[115 lines hidden]
	; NONE-NOT: mov			; NONE-NOT: mov
	; HARD: vmov r0, r1, d0			; HARD: vmov r0, r1, d0
	%1 = bitcast double %a to i64			%1 = bitcast double %a to i64
	ret i64 %1			ret i64 %1
	}			}

	define float @select_f(float %a, float %b, i1 %c) {			define float @select_f(float %a, float %b, i1 %c) {
	; CHECK-LABEL: select_f:			; CHECK-LABEL: select_f:
	; NONE: lsls r2, r2, #31			; NOREGS: lsls r2, r2, #31
	; NONE: moveq r0, r1			; NOREGS: moveq r0, r1
				; ONLYREGS: lsls r2, r2, #31
				; ONLYREGS: vmovne.f32 s2, s0
				dmgreen: This is worse than before by the looks of it? We move things into fp registers just to move them out again.
				simon_tatham (author): Perhaps that's true, but I'd rather fix the correctness first and get a set of regression tests passing, and then we can worry about recovering the performance with those tests in place to prevent breaking correctness again. Especially since in the general case it's not really clear how you should choose which kind of register to use for a value. Keeping it in an FP register is obviously wasteful in this case, but in another case where register pressure is high, might it save a spill or two? I think getting the right answers in cases larger than this simple one might not be trivial.
				dmgreen: My worry is that this will mean every floating point load becomes a vldr, which just ends up being moved into a gpr. This probably isn't a huge deal for performance, as you will likely always be calling a __aeabi_fadd type function, but the codesize would increase quite a bit. I couldn't think of a reason when you _would_ want to load using a vldr (at least it would be fairly uncommon). I imagine that almost every operation would actually be done on a gpr for floats.
				dmgreen: Forgot to say. I agree that working is better than not working. Something we may have to adjust in the future though.
	; HARD: lsls r0, r0, #31			; HARD: lsls r0, r0, #31
	; VFP4-ALL: vmovne.f32 s1, s0			; VFP4-ALL: vmovne.f32 s1, s0
	; VFP4-ALL: vmov.f32 s0, s1			; VFP4-ALL: vmov.f32 s0, s1
	; FP-ARMv8: vseleq.f32 s0, s1, s0			; FP-ARMv8: vseleq.f32 s0, s1, s0
	%1 = select i1 %c, float %a, float %b			%1 = select i1 %c, float %a, float %b
	ret float %1			ret float %1
	}			}

	define double @select_d(double %a, double %b, i1 %c) {			define double @select_d(double %a, double %b, i1 %c) {
	; CHECK-LABEL: select_d:			; CHECK-LABEL: select_d:
	; NONE: ldr{{(.w)?}} [[REG:r[0-9]+]], [sp]			; NONE: ldr{{(.w)?}} [[REG:r[0-9]+]], [sp]
	; NONE ands [[REG]], [[REG]], #1			; NONE ands [[REG]], [[REG]], #1
	; NONE: moveq r0, r2			; NONE-DAG: moveq r0, r2
	; NONE: moveq r1, r3			; NONE-DAG: moveq r1, r3
	; SP: ands r0, r0, #1			; SP: ands r0, r0, #1
	; SP-DAG: vmov [[ALO:r[0-9]+]], [[AHI:r[0-9]+]], d0			; SP-DAG: vmov [[ALO:r[0-9]+]], [[AHI:r[0-9]+]], d0
	; SP-DAG: vmov [[BLO:r[0-9]+]], [[BHI:r[0-9]+]], d1			; SP-DAG: vmov [[BLO:r[0-9]+]], [[BHI:r[0-9]+]], d1
	; SP: itt ne			; SP: itt ne
	; SP-DAG: movne [[BLO]], [[ALO]]			; SP-DAG: movne [[BLO]], [[ALO]]
	; SP-DAG: movne [[BHI]], [[AHI]]			; SP-DAG: movne [[BHI]], [[AHI]]
	; SP: vmov d0, [[BLO]], [[BHI]]			; SP: vmov d0, [[BLO]], [[BHI]]
	; DP: lsls r0, r0, #31			; DP: lsls r0, r0, #31
	; VFP4-DP: vmovne.f64 d1, d0			; VFP4-DP: vmovne.f64 d1, d0
	; VFP4-DP: vmov.f64 d0, d1			; VFP4-DP: vmov.f64 d0, d1
	; FP-ARMV8: vseleq.f64 d0, d1, d0			; FP-ARMV8: vseleq.f64 d0, d1, d0
	%1 = select i1 %c, double %a, double %b			%1 = select i1 %c, double %a, double %b
	ret double %1			ret double %1
	}			}