This is an archive of the discontinued LLVM Phabricator instance.

ARM: Enable DP copy, load and store instructions for FPv4-SP
ClosedPublic

Authored by olista01 on Aug 14 2014, 8:53 AM.


Details

Reviewers

rengolin
t.p.northover

Summary

The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers, along with load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers.

This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM.

Diff Detail

Event Timeline

olista01 updated this revision to Diff 12511. Aug 14 2014, 8:53 AM

olista01 retitled this revision from to ARM: Enable DP copy, load and store instructions for FPv4-SP.

olista01 updated this object.

olista01 edited the test plan for this revision.

olista01 set the repository for this revision to rL LLVM.

olista01 added a subscriber: Unknown Object (MLST).

Herald added subscribers: mroth, aemerson. Aug 14 2014, 8:53 AM

Hi Oliver,

This seems to be lots of separate patches lumped together. I started enumerating them but lost count (ABI, VMOV, CMOV, Predicates, at least 1 libcall issue, ...). Please could you split them up?

Thanks.

Tim.

Hi Oliver,

Overall a good patch. I have a few comments inline and I also agree with Tim that this could actually be lots of smaller patches.

Feel free to address the comments here, but apply them (where applicable) to the individual patches later on.

cheers,
--renato

lib/Target/ARM/ARMCallingConv.h
238	Wild guess here, but shouldn't this be something like std::min(Size, 8)?
lib/Target/ARM/ARMISelLowering.cpp
3311	Is this the only such assert, or just the one that you hit during tests?
3535	Would this ever be called by v2f64 types?
8636	This looks like left over.
test/CodeGen/Thumb2/float-cmp.ll
4	You could have used the same HARD+SP/DP you used in the test above to simplify this test.
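The suggestion for ARMCallingConv.h line 238 can be illustrated in isolation. This is only a sketch: haStackAlign is a hypothetical helper name, not a function in the patch, and it assumes the member size is given in bits as LocVT.getSizeInBits() would report it. It shows the alignment of the first stack slot being capped at 8 bytes, so a 16-byte v2f64 member no longer needs a special case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of the patch): byte alignment used for the
// first stack slot of a homogeneous aggregate whose members are SizeInBits
// wide. Mirrors "Align = std::min(Size, 8U)".
unsigned haStackAlign(unsigned SizeInBits) {
  unsigned Size = SizeInBits / 8; // member size in bytes
  return std::min(Size, 8U);      // never align to more than 8 bytes
}
```

Under these assumptions an f32 member (32 bits) gets 4-byte alignment, an f64 member (64 bits) gets 8, and a v2f64 member (128 bits) is capped at 8.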

I could split the patch up, but it would not be testable until the last patch, which would remove the condition at about lib/Target/ARM/ARMISelLowering.cpp:411, which makes f64 a legal type. If I committed this patch first, this would introduce regressions until the last patch is applied. Is this still worth doing?

Now looking again, I also can't see how you could separate much, since it all depends on the new flag.

I'd be ok with having it as one patch.

cheers,
--renato

lib/Target/ARM/ARMCallingConv.h
182	Isn't this an independent fix?

olista01 added inline comments. Aug 21 2014, 1:45 AM

lib/Target/ARM/ARMCallingConv.h
182	This is part of the revert of http://reviews.llvm.org/rL209650. r209650 used this mechanism to pass a double as 2 i32s, meaning that an HFA of 4 doubles would have 8 members. Doubles will now show up here as f64s, so the assertion can be tightened up again.
lib/Target/ARM/ARMISelLowering.cpp
3311	The only other legal floating-point type for FPv4-SP is f32, which can be handled by this function. Vector types are only legal for NEON.
3535	No, because v2f64 is not a legal type when isFPOnlySP, and this is only called while legalizing operations, which happens after legalizing types.
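The member-count argument for line 182 can be checked with a few lines of arithmetic. This is a sketch only: DoubleLowering and pendingMembers are hypothetical names introduced for illustration, standing in for how many entries end up in the pending list for an HFA of doubles.

```cpp
#include <cassert>

// How a double HFA member reaches the calling-convention hook (hypothetical
// enum, for illustration only).
enum class DoubleLowering {
  SplitIntoI32Pairs, // r209650 behaviour: each double passed as 2 i32s
  LegalF64           // this patch: each double shows up as one f64
};

// Number of pending members seen for an HFA of NumDoubles doubles.
unsigned pendingMembers(unsigned NumDoubles, DoubleLowering L) {
  assert(NumDoubles >= 1 && NumDoubles <= 4 && "AAPCS HFAs have 1-4 elements");
  return L == DoubleLowering::SplitIntoI32Pairs ? NumDoubles * 2 : NumDoubles;
}
```

With four doubles the split form yields 8 members and the f64 form yields 4, which is why the bound in the assertion can go back from 8 to 4.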

olista01 updated this revision to Diff 12750. Aug 21 2014, 1:45 AM

rengolin added inline comments. Aug 21 2014, 2:04 AM

test/CodeGen/ARM/aapcs-hfa-code.ll
79	A bit fragile this change, no? Shouldn't this also be a DAG check?
test/CodeGen/Thumb2/float-ops.ll
4	This one still has some check redundancy. Sorry for being picky, but I fear people will later change checks without properly making sure it makes sense for both DP and SP.

rengolin added reviewers: t.p.northover, rengolin. Aug 21 2014, 2:05 AM

rengolin removed subscribers: rengolin, t.p.northover.

olista01 added inline comments. Aug 21 2014, 3:07 AM

test/CodeGen/ARM/aapcs-hfa-code.ll
79	I'm not sure how this can be done. If I was to make the movs lines DAG checks, they could match in either order, but ONEHI and ONELO would be swapped, so the movt and strd lines would fail to match. Making any more checks DAGs would cause this test to match even if the registers were in the wrong order (swapping ONEHI and ONELO in the strd is a plausible failure mode).

rengolin added inline comments. Aug 21 2014, 4:22 AM

test/CodeGen/ARM/aapcs-hfa-code.ll
79	That's a good point. I wanted to add a CHECK-OR so that all sequential CHECKs could be matched and you'd be able to do:

    ; CHECK-M4F: strd [[ONELO]], [[ONEHI]], [sp]
    ; CHECK-M4F-OR: strd [[ONEHI]], [[ONELO]], [sp]

But that's for another commit. :)

Reduce redundancy of a test

Thanks! LGTM.

--renato

A common way to get around those issues is to add a temporary option, -arm-enable-dp-sp or whatever, which is off by default while the code is still buggy but can be enabled in tests.

It seems slightly excessive in this case if we're just going to remove it again a few days later, but it would definitely make revision control simpler if anyone does ever need to "git blame" the situation. And it'd make each individual patch easier to review and pair up with its tests (though I'm going to be away for the next week anyway, so if Renato doesn't mind...).

Cheers.

Tim.

Yeah, I think it's a bit too much in this case, as the patch is not *that* big and does follow a similar line. I think the nightmare of reverting multiple patches would be worse if we split.

cheers,
--renato

This revision is now accepted and ready to land. Aug 21 2014, 4:57 AM

I'm OK with that in this case, but I think in future we should aim for incremental development *on trunk*. This situation didn't have to come up in the first place.

Cheers.

Tim.

Thanks, committed revision 216172.

olista01 closed this revision. Aug 21 2014, 6:02 AM

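Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	26 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	19 lines
lib/Target/ARM/ (file name not preserved)	9 lines
lib/Target/ARM/ (file name not preserved)	17 lines
lib/Target/ARM/ (file name not preserved)	7 lines
lib/Target/ARM/ (file name not preserved)	179 lines
lib/Target/ARM/ (file name not preserved)	2 lines
test/CodeGen/ARM/aapcs-hfa-code.ll	19 lines
test/CodeGen/ARM/darwin-eabi.ll	2 lines
test/CodeGen/Thumb2/aapcs.ll	50 lines
test/CodeGen/Thumb2/cortex-fp.ll	2 lines
test/CodeGen/Thumb2/float-cmp.ll	300 lines
test/CodeGen/Thumb2/float-intrinsics-double.ll	214 lines
test/CodeGen/Thumb2/float-intrinsics-float.ll	210 lines
test/CodeGen/Thumb2/float-ops.ll	290 lines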

Diff 12756

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[3,511 lines elided]	Results.push_back(ExpandFPLibCall(Node, RTLIB::REM_F32, RTLIB::REM_F64,
RTLIB::REM_F80, RTLIB::REM_F128,		RTLIB::REM_F80, RTLIB::REM_F128,
RTLIB::REM_PPCF128));		RTLIB::REM_PPCF128));
break;		break;
case ISD::FMA:		case ISD::FMA:
Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,		Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,
RTLIB::FMA_F80, RTLIB::FMA_F128,		RTLIB::FMA_F80, RTLIB::FMA_F128,
RTLIB::FMA_PPCF128));		RTLIB::FMA_PPCF128));
break;		break;
		case ISD::FADD:
		Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64,
		RTLIB::ADD_F80, RTLIB::ADD_F128,
		RTLIB::ADD_PPCF128));
		break;
		case ISD::FMUL:
		Results.push_back(ExpandFPLibCall(Node, RTLIB::MUL_F32, RTLIB::MUL_F64,
		RTLIB::MUL_F80, RTLIB::MUL_F128,
		RTLIB::MUL_PPCF128));
		break;
case ISD::FP16_TO_FP: {		case ISD::FP16_TO_FP: {
if (Node->getValueType(0) == MVT::f32) {		if (Node->getValueType(0) == MVT::f32) {
Results.push_back(ExpandLibCall(RTLIB::FPEXT_F16_F32, Node, false));		Results.push_back(ExpandLibCall(RTLIB::FPEXT_F16_F32, Node, false));
break;		break;
}		}

// We can extend to types bigger than f32 in two steps without changing the		// We can extend to types bigger than f32 in two steps without changing the
// result. Since "f16 -> f32" is much more commonly available, give CodeGen		// result. Since "f16 -> f32" is much more commonly available, give CodeGen
[16 lines elided]	case ISD::ConstantFP: {
// Check to see if this FP immediate is already legal.		// Check to see if this FP immediate is already legal.
// If this is a legal constant, turn it into a TargetConstantFP node.		// If this is a legal constant, turn it into a TargetConstantFP node.
if (!TLI.isFPImmLegal(CFP->getValueAPF(), Node->getValueType(0)))		if (!TLI.isFPImmLegal(CFP->getValueAPF(), Node->getValueType(0)))
Results.push_back(ExpandConstantFP(CFP, true));		Results.push_back(ExpandConstantFP(CFP, true));
break;		break;
}		}
case ISD::FSUB: {		case ISD::FSUB: {
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
assert(TLI.isOperationLegalOrCustom(ISD::FADD, VT) &&		if (TLI.isOperationLegalOrCustom(ISD::FADD, VT) &&
TLI.isOperationLegalOrCustom(ISD::FNEG, VT) &&		TLI.isOperationLegalOrCustom(ISD::FNEG, VT)) {
"Don't know how to expand this FP subtraction!");
Tmp1 = DAG.getNode(ISD::FNEG, dl, VT, Node->getOperand(1));		Tmp1 = DAG.getNode(ISD::FNEG, dl, VT, Node->getOperand(1));
Tmp1 = DAG.getNode(ISD::FADD, dl, VT, Node->getOperand(0), Tmp1);		Tmp1 = DAG.getNode(ISD::FADD, dl, VT, Node->getOperand(0), Tmp1);
Results.push_back(Tmp1);		Results.push_back(Tmp1);
		} else {
		Results.push_back(ExpandFPLibCall(Node, RTLIB::SUB_F32, RTLIB::SUB_F64,
		RTLIB::SUB_F80, RTLIB::SUB_F128,
		RTLIB::SUB_PPCF128));
		}
break;		break;
}		}
case ISD::SUB: {		case ISD::SUB: {
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
assert(TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&		assert(TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&
TLI.isOperationLegalOrCustom(ISD::XOR, VT) &&		TLI.isOperationLegalOrCustom(ISD::XOR, VT) &&
"Don't know how to expand this subtraction!");		"Don't know how to expand this subtraction!");
Tmp1 = DAG.getNode(ISD::XOR, dl, VT, Node->getOperand(1),		Tmp1 = DAG.getNode(ISD::XOR, dl, VT, Node->getOperand(1),
[786 lines elided]
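The FSUB change in the hunk above replaces an assertion with a fallback. A minimal standalone sketch of that decision follows; TargetCaps and expandFSub are hypothetical names, and the two booleans stand in for the TLI.isOperationLegalOrCustom() queries on ISD::FADD and ISD::FNEG. The returned strings are descriptive labels, not real DAG nodes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two legality queries made by the legalizer.
struct TargetCaps {
  bool FAddLegalOrCustom;
  bool FNegLegalOrCustom;
};

// Describes how an f64 FSUB would be expanded: as a + (-b) when both FADD
// and FNEG are available, otherwise as a runtime library call.
std::string expandFSub(const TargetCaps &TLI) {
  if (TLI.FAddLegalOrCustom && TLI.FNegLegalOrCustom)
    return "fadd(a, fneg(b))";
  return "libcall SUB_F64";
}
```

On FPv4-SP both f64 FADD and f64 FNEG are marked Expand by this patch, so in this sketch the second branch would be taken, where the old code would have hit the assertion.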

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


[7,259 lines elided]	for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues;
if (Args[i].Alignment)		if (Args[i].Alignment)
FrameAlign = Args[i].Alignment;		FrameAlign = Args[i].Alignment;
else		else
FrameAlign = getByValTypeAlignment(ElementTy);		FrameAlign = getByValTypeAlignment(ElementTy);
Flags.setByValAlign(FrameAlign);		Flags.setByValAlign(FrameAlign);
}		}
if (Args[i].isNest)		if (Args[i].isNest)
Flags.setNest();		Flags.setNest();
if (NeedsRegBlock)		if (NeedsRegBlock) {
Flags.setInConsecutiveRegs();		Flags.setInConsecutiveRegs();
		if (Value == NumValues - 1)
		Flags.setInConsecutiveRegsLast();
		}
Flags.setOrigAlign(OriginalAlignment);		Flags.setOrigAlign(OriginalAlignment);

MVT PartVT = getRegisterType(CLI.RetTy->getContext(), VT);		MVT PartVT = getRegisterType(CLI.RetTy->getContext(), VT);
unsigned NumParts = getNumRegisters(CLI.RetTy->getContext(), VT);		unsigned NumParts = getNumRegisters(CLI.RetTy->getContext(), VT);
SmallVector<SDValue, 4> Parts(NumParts);		SmallVector<SDValue, 4> Parts(NumParts);
ISD::NodeType ExtendKind = ISD::ANY_EXTEND;		ISD::NodeType ExtendKind = ISD::ANY_EXTEND;

if (Args[i].isSExt)		if (Args[i].isSExt)
[29 lines elided]	for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues;
ISD::OutputArg MyFlags(Flags, Parts[j].getValueType(), VT,		ISD::OutputArg MyFlags(Flags, Parts[j].getValueType(), VT,
i < CLI.NumFixedArgs,		i < CLI.NumFixedArgs,
i, j*Parts[j].getValueType().getStoreSize());		i, j*Parts[j].getValueType().getStoreSize());
if (NumParts > 1 && j == 0)		if (NumParts > 1 && j == 0)
MyFlags.Flags.setSplit();		MyFlags.Flags.setSplit();
else if (j != 0)		else if (j != 0)
MyFlags.Flags.setOrigAlign(1);		MyFlags.Flags.setOrigAlign(1);

// Only mark the end at the last register of the last value.
if (NeedsRegBlock && Value == NumValues - 1 && j == NumParts - 1)
MyFlags.Flags.setInConsecutiveRegsLast();

CLI.Outs.push_back(MyFlags);		CLI.Outs.push_back(MyFlags);
CLI.OutVals.push_back(Parts[j]);		CLI.OutVals.push_back(Parts[j]);
}		}
}		}
}		}

SmallVector<SDValue, 4> InVals;		SmallVector<SDValue, 4> InVals;
CLI.Chain = LowerCall(CLI, InVals);		CLI.Chain = LowerCall(CLI, InVals);
[198 lines elided]	for (unsigned Value = 0, NumValues = ValueVTs.size();
if (F.getParamAlignment(Idx))		if (F.getParamAlignment(Idx))
FrameAlign = F.getParamAlignment(Idx);		FrameAlign = F.getParamAlignment(Idx);
else		else
FrameAlign = TLI->getByValTypeAlignment(ElementTy);		FrameAlign = TLI->getByValTypeAlignment(ElementTy);
Flags.setByValAlign(FrameAlign);		Flags.setByValAlign(FrameAlign);
}		}
if (F.getAttributes().hasAttribute(Idx, Attribute::Nest))		if (F.getAttributes().hasAttribute(Idx, Attribute::Nest))
Flags.setNest();		Flags.setNest();
if (NeedsRegBlock)		if (NeedsRegBlock) {
Flags.setInConsecutiveRegs();		Flags.setInConsecutiveRegs();
		if (Value == NumValues - 1)
		Flags.setInConsecutiveRegsLast();
		}
Flags.setOrigAlign(OriginalAlignment);		Flags.setOrigAlign(OriginalAlignment);

MVT RegisterVT = TLI->getRegisterType(*CurDAG->getContext(), VT);		MVT RegisterVT = TLI->getRegisterType(*CurDAG->getContext(), VT);
unsigned NumRegs = TLI->getNumRegisters(*CurDAG->getContext(), VT);		unsigned NumRegs = TLI->getNumRegisters(*CurDAG->getContext(), VT);
for (unsigned i = 0; i != NumRegs; ++i) {		for (unsigned i = 0; i != NumRegs; ++i) {
ISD::InputArg MyFlags(Flags, RegisterVT, VT, isArgValueUsed,		ISD::InputArg MyFlags(Flags, RegisterVT, VT, isArgValueUsed,
Idx-1, PartBase+i*RegisterVT.getStoreSize());		Idx-1, PartBase+i*RegisterVT.getStoreSize());
if (NumRegs > 1 && i == 0)		if (NumRegs > 1 && i == 0)
MyFlags.Flags.setSplit();		MyFlags.Flags.setSplit();
// if it isn't first piece, alignment must be 1		// if it isn't first piece, alignment must be 1
else if (i > 0)		else if (i > 0)
MyFlags.Flags.setOrigAlign(1);		MyFlags.Flags.setOrigAlign(1);

// Only mark the end at the last register of the last value.
if (NeedsRegBlock && Value == NumValues - 1 && i == NumRegs - 1)
MyFlags.Flags.setInConsecutiveRegsLast();

Ins.push_back(MyFlags);		Ins.push_back(MyFlags);
}		}
PartBase += VT.getStoreSize();		PartBase += VT.getStoreSize();
}		}
}		}

// Call the target to set up the argument values.		// Call the target to set up the argument values.
SmallVector<SDValue, 8> InVals;		SmallVector<SDValue, 8> InVals;
[231 lines elided]
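The two ways of setting InConsecutiveRegsLast in the hunks above differ only when a value is split into several register parts. The sketch below models that under a simplifying assumption that every value splits into the same number of parts; markLast and its parameters are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <vector>

// Returns, for each register part in order, whether it would carry the
// "last" flag. PerValue = true models flagging the whole last value (the
// right-hand side of the diff); PerValue = false models flagging only the
// last part of the last value (the left-hand side).
std::vector<bool> markLast(unsigned NumValues, unsigned NumParts,
                           bool PerValue) {
  std::vector<bool> Last;
  for (unsigned Value = 0; Value != NumValues; ++Value) {
    for (unsigned Part = 0; Part != NumParts; ++Part) {
      bool LastValue = (Value == NumValues - 1);
      bool LastPart = (Part == NumParts - 1);
      Last.push_back(PerValue ? LastValue : (LastValue && LastPart));
    }
  }
  return Last;
}
```

When each member occupies a single register (NumParts == 1, as for a legal f64), both schemes flag exactly the final entry, which is consistent with the per-part marking no longer being needed once doubles stop being split into i32 pairs.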

lib/Target/ARM/ARMBaseInstrInfo.cpp

[715 lines elided]	void ARMBaseInstrInfo::copyPhysReg(MachineBasicBlock &MBB,

unsigned Opc = 0;		unsigned Opc = 0;
if (SPRDest && SPRSrc)		if (SPRDest && SPRSrc)
Opc = ARM::VMOVS;		Opc = ARM::VMOVS;
else if (GPRDest && SPRSrc)		else if (GPRDest && SPRSrc)
Opc = ARM::VMOVRS;		Opc = ARM::VMOVRS;
else if (SPRDest && GPRSrc)		else if (SPRDest && GPRSrc)
Opc = ARM::VMOVSR;		Opc = ARM::VMOVSR;
else if (ARM::DPRRegClass.contains(DestReg, SrcReg))		else if (ARM::DPRRegClass.contains(DestReg, SrcReg) && !Subtarget.isFPOnlySP())
Opc = ARM::VMOVD;		Opc = ARM::VMOVD;
else if (ARM::QPRRegClass.contains(DestReg, SrcReg))		else if (ARM::QPRRegClass.contains(DestReg, SrcReg))
Opc = ARM::VORRq;		Opc = ARM::VORRq;

if (Opc) {		if (Opc) {
MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(Opc), DestReg);		MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(Opc), DestReg);
MIB.addReg(SrcReg, getKillRegState(KillSrc));		MIB.addReg(SrcReg, getKillRegState(KillSrc));
if (Opc == ARM::VORRq)		if (Opc == ARM::VORRq)
[43 lines elided]	// Fall back to VMOVD.
BeginIdx = ARM::dsub_0;		BeginIdx = ARM::dsub_0;
SubRegs = 3;		SubRegs = 3;
Spacing = 2;		Spacing = 2;
} else if (ARM::DQuadSpcRegClass.contains(DestReg, SrcReg)) {		} else if (ARM::DQuadSpcRegClass.contains(DestReg, SrcReg)) {
Opc = ARM::VMOVD;		Opc = ARM::VMOVD;
BeginIdx = ARM::dsub_0;		BeginIdx = ARM::dsub_0;
SubRegs = 4;		SubRegs = 4;
Spacing = 2;		Spacing = 2;
		} else if (ARM::DPRRegClass.contains(DestReg, SrcReg) && Subtarget.isFPOnlySP()) {
		Opc = ARM::VMOVS;
		BeginIdx = ARM::ssub_0;
		SubRegs = 2;
}		}

assert(Opc && "Impossible reg-to-reg copy");		assert(Opc && "Impossible reg-to-reg copy");

const TargetRegisterInfo *TRI = &getRegisterInfo();		const TargetRegisterInfo *TRI = &getRegisterInfo();
MachineInstrBuilder Mov;		MachineInstrBuilder Mov;

// Copy register tuples backward when the first Dest reg overlaps with SrcReg.		// Copy register tuples backward when the first Dest reg overlaps with SrcReg.
[434 lines elided]	if (MI->getOpcode() == TargetOpcode::LOAD_STACK_GUARD) {
MI->getParent()->erase(MI);		MI->getParent()->erase(MI);
return true;		return true;
}		}

// This hook gets to expand COPY instructions before they become		// This hook gets to expand COPY instructions before they become
// copyPhysReg() calls. Look for VMOVS instructions that can legally be		// copyPhysReg() calls. Look for VMOVS instructions that can legally be
// widened to VMOVD. We prefer the VMOVD when possible because it may be		// widened to VMOVD. We prefer the VMOVD when possible because it may be
// changed into a VORR that can go down the NEON pipeline.		// changed into a VORR that can go down the NEON pipeline.
if (!WidenVMOVS || !MI->isCopy() || Subtarget.isCortexA15())		if (!WidenVMOVS || !MI->isCopy() || Subtarget.isCortexA15() ||
		Subtarget.isFPOnlySP())
return false;		return false;

// Look for a copy between even S-registers. That is where we keep floats		// Look for a copy between even S-registers. That is where we keep floats
// when using NEON v2f32 instructions for f32 arithmetic.		// when using NEON v2f32 instructions for f32 arithmetic.
unsigned DstRegS = MI->getOperand(0).getReg();		unsigned DstRegS = MI->getOperand(0).getReg();
unsigned SrcRegS = MI->getOperand(1).getReg();		unsigned SrcRegS = MI->getOperand(1).getReg();
if (!ARM::SPRRegClass.contains(DstRegS, SrcRegS))		if (!ARM::SPRRegClass.contains(DstRegS, SrcRegS))
return false;		return false;
[3,245 lines elided]
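The new copyPhysReg branch above copies a D register on FP-only-SP targets as two VMOVS instructions over its S sub-registers (ssub_0 and the one after it). The register arithmetic behind that can be sketched as follows; copyDViaS and SCopy are hypothetical names, and the sketch relies on the VFP register file layout in which D<n> overlaps S<2n> and S<2n+1> for n below 16.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One S-register copy: {destination S index, source S index}.
using SCopy = std::pair<unsigned, unsigned>;

// Expands a copy D<DestD> <- D<SrcD> into the two single-precision copies
// that cover the same 64 bits.
std::vector<SCopy> copyDViaS(unsigned DestD, unsigned SrcD) {
  assert(DestD < 16 && SrcD < 16 && "only D0-D15 overlap S registers");
  return {{2 * DestD, 2 * SrcD}, {2 * DestD + 1, 2 * SrcD + 1}};
}
```

For example, copying D0 into D1 becomes S2 <- S0 followed by S3 <- S1. Two distinct D registers never share an S register, so the order of the two copies does not matter here.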

lib/Target/ARM/ARMCallingConv.h

[171 lines elided]
// has InConsecutiveRegs set, and that the last member also has		// has InConsecutiveRegs set, and that the last member also has
// InConsecutiveRegsLast set. We must process all members of the HA before		// InConsecutiveRegsLast set. We must process all members of the HA before
// we can allocate it, as we need to know the total number of registers that		// we can allocate it, as we need to know the total number of registers that
// will be needed in order to (attempt to) allocate a contiguous block.		// will be needed in order to (attempt to) allocate a contiguous block.
static bool CC_ARM_AAPCS_Custom_HA(unsigned &ValNo, MVT &ValVT, MVT &LocVT,		static bool CC_ARM_AAPCS_Custom_HA(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
CCValAssign::LocInfo &LocInfo,		CCValAssign::LocInfo &LocInfo,
ISD::ArgFlagsTy &ArgFlags, CCState &State) {		ISD::ArgFlagsTy &ArgFlags, CCState &State) {
SmallVectorImpl<CCValAssign> &PendingHAMembers = State.getPendingLocs();		SmallVectorImpl<CCValAssign> &PendingHAMembers = State.getPendingLocs();

// AAPCS HFAs must have 1-4 elements, all of the same type		// AAPCS HFAs must have 1-4 elements, all of the same type
assert(PendingHAMembers.size() < 8);		assert(PendingHAMembers.size() < 4);
		rengolin: Isn't this an independent fix?
		olista01: This is part of the revert of http://reviews.llvm.org/rL209650. r209650 used this mechanism to pass a double as 2 i32s, meaning that an HFA of 4 doubles would have 8 members. Doubles will now show up here as f64s, so the assertion can be tightened up again.
if (PendingHAMembers.size() > 0)		if (PendingHAMembers.size() > 0)
assert(PendingHAMembers[0].getLocVT() == LocVT);		assert(PendingHAMembers[0].getLocVT() == LocVT);

// Add the argument to the list to be allocated once we know the size of the		// Add the argument to the list to be allocated once we know the size of the
// HA		// HA
PendingHAMembers.push_back(		PendingHAMembers.push_back(
CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));		CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));

if (ArgFlags.isInConsecutiveRegsLast()) {		if (ArgFlags.isInConsecutiveRegsLast()) {
assert(PendingHAMembers.size() > 0 && PendingHAMembers.size() <= 8 &&		assert(PendingHAMembers.size() > 0 && PendingHAMembers.size() <= 4 &&
"Homogeneous aggregates must have between 1 and 4 members");		"Homogeneous aggregates must have between 1 and 4 members");

// Try to allocate a contiguous block of registers, each of the correct		// Try to allocate a contiguous block of registers, each of the correct
// size to hold one member.		// size to hold one member.
const uint16_t *RegList;		const uint16_t *RegList;
unsigned NumRegs;		unsigned NumRegs;
switch (LocVT.SimpleTy) {		switch (LocVT.SimpleTy) {
case MVT::i32:
case MVT::f32:		case MVT::f32:
RegList = SRegList;		RegList = SRegList;
NumRegs = 16;		NumRegs = 16;
break;		break;
case MVT::f64:		case MVT::f64:
RegList = DRegList;		RegList = DRegList;
NumRegs = 8;		NumRegs = 8;
break;		break;
[22 lines elided]	if (ArgFlags.isInConsecutiveRegsLast()) {

// Register allocation failed, fall back to the stack		// Register allocation failed, fall back to the stack

// Mark all VFP regs as unavailable (AAPCS rule C.2.vfp)		// Mark all VFP regs as unavailable (AAPCS rule C.2.vfp)
for (unsigned regNo = 0; regNo < 16; ++regNo)		for (unsigned regNo = 0; regNo < 16; ++regNo)
State.AllocateReg(SRegList[regNo]);		State.AllocateReg(SRegList[regNo]);

unsigned Size = LocVT.getSizeInBits() / 8;		unsigned Size = LocVT.getSizeInBits() / 8;
unsigned Align = Size;		unsigned Align = std::min(Size, 8U);
		rengolin: Wild guess here, but shouldn't this be something like std::min(Size, 8)?

if (LocVT.SimpleTy == MVT::v2f64 || LocVT.SimpleTy == MVT::i32) {
// Vectors are always aligned to 8 bytes. If we've seen an i32 here
// it's because it's been split from a larger type, also with align 8.
Align = 8;
}

for (auto It : PendingHAMembers) {		for (auto It : PendingHAMembers) {
It.convertToMem(State.AllocateStack(Size, Align));		It.convertToMem(State.AllocateStack(Size, Align));
State.addLoc(It);		State.addLoc(It);

// Only the first member needs to be aligned.
Align = 1;
}		}

// All pending members have now been allocated		// All pending members have now been allocated
PendingHAMembers.clear();		PendingHAMembers.clear();
}		}

// This will be allocated by the last member of the HA		// This will be allocated by the last member of the HA
return true;		return true;
}		}

} // End llvm namespace		} // End llvm namespace

#endif		#endif

lib/Target/ARM/ARMISelLowering.h

[470 lines elided]	private:
SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerConstantFP(SDValue Op, SelectionDAG &DAG,		SDValue LowerConstantFP(SDValue Op, SelectionDAG &DAG,
const ARMSubtarget *ST) const;		const ARMSubtarget *ST) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
const ARMSubtarget *ST) const;		const ARMSubtarget *ST) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDivRem(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDivRem(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;

unsigned getRegisterByName(const char* RegName, EVT VT) const override;		unsigned getRegisterByName(const char* RegName, EVT VT) const override;

/// isFMAFasterThanFMulAndFAdd - Return true if an FMA operation is faster		/// isFMAFasterThanFMulAndFAdd - Return true if an FMA operation is faster
/// than a pair of fmul and fadd instructions. fmuladd intrinsics will be		/// than a pair of fmul and fadd instructions. fmuladd intrinsics will be
/// expanded to FMAs when this method returns true, otherwise fmuladd is		/// expanded to FMAs when this method returns true, otherwise fmuladd is
/// expanded to fmul + fadd.		/// expanded to fmul + fadd.
///		///
[73 lines elided]	SDValue
const SmallVectorImpl<ISD::OutputArg> &Outs,		const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,		const SmallVectorImpl<SDValue> &OutVals,
SDLoc dl, SelectionDAG &DAG) const override;		SDLoc dl, SelectionDAG &DAG) const override;

bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;		bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;

bool mayBeEmittedAsTailCall(CallInst *CI) const override;		bool mayBeEmittedAsTailCall(CallInst *CI) const override;

		SDValue getCMOV(SDLoc dl, EVT VT, SDValue FalseVal, SDValue TrueVal,
		SDValue ARMcc, SDValue CCR, SDValue Cmp,
		SelectionDAG &DAG) const;
SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,		SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
SDValue &ARMcc, SelectionDAG &DAG, SDLoc dl) const;		SDValue &ARMcc, SelectionDAG &DAG, SDLoc dl) const;
SDValue getVFPCmp(SDValue LHS, SDValue RHS,		SDValue getVFPCmp(SDValue LHS, SDValue RHS,
SelectionDAG &DAG, SDLoc dl) const;		SelectionDAG &DAG, SDLoc dl) const;
SDValue duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const;		SDValue duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const;

SDValue OptimizeVFPBrcond(SDValue Op, SelectionDAG &DAG) const;		SDValue OptimizeVFPBrcond(SDValue Op, SelectionDAG &DAG) const;

[29 lines elided]

lib/Target/ARM/ARMISelLowering.cpp


[439 lines elided]	ARMTargetLowering::ARMTargetLowering(TargetMachine &TM)

if (Subtarget->isThumb1Only())		if (Subtarget->isThumb1Only())
addRegisterClass(MVT::i32, &ARM::tGPRRegClass);		addRegisterClass(MVT::i32, &ARM::tGPRRegClass);
else		else
addRegisterClass(MVT::i32, &ARM::GPRRegClass);		addRegisterClass(MVT::i32, &ARM::GPRRegClass);
if (!TM.Options.UseSoftFloat && Subtarget->hasVFP2() &&		if (!TM.Options.UseSoftFloat && Subtarget->hasVFP2() &&
!Subtarget->isThumb1Only()) {		!Subtarget->isThumb1Only()) {
addRegisterClass(MVT::f32, &ARM::SPRRegClass);		addRegisterClass(MVT::f32, &ARM::SPRRegClass);
if (!Subtarget->isFPOnlySP())
addRegisterClass(MVT::f64, &ARM::DPRRegClass);		addRegisterClass(MVT::f64, &ARM::DPRRegClass);
}		}

for (unsigned VT = (unsigned)MVT::FIRST_VECTOR_VALUETYPE;		for (unsigned VT = (unsigned)MVT::FIRST_VECTOR_VALUETYPE;
VT <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++VT) {		VT <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++VT) {
for (unsigned InnerVT = (unsigned)MVT::FIRST_VECTOR_VALUETYPE;		for (unsigned InnerVT = (unsigned)MVT::FIRST_VECTOR_VALUETYPE;
InnerVT <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++InnerVT)		InnerVT <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++InnerVT)
setTruncStoreAction((MVT::SimpleValueType)VT,		setTruncStoreAction((MVT::SimpleValueType)VT,
(MVT::SimpleValueType)InnerVT, Expand);		(MVT::SimpleValueType)InnerVT, Expand);
[165 lines elided]	for (unsigned i = 0; i < 6; ++i) {
setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);		setLoadExtAction(ISD::SEXTLOAD, Tys[i], Legal);
}		}
}		}

// ARM and Thumb2 support UMLAL/SMLAL.		// ARM and Thumb2 support UMLAL/SMLAL.
if (!Subtarget->isThumb1Only())		if (!Subtarget->isThumb1Only())
setTargetDAGCombine(ISD::ADDC);		setTargetDAGCombine(ISD::ADDC);

		if (Subtarget->isFPOnlySP()) {
		// When targetting a floating-point unit with only single-precision
		// operations, f64 is legal for the few double-precision instructions which
		// are present However, no double-precision operations other than moves,
		// loads and stores are provided by the hardware.
		setOperationAction(ISD::FADD, MVT::f64, Expand);
		setOperationAction(ISD::FSUB, MVT::f64, Expand);
		setOperationAction(ISD::FMUL, MVT::f64, Expand);
		setOperationAction(ISD::FMA, MVT::f64, Expand);
		setOperationAction(ISD::FDIV, MVT::f64, Expand);
		setOperationAction(ISD::FREM, MVT::f64, Expand);
		setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
		setOperationAction(ISD::FGETSIGN, MVT::f64, Expand);
		setOperationAction(ISD::FNEG, MVT::f64, Expand);
		setOperationAction(ISD::FABS, MVT::f64, Expand);
		setOperationAction(ISD::FSQRT, MVT::f64, Expand);
		setOperationAction(ISD::FSIN, MVT::f64, Expand);
		setOperationAction(ISD::FCOS, MVT::f64, Expand);
		setOperationAction(ISD::FPOWI, MVT::f64, Expand);
		setOperationAction(ISD::FPOW, MVT::f64, Expand);
		setOperationAction(ISD::FLOG, MVT::f64, Expand);
		setOperationAction(ISD::FLOG2, MVT::f64, Expand);
		setOperationAction(ISD::FLOG10, MVT::f64, Expand);
		setOperationAction(ISD::FEXP, MVT::f64, Expand);
		setOperationAction(ISD::FEXP2, MVT::f64, Expand);
		setOperationAction(ISD::FCEIL, MVT::f64, Expand);
		setOperationAction(ISD::FTRUNC, MVT::f64, Expand);
		setOperationAction(ISD::FRINT, MVT::f64, Expand);
		setOperationAction(ISD::FNEARBYINT, MVT::f64, Expand);
		setOperationAction(ISD::FFLOOR, MVT::f64, Expand);
		setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);
		setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);
		}

computeRegisterProperties();		computeRegisterProperties();

// ARM does not have floating-point extending loads.		// ARM does not have floating-point extending loads.
setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);		setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);
setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);		setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);

// ... or truncating stores		// ... or truncating stores
[2,632 lines elided]	ARMTargetLowering::getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
ARMcc = DAG.getConstant(CondCode, MVT::i32);		ARMcc = DAG.getConstant(CondCode, MVT::i32);
return DAG.getNode(CompareType, dl, MVT::Glue, LHS, RHS);		return DAG.getNode(CompareType, dl, MVT::Glue, LHS, RHS);
}		}

/// Returns a appropriate VFP CMP (fcmp{s|d}+fmstat) for the given operands.		/// Returns a appropriate VFP CMP (fcmp{s|d}+fmstat) for the given operands.
SDValue		SDValue
ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,		ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,
SDLoc dl) const {		SDLoc dl) const {
		assert(!Subtarget->isFPOnlySP() || RHS.getValueType() != MVT::f64);
		rengolin: Is this the only such assert, or just the one that you hit during tests?
		olista01: The only other legal floating-point type for FPv4-SP is f32, which can be handled by this function. Vector types are only legal for NEON.
SDValue Cmp;		SDValue Cmp;
if (!isFloatingPointZero(RHS))		if (!isFloatingPointZero(RHS))
Cmp = DAG.getNode(ARMISD::CMPFP, dl, MVT::Glue, LHS, RHS);		Cmp = DAG.getNode(ARMISD::CMPFP, dl, MVT::Glue, LHS, RHS);
else		else
Cmp = DAG.getNode(ARMISD::CMPFPw0, dl, MVT::Glue, LHS);		Cmp = DAG.getNode(ARMISD::CMPFPw0, dl, MVT::Glue, LHS);
return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Cmp);		return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Cmp);
}		}

▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	if (!DAG.getTargetLoweringInfo().isTypeLegal(Cond->getValueType(0)))
return SDValue();		return SDValue();

SDValue Value, OverflowCmp;		SDValue Value, OverflowCmp;
SDValue ARMcc;		SDValue ARMcc;
std::tie(Value, OverflowCmp) = getARMXALUOOp(Cond, DAG, ARMcc);		std::tie(Value, OverflowCmp) = getARMXALUOOp(Cond, DAG, ARMcc);
SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);		SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

return DAG.getNode(ARMISD::CMOV, SDLoc(Op), VT, SelectTrue, SelectFalse,		return getCMOV(SDLoc(Op), VT, SelectTrue, SelectFalse, ARMcc, CCR,
ARMcc, CCR, OverflowCmp);		OverflowCmp, DAG);

}		}

// Convert:		// Convert:
//		//
// (select (cmov 1, 0, cond), t, f) -> (cmov t, f, cond)		// (select (cmov 1, 0, cond), t, f) -> (cmov t, f, cond)
// (select (cmov 0, 1, cond), t, f) -> (cmov f, t, cond)		// (select (cmov 0, 1, cond), t, f) -> (cmov f, t, cond)
//		//
if (Cond.getOpcode() == ARMISD::CMOV && Cond.hasOneUse()) {		if (Cond.getOpcode() == ARMISD::CMOV && Cond.hasOneUse()) {
Show All 17 Lines	if (CMOVTrue && CMOVFalse) {
}		}

if (True.getNode() && False.getNode()) {		if (True.getNode() && False.getNode()) {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDValue ARMcc = Cond.getOperand(2);		SDValue ARMcc = Cond.getOperand(2);
SDValue CCR = Cond.getOperand(3);		SDValue CCR = Cond.getOperand(3);
SDValue Cmp = duplicateCmp(Cond.getOperand(4), DAG);		SDValue Cmp = duplicateCmp(Cond.getOperand(4), DAG);
assert(True.getValueType() == VT);		assert(True.getValueType() == VT);
return DAG.getNode(ARMISD::CMOV, dl, VT, True, False, ARMcc, CCR, Cmp);		return getCMOV(dl, VT, True, False, ARMcc, CCR, Cmp, DAG);
}		}
}		}
}		}

// ARM's BooleanContents value is UndefinedBooleanContent. Mask out the		// ARM's BooleanContents value is UndefinedBooleanContent. Mask out the
// undefined bits before doing a full-word comparison with zero.		// undefined bits before doing a full-word comparison with zero.
Cond = DAG.getNode(ISD::AND, dl, Cond.getValueType(), Cond,		Cond = DAG.getNode(ISD::AND, dl, Cond.getValueType(), Cond,
DAG.getConstant(1, Cond.getValueType()));		DAG.getConstant(1, Cond.getValueType()));
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	static void checkVSELConstraints(ISD::CondCode CC, ARMCC::CondCodes &CondCode,
// 'unordered or not equal' is 'anything but equal', so use the EQ condition		// 'unordered or not equal' is 'anything but equal', so use the EQ condition
// code and swap the VSEL operands.		// code and swap the VSEL operands.
if (CC == ISD::SETUNE) {		if (CC == ISD::SETUNE) {
CondCode = ARMCC::EQ;		CondCode = ARMCC::EQ;
swpVselOps = true;		swpVselOps = true;
}		}
}		}

		SDValue ARMTargetLowering::getCMOV(SDLoc dl, EVT VT, SDValue FalseVal,
		SDValue TrueVal, SDValue ARMcc, SDValue CCR,
		SDValue Cmp, SelectionDAG &DAG) const {
		if (Subtarget->isFPOnlySP() && VT == MVT::f64) {
		rengolin: Would this ever be called by v2f64 types?
		olista01: No, because v2f64 is not a legal type when isFPOnlySP, and this is only called while legalizing operations, which happens after legalizing types.
		FalseVal = DAG.getNode(ARMISD::VMOVRRD, dl,
		DAG.getVTList(MVT::i32, MVT::i32), FalseVal);
		TrueVal = DAG.getNode(ARMISD::VMOVRRD, dl,
		DAG.getVTList(MVT::i32, MVT::i32), TrueVal);

		SDValue TrueLow = TrueVal.getValue(0);
		SDValue TrueHigh = TrueVal.getValue(1);
		SDValue FalseLow = FalseVal.getValue(0);
		SDValue FalseHigh = FalseVal.getValue(1);

		SDValue Low = DAG.getNode(ARMISD::CMOV, dl, MVT::i32, FalseLow, TrueLow,
		ARMcc, CCR, Cmp);
		SDValue High = DAG.getNode(ARMISD::CMOV, dl, MVT::i32, FalseHigh, TrueHigh,
		ARMcc, CCR, duplicateCmp(Cmp, DAG));

		return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Low, High);
		} else {
		return DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR,
		Cmp);
		}
		}
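
A minimal standalone sketch of the idea behind the f64 path in getCMOV above: on an SP-only FPU the double is moved into two 32-bit words, each word is selected with an ordinary integer conditional move, and the two halves are moved back into a double. This is plain C++ for illustration, not LLVM code; `splitF64`, `joinF64` and `selectF64ViaI32` are hypothetical names, and the low/high word order is an assumption of the sketch.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Split a double into its (low, high) 32-bit words, in the spirit of the
// VMOVRRD node (one f64 in, two i32 out). Hypothetical helper.
std::pair<uint32_t, uint32_t> splitF64(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32)};
}

// Rebuild a double from two 32-bit words, in the spirit of the VMOVDRR node.
double joinF64(uint32_t lo, uint32_t hi) {
  uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Select between two doubles using only 32-bit selects, one per half.
double selectF64ViaI32(bool cond, double trueVal, double falseVal) {
  std::pair<uint32_t, uint32_t> t = splitF64(trueVal);
  std::pair<uint32_t, uint32_t> f = splitF64(falseVal);
  uint32_t lo = cond ? t.first : f.first;    // first i32 conditional move
  uint32_t hi = cond ? t.second : f.second;  // second i32 conditional move
  return joinF64(lo, hi);
}
```

Because both halves are chosen by the same condition, the result is bit-identical to selecting the whole double, so no double-precision arithmetic is needed.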

SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDValue LHS = Op.getOperand(0);		SDValue LHS = Op.getOperand(0);
SDValue RHS = Op.getOperand(1);		SDValue RHS = Op.getOperand(1);
ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();
SDValue TrueVal = Op.getOperand(2);		SDValue TrueVal = Op.getOperand(2);
SDValue FalseVal = Op.getOperand(3);		SDValue FalseVal = Op.getOperand(3);
SDLoc dl(Op);		SDLoc dl(Op);

		if (Subtarget->isFPOnlySP() && LHS.getValueType() == MVT::f64) {
		DAG.getTargetLoweringInfo().softenSetCCOperands(DAG, MVT::f64, LHS, RHS, CC,
		dl);

		// If softenSetCCOperands only returned one value, we should compare it to
		// zero.
		if (!RHS.getNode()) {
		RHS = DAG.getConstant(0, LHS.getValueType());
		CC = ISD::SETNE;
		}
		}

if (LHS.getValueType() == MVT::i32) {		if (LHS.getValueType() == MVT::i32) {
// Try to generate VSEL on ARMv8.		// Try to generate VSEL on ARMv8.
// The VSEL instruction can't use all the usual ARM condition		// The VSEL instruction can't use all the usual ARM condition
// codes: it only has two bits to select the condition code, so it's		// codes: it only has two bits to select the condition code, so it's
// constrained to use only GE, GT, VS and EQ.		// constrained to use only GE, GT, VS and EQ.
//		//
// To implement all the various ISD::SETXXX opcodes, we sometimes need to		// To implement all the various ISD::SETXXX opcodes, we sometimes need to
// swap the operands of the previous compare instruction (effectively		// swap the operands of the previous compare instruction (effectively
// inverting the compare condition, swapping 'less' and 'greater') and		// inverting the compare condition, swapping 'less' and 'greater') and
// sometimes need to swap the operands to the VSEL (which inverts the		// sometimes need to swap the operands to the VSEL (which inverts the
// condition in the sense of firing whenever the previous condition didn't)		// condition in the sense of firing whenever the previous condition didn't)
if (getSubtarget()->hasFPARMv8() && (TrueVal.getValueType() == MVT::f32 ||		if (getSubtarget()->hasFPARMv8() && (TrueVal.getValueType() == MVT::f32 ||
TrueVal.getValueType() == MVT::f64)) {		TrueVal.getValueType() == MVT::f64)) {
ARMCC::CondCodes CondCode = IntCCToARMCC(CC);		ARMCC::CondCodes CondCode = IntCCToARMCC(CC);
if (CondCode == ARMCC::LT || CondCode == ARMCC::LE ||		if (CondCode == ARMCC::LT || CondCode == ARMCC::LE ||
CondCode == ARMCC::VC || CondCode == ARMCC::NE) {		CondCode == ARMCC::VC || CondCode == ARMCC::NE) {
CC = getInverseCCForVSEL(CC);		CC = getInverseCCForVSEL(CC);
std::swap(TrueVal, FalseVal);		std::swap(TrueVal, FalseVal);
}		}
}		}

SDValue ARMcc;		SDValue ARMcc;
SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);		SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);		SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);
return DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR,		return getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, CCR, Cmp, DAG);
Cmp);
}		}

ARMCC::CondCodes CondCode, CondCode2;		ARMCC::CondCodes CondCode, CondCode2;
FPCCToARMCC(CC, CondCode, CondCode2);		FPCCToARMCC(CC, CondCode, CondCode2);

// Try to generate VSEL on ARMv8.		// Try to generate VSEL on ARMv8.
if (getSubtarget()->hasFPARMv8() && (TrueVal.getValueType() == MVT::f32 ||		if (getSubtarget()->hasFPARMv8() && (TrueVal.getValueType() == MVT::f32 ||
TrueVal.getValueType() == MVT::f64)) {		TrueVal.getValueType() == MVT::f64)) {
Show All 22 Lines	if (CondCode == ARMCC::GT || CondCode == ARMCC::GE ||
if (swpVselOps)		if (swpVselOps)
std::swap(TrueVal, FalseVal);		std::swap(TrueVal, FalseVal);
}		}
}		}

SDValue ARMcc = DAG.getConstant(CondCode, MVT::i32);		SDValue ARMcc = DAG.getConstant(CondCode, MVT::i32);
SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl);		SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl);
SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);		SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
SDValue Result = DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal,		SDValue Result = getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, CCR, Cmp, DAG);
ARMcc, CCR, Cmp);
if (CondCode2 != ARMCC::AL) {		if (CondCode2 != ARMCC::AL) {
SDValue ARMcc2 = DAG.getConstant(CondCode2, MVT::i32);		SDValue ARMcc2 = DAG.getConstant(CondCode2, MVT::i32);
// FIXME: Needs another CMP because flag can have but one use.		// FIXME: Needs another CMP because flag can have but one use.
SDValue Cmp2 = getVFPCmp(LHS, RHS, DAG, dl);		SDValue Cmp2 = getVFPCmp(LHS, RHS, DAG, dl);
Result = DAG.getNode(ARMISD::CMOV, dl, VT,		Result = getCMOV(dl, VT, Result, TrueVal, ARMcc2, CCR, Cmp2, DAG);
Result, TrueVal, ARMcc2, CCR, Cmp2);
}		}
return Result;		return Result;
}		}

/// canChangeToInt - Given the fp compare operand, return true if it is suitable		/// canChangeToInt - Given the fp compare operand, return true if it is suitable
/// to morph to an integer compare sequence.		/// to morph to an integer compare sequence.
static bool canChangeToInt(SDValue Op, bool &SeenZero,		static bool canChangeToInt(SDValue Op, bool &SeenZero,
const ARMSubtarget *Subtarget) {		const ARMSubtarget *Subtarget) {
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines
SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {		SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
SDValue Chain = Op.getOperand(0);		SDValue Chain = Op.getOperand(0);
ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
SDValue LHS = Op.getOperand(2);		SDValue LHS = Op.getOperand(2);
SDValue RHS = Op.getOperand(3);		SDValue RHS = Op.getOperand(3);
SDValue Dest = Op.getOperand(4);		SDValue Dest = Op.getOperand(4);
SDLoc dl(Op);		SDLoc dl(Op);

		if (Subtarget->isFPOnlySP() && LHS.getValueType() == MVT::f64) {
		DAG.getTargetLoweringInfo().softenSetCCOperands(DAG, MVT::f64, LHS, RHS, CC,
		dl);

		// If softenSetCCOperands only returned one value, we should compare it to
		// zero.
		if (!RHS.getNode()) {
		RHS = DAG.getConstant(0, LHS.getValueType());
		CC = ISD::SETNE;
		}
		}

if (LHS.getValueType() == MVT::i32) {		if (LHS.getValueType() == MVT::i32) {
SDValue ARMcc;		SDValue ARMcc;
SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);		SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);
SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);		SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other,		return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other,
Chain, Dest, ARMcc, CCR, Cmp);		Chain, Dest, ARMcc, CCR, Cmp);
}		}

▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	assert(Op.getOperand(0).getValueType() == MVT::v4f32 &&
"Invalid type for custom lowering!");		"Invalid type for custom lowering!");
if (VT != MVT::v4i16)		if (VT != MVT::v4i16)
return DAG.UnrollVectorOp(Op.getNode());		return DAG.UnrollVectorOp(Op.getNode());

Op = DAG.getNode(Op.getOpcode(), dl, MVT::v4i32, Op.getOperand(0));		Op = DAG.getNode(Op.getOpcode(), dl, MVT::v4i32, Op.getOperand(0));
return DAG.getNode(ISD::TRUNCATE, dl, VT, Op);		return DAG.getNode(ISD::TRUNCATE, dl, VT, Op);
}		}

static SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) {		SDValue ARMTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
if (VT.isVector())		if (VT.isVector())
return LowerVectorFP_TO_INT(Op, DAG);		return LowerVectorFP_TO_INT(Op, DAG);

		if (Subtarget->isFPOnlySP() && Op.getOperand(0).getValueType() == MVT::f64) {
		RTLIB::Libcall LC;
		if (Op.getOpcode() == ISD::FP_TO_SINT)
		LC = RTLIB::getFPTOSINT(Op.getOperand(0).getValueType(),
		Op.getValueType());
		else
		LC = RTLIB::getFPTOUINT(Op.getOperand(0).getValueType(),
		Op.getValueType());
		return makeLibCall(DAG, LC, Op.getValueType(), &Op.getOperand(0), 1,
		/*isSigned*/ false, SDLoc(Op)).first;
		}

SDLoc dl(Op);		SDLoc dl(Op);
unsigned Opc;		unsigned Opc;

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default: llvm_unreachable("Invalid opcode!");		default: llvm_unreachable("Invalid opcode!");
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
Opc = ARMISD::FTOSI;		Opc = ARMISD::FTOSI;
break;		break;
Show All 33 Lines	case ISD::UINT_TO_FP:
Opc = ISD::UINT_TO_FP;		Opc = ISD::UINT_TO_FP;
break;		break;
}		}

Op = DAG.getNode(CastOpc, dl, MVT::v4i32, Op.getOperand(0));		Op = DAG.getNode(CastOpc, dl, MVT::v4i32, Op.getOperand(0));
return DAG.getNode(Opc, dl, VT, Op);		return DAG.getNode(Opc, dl, VT, Op);
}		}

static SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) {		SDValue ARMTargetLowering::LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
if (VT.isVector())		if (VT.isVector())
return LowerVectorINT_TO_FP(Op, DAG);		return LowerVectorINT_TO_FP(Op, DAG);

		if (Subtarget->isFPOnlySP() && Op.getValueType() == MVT::f64) {
		RTLIB::Libcall LC;
		if (Op.getOpcode() == ISD::SINT_TO_FP)
		LC = RTLIB::getSINTTOFP(Op.getOperand(0).getValueType(),
		Op.getValueType());
		else
		LC = RTLIB::getUINTTOFP(Op.getOperand(0).getValueType(),
		Op.getValueType());
		return makeLibCall(DAG, LC, Op.getValueType(), &Op.getOperand(0), 1,
		/*isSigned*/ false, SDLoc(Op)).first;
		}

SDLoc dl(Op);		SDLoc dl(Op);
unsigned Opc;		unsigned Opc;

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default: llvm_unreachable("Invalid opcode!");		default: llvm_unreachable("Invalid opcode!");
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
Opc = ARMISD::SITOF;		Opc = ARMISD::SITOF;
break;		break;
▲ Show 20 Lines • Show All 492 Lines • ▼ Show 20 Lines	static SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) {

SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
SDValue Op1 = Op.getOperand(1);		SDValue Op1 = Op.getOperand(1);
SDValue CC = Op.getOperand(2);		SDValue CC = Op.getOperand(2);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();		ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();
SDLoc dl(Op);		SDLoc dl(Op);

if (Op.getOperand(1).getValueType().isFloatingPoint()) {		if (Op1.getValueType().isFloatingPoint()) {
switch (SetCCOpcode) {		switch (SetCCOpcode) {
default: llvm_unreachable("Illegal FP comparison");		default: llvm_unreachable("Illegal FP comparison");
case ISD::SETUNE:		case ISD::SETUNE:
case ISD::SETNE: Invert = true; // Fallthrough		case ISD::SETNE: Invert = true; // Fallthrough
case ISD::SETOEQ:		case ISD::SETOEQ:
case ISD::SETEQ: Opc = ARMISD::VCEQ; break;		case ISD::SETEQ: Opc = ARMISD::VCEQ; break;
case ISD::SETOLT:		case ISD::SETOLT:
case ISD::SETLT: Swap = true; // Fallthrough		case ISD::SETLT: Swap = true; // Fallthrough
▲ Show 20 Lines • Show All 247 Lines • ▼ Show 20 Lines
SDValue ARMTargetLowering::LowerConstantFP(SDValue Op, SelectionDAG &DAG,		SDValue ARMTargetLowering::LowerConstantFP(SDValue Op, SelectionDAG &DAG,
const ARMSubtarget *ST) const {		const ARMSubtarget *ST) const {
if (!ST->hasVFP3())		if (!ST->hasVFP3())
return SDValue();		return SDValue();

bool IsDouble = Op.getValueType() == MVT::f64;		bool IsDouble = Op.getValueType() == MVT::f64;
ConstantFPSDNode *CFP = cast<ConstantFPSDNode>(Op);		ConstantFPSDNode *CFP = cast<ConstantFPSDNode>(Op);

		// Use the default (constant pool) lowering for double constants when we have
		// an SP-only FPU
		if (IsDouble && Subtarget->isFPOnlySP())
		return SDValue();

// Try splatting with a VMOV.f32...		// Try splatting with a VMOV.f32...
APFloat FPVal = CFP->getValueAPF();		APFloat FPVal = CFP->getValueAPF();
int ImmVal = IsDouble ? ARM_AM::getFP64Imm(FPVal) : ARM_AM::getFP32Imm(FPVal);		int ImmVal = IsDouble ? ARM_AM::getFP64Imm(FPVal) : ARM_AM::getFP32Imm(FPVal);

if (ImmVal != -1) {		if (ImmVal != -1) {
if (IsDouble || !ST->useNEONForSinglePrecisionFP()) {		if (IsDouble || !ST->useNEONForSinglePrecisionFP()) {
// We have code in place to select a valid ConstantFP already, no need to		// We have code in place to select a valid ConstantFP already, no need to
// do any mangling.		// do any mangling.
▲ Show 20 Lines • Show All 1,687 Lines • ▼ Show 20 Lines	SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);		case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);
case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);		case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);
case ISD::SDIVREM:		case ISD::SDIVREM:
case ISD::UDIVREM: return LowerDivRem(Op, DAG);		case ISD::UDIVREM: return LowerDivRem(Op, DAG);
case ISD::DYNAMIC_STACKALLOC:		case ISD::DYNAMIC_STACKALLOC:
if (Subtarget->getTargetTriple().isWindowsItaniumEnvironment())		if (Subtarget->getTargetTriple().isWindowsItaniumEnvironment())
return LowerDYNAMIC_STACKALLOC(Op, DAG);		return LowerDYNAMIC_STACKALLOC(Op, DAG);
llvm_unreachable("Don't know how to custom lower this!");		llvm_unreachable("Don't know how to custom lower this!");
		case ISD::FP_ROUND: return LowerFP_ROUND(Op, DAG);
		case ISD::FP_EXTEND: return LowerFP_EXTEND(Op, DAG);
}		}
}		}

/// ReplaceNodeResults - Replace the results of node with an illegal result		/// ReplaceNodeResults - Replace the results of node with an illegal result
/// type with new values built out of custom code.		/// type with new values built out of custom code.
void ARMTargetLowering::ReplaceNodeResults(SDNode *N,		void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue>&Results,		SmallVectorImpl<SDValue>&Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
▲ Show 20 Lines • Show All 2,127 Lines • ▼ Show 20 Lines	if ((Mask & (~Mask2)) == 0)
N->getOperand(2));		N->getOperand(2));
}		}
return SDValue();		return SDValue();
}		}

/// PerformVMOVRRDCombine - Target-specific dag combine xforms for		/// PerformVMOVRRDCombine - Target-specific dag combine xforms for
/// ARMISD::VMOVRRD.		/// ARMISD::VMOVRRD.
static SDValue PerformVMOVRRDCombine(SDNode *N,		static SDValue PerformVMOVRRDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI,
		const ARMSubtarget *Subtarget) {
// vmovrrd(vmovdrr x, y) -> x,y		// vmovrrd(vmovdrr x, y) -> x,y
SDValue InDouble = N->getOperand(0);		SDValue InDouble = N->getOperand(0);
if (InDouble.getOpcode() == ARMISD::VMOVDRR)		if (InDouble.getOpcode() == ARMISD::VMOVDRR && !Subtarget->isFPOnlySP())
return DCI.CombineTo(N, InDouble.getOperand(0), InDouble.getOperand(1));		return DCI.CombineTo(N, InDouble.getOperand(0), InDouble.getOperand(1));

// vmovrrd(load f64) -> (load i32), (load i32)		// vmovrrd(load f64) -> (load i32), (load i32)
SDNode *InNode = InDouble.getNode();		SDNode *InNode = InDouble.getNode();
if (ISD::isNormalLoad(InNode) && InNode->hasOneUse() &&		if (ISD::isNormalLoad(InNode) && InNode->hasOneUse() &&
InNode->getValueType(0) == MVT::f64 &&		InNode->getValueType(0) == MVT::f64 &&
InNode->getOperand(1).getOpcode() == ISD::FrameIndex &&		InNode->getOperand(1).getOpcode() == ISD::FrameIndex &&
!cast<LoadSDNode>(InNode)->isVolatile()) {		!cast<LoadSDNode>(InNode)->isVolatile()) {
Show All 23 Lines	static SDValue PerformVMOVRRDCombine(SDNode *N,
}		}

return SDValue();		return SDValue();
}		}

/// PerformVMOVDRRCombine - Target-specific dag combine xforms for		/// PerformVMOVDRRCombine - Target-specific dag combine xforms for
/// ARMISD::VMOVDRR. This is also used for BUILD_VECTORs with 2 operands.		/// ARMISD::VMOVDRR. This is also used for BUILD_VECTORs with 2 operands.
static SDValue PerformVMOVDRRCombine(SDNode *N, SelectionDAG &DAG) {		static SDValue PerformVMOVDRRCombine(SDNode *N, SelectionDAG &DAG) {
// N=vmovrrd(X); vmovdrr(N:0, N:1) -> bit_convert(X)		// N=vmovrrd(X); vmovdrr(N:0, N:1) -> bit_convert(X)
		rengolin: This looks like left over.
SDValue Op0 = N->getOperand(0);		SDValue Op0 = N->getOperand(0);
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);
if (Op0.getOpcode() == ISD::BITCAST)		if (Op0.getOpcode() == ISD::BITCAST)
Op0 = Op0.getOperand(0);		Op0 = Op0.getOperand(0);
if (Op1.getOpcode() == ISD::BITCAST)		if (Op1.getOpcode() == ISD::BITCAST)
Op1 = Op1.getOperand(0);		Op1 = Op1.getOperand(0);
if (Op0.getOpcode() == ARMISD::VMOVRRD &&		if (Op0.getOpcode() == ARMISD::VMOVRRD &&
Op0.getNode() == Op1.getNode() &&		Op0.getNode() == Op1.getNode() &&
▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	if (ISD::isNormalLoad(Elt) && !cast<LoadSDNode>(Elt)->isVolatile())
return true;		return true;
}		}
return false;		return false;
}		}

/// PerformBUILD_VECTORCombine - Target-specific dag combine xforms for		/// PerformBUILD_VECTORCombine - Target-specific dag combine xforms for
/// ISD::BUILD_VECTOR.		/// ISD::BUILD_VECTOR.
static SDValue PerformBUILD_VECTORCombine(SDNode *N,		static SDValue PerformBUILD_VECTORCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI){		TargetLowering::DAGCombinerInfo &DCI,
		const ARMSubtarget *Subtarget) {
// build_vector(N=ARMISD::VMOVRRD(X), N:1) -> bit_convert(X):		// build_vector(N=ARMISD::VMOVRRD(X), N:1) -> bit_convert(X):
// VMOVRRD is introduced when legalizing i64 types. It forces the i64 value		// VMOVRRD is introduced when legalizing i64 types. It forces the i64 value
// into a pair of GPRs, which is fine when the value is used as a scalar,		// into a pair of GPRs, which is fine when the value is used as a scalar,
// but if the i64 value is converted to a vector, we need to undo the VMOVRRD.		// but if the i64 value is converted to a vector, we need to undo the VMOVRRD.
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
if (N->getNumOperands() == 2) {		if (N->getNumOperands() == 2) {
SDValue RV = PerformVMOVDRRCombine(N, DAG);		SDValue RV = PerformVMOVDRRCombine(N, DAG);
if (RV.getNode())		if (RV.getNode())
▲ Show 20 Lines • Show All 998 Lines • ▼ Show 20 Lines	SDValue ARMTargetLowering::PerformDAGCombine(SDNode *N,
case ISD::ADDC: return PerformADDCCombine(N, DCI, Subtarget);		case ISD::ADDC: return PerformADDCCombine(N, DCI, Subtarget);
case ISD::ADD: return PerformADDCombine(N, DCI, Subtarget);		case ISD::ADD: return PerformADDCombine(N, DCI, Subtarget);
case ISD::SUB: return PerformSUBCombine(N, DCI);		case ISD::SUB: return PerformSUBCombine(N, DCI);
case ISD::MUL: return PerformMULCombine(N, DCI, Subtarget);		case ISD::MUL: return PerformMULCombine(N, DCI, Subtarget);
case ISD::OR: return PerformORCombine(N, DCI, Subtarget);		case ISD::OR: return PerformORCombine(N, DCI, Subtarget);
case ISD::XOR: return PerformXORCombine(N, DCI, Subtarget);		case ISD::XOR: return PerformXORCombine(N, DCI, Subtarget);
case ISD::AND: return PerformANDCombine(N, DCI, Subtarget);		case ISD::AND: return PerformANDCombine(N, DCI, Subtarget);
case ARMISD::BFI: return PerformBFICombine(N, DCI);		case ARMISD::BFI: return PerformBFICombine(N, DCI);
case ARMISD::VMOVRRD: return PerformVMOVRRDCombine(N, DCI);		case ARMISD::VMOVRRD: return PerformVMOVRRDCombine(N, DCI, Subtarget);
case ARMISD::VMOVDRR: return PerformVMOVDRRCombine(N, DCI.DAG);		case ARMISD::VMOVDRR: return PerformVMOVDRRCombine(N, DCI.DAG);
case ISD::STORE: return PerformSTORECombine(N, DCI);		case ISD::STORE: return PerformSTORECombine(N, DCI);
case ISD::BUILD_VECTOR: return PerformBUILD_VECTORCombine(N, DCI);		case ISD::BUILD_VECTOR: return PerformBUILD_VECTORCombine(N, DCI, Subtarget);
case ISD::INSERT_VECTOR_ELT: return PerformInsertEltCombine(N, DCI);		case ISD::INSERT_VECTOR_ELT: return PerformInsertEltCombine(N, DCI);
case ISD::VECTOR_SHUFFLE: return PerformVECTOR_SHUFFLECombine(N, DCI.DAG);		case ISD::VECTOR_SHUFFLE: return PerformVECTOR_SHUFFLECombine(N, DCI.DAG);
case ARMISD::VDUPLANE: return PerformVDUPLANECombine(N, DCI);		case ARMISD::VDUPLANE: return PerformVDUPLANECombine(N, DCI);
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT: return PerformVCVTCombine(N, DCI, Subtarget);		case ISD::FP_TO_UINT: return PerformVCVTCombine(N, DCI, Subtarget);
case ISD::FDIV: return PerformVDIVCombine(N, DCI, Subtarget);		case ISD::FDIV: return PerformVDIVCombine(N, DCI, Subtarget);
case ISD::INTRINSIC_WO_CHAIN: return PerformIntrinsicCombine(N, DCI.DAG);		case ISD::INTRINSIC_WO_CHAIN: return PerformIntrinsicCombine(N, DCI.DAG);
case ISD::SHL:		case ISD::SHL:
▲ Show 20 Lines • Show All 973 Lines • ▼ Show 20 Lines	ARMTargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const {

SDValue NewSP = DAG.getCopyFromReg(Chain, DL, ARM::SP, MVT::i32);		SDValue NewSP = DAG.getCopyFromReg(Chain, DL, ARM::SP, MVT::i32);
Chain = NewSP.getValue(1);		Chain = NewSP.getValue(1);

SDValue Ops[2] = { NewSP, Chain };		SDValue Ops[2] = { NewSP, Chain };
return DAG.getMergeValues(Ops, DL);		return DAG.getMergeValues(Ops, DL);
}		}

		SDValue ARMTargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
		assert(Op.getValueType() == MVT::f64 && Subtarget->isFPOnlySP() &&
		"Unexpected type for custom-lowering FP_EXTEND");

		RTLIB::Libcall LC;
		LC = RTLIB::getFPEXT(Op.getOperand(0).getValueType(), Op.getValueType());

		SDValue SrcVal = Op.getOperand(0);
		return makeLibCall(DAG, LC, Op.getValueType(), &SrcVal, 1,
		/*isSigned*/ false, SDLoc(Op)).first;
		}

		SDValue ARMTargetLowering::LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
		assert(Op.getOperand(0).getValueType() == MVT::f64 &&
		Subtarget->isFPOnlySP() &&
		"Unexpected type for custom-lowering FP_ROUND");

		RTLIB::Libcall LC;
		LC = RTLIB::getFPROUND(Op.getOperand(0).getValueType(), Op.getValueType());

		SDValue SrcVal = Op.getOperand(0);
		return makeLibCall(DAG, LC, Op.getValueType(), &SrcVal, 1,
		/*isSigned*/ false, SDLoc(Op)).first;
		}

bool		bool
ARMTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {		ARMTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
// The ARM target isn't yet aware of offsets.		// The ARM target isn't yet aware of offsets.
return false;		return false;
}		}

bool ARM::isBitFieldInvertedMask(unsigned v) {		bool ARM::isBitFieldInvertedMask(unsigned v) {
if (v == 0xffffffff)		if (v == 0xffffffff)
Show All 11 Lines
/// isFPImmLegal - Returns true if the target can instruction select the		/// isFPImmLegal - Returns true if the target can instruction select the
/// specified FP immediate natively. If false, the legalizer will		/// specified FP immediate natively. If false, the legalizer will
/// materialize the FP immediate as a load from a constant pool.		/// materialize the FP immediate as a load from a constant pool.
bool ARMTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT) const {		bool ARMTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT) const {
if (!Subtarget->hasVFP3())		if (!Subtarget->hasVFP3())
return false;		return false;
if (VT == MVT::f32)		if (VT == MVT::f32)
return ARM_AM::getFP32Imm(Imm) != -1;		return ARM_AM::getFP32Imm(Imm) != -1;
if (VT == MVT::f64)		if (VT == MVT::f64 && !Subtarget->isFPOnlySP())
return ARM_AM::getFP64Imm(Imm) != -1;		return ARM_AM::getFP64Imm(Imm) != -1;
return false;		return false;
}		}

/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as		/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment		/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment
/// specified in the intrinsic calls.		/// specified in the intrinsic calls.
bool ARMTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,		bool ARMTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
▲ Show 20 Lines • Show All 283 Lines • Show Last 20 Lines

lib/Target/ARM/ARMInstrVFP.td

Show First 20 Lines • Show All 509 Lines • ▼ Show 20 Lines	def VCVTDS : ASuI<0b11101, 0b11, 0b0111, 0b11, 0,
bits<5> Dd;		bits<5> Dd;
bits<5> Sm;		bits<5> Sm;

// Encode instruction operands.		// Encode instruction operands.
let Inst{3-0} = Sm{4-1};		let Inst{3-0} = Sm{4-1};
let Inst{5} = Sm{0};		let Inst{5} = Sm{0};
let Inst{15-12} = Dd{3-0};		let Inst{15-12} = Dd{3-0};
let Inst{22} = Dd{4};		let Inst{22} = Dd{4};

		let Predicates = [HasVFP2, HasDPVFP];
}		}

// Special case encoding: bits 11-8 is 0b1011.		// Special case encoding: bits 11-8 is 0b1011.
def VCVTSD : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,		def VCVTSD : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
IIC_fpCVTSD, "vcvt", ".f32.f64\t$Sd, $Dm",		IIC_fpCVTSD, "vcvt", ".f32.f64\t$Sd, $Dm",
[(set SPR:$Sd, (fround DPR:$Dm))]> {		[(set SPR:$Sd, (fround DPR:$Dm))]> {
// Instruction operands.		// Instruction operands.
bits<5> Sd;		bits<5> Sd;
▲ Show 20 Lines • Show All 1,266 Lines • Show Last 20 Lines

test/CodeGen/ARM/aapcs-hfa-code.ll

	Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	}			}

	define arm_aapcs_vfpcc void @test_1double({ double } %a) {			define arm_aapcs_vfpcc void @test_1double({ double } %a) {
	; CHECK-LABEL: test_1double:			; CHECK-LABEL: test_1double:
	; CHECK-DAG: vmov.f64 d0, #1.{{0+}}e+00			; CHECK-DAG: vmov.f64 d0, #1.{{0+}}e+00
	; CHECK: bl test_1double			; CHECK: bl test_1double

	; CHECK-M4F-LABEL: test_1double:			; CHECK-M4F-LABEL: test_1double:
	; CHECK-M4F: movs [[ONEHI:r[0-9]+]], #0			; CHECK-M4F: vldr d0, [[CP_LABEL:.*]]
	; CHECK-M4F: movs [[ONELO:r[0-9]+]], #0
	; CHECK-M4F: movt [[ONEHI]], #16368
	; CHECK-M4F-DAG: vmov s0, [[ONELO]]
	; CHECK-M4F-DAG: vmov s1, [[ONEHI]]
	; CHECK-M4F: bl test_1double			; CHECK-M4F: bl test_1double
				; CHECK-M4F: [[CP_LABEL]]
				; CHECK-M4F-NEXT: .long 0
				; CHECK-M4F-NEXT: .long 1072693248

	call arm_aapcs_vfpcc void @test_1double({ double } { double 1.0 })			call arm_aapcs_vfpcc void @test_1double({ double } { double 1.0 })
	ret void			ret void
	}			}
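
The immediates in the CHECK lines above follow from the IEEE-754 encoding of 1.0. A small sketch in plain C++ that recovers them; the helper names `wordsOfF64` and `topHalfword` are hypothetical and not part of the patch:

```cpp
#include <cstdint>
#include <cstring>

// The two 32-bit words that make up a double's bit pattern.
struct F64Words {
  uint32_t lo;  // bits 31..0
  uint32_t hi;  // bits 63..32
};

// Reinterpret a double as its low and high 32-bit words.
F64Words wordsOfF64(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32)};
}

// The 16-bit value a movt would place in the top half of a register
// to build the given 32-bit word.
uint32_t topHalfword(uint32_t word) {
  return word >> 16;
}
```

For 1.0 the low word is 0 and the high word is 1072693248 (0x3FF00000), whose top halfword is 16368 (0x3FF0). These match the `.long 0` / `.long 1072693248` constant-pool entries and the `movt ..., #16368` checks in this file.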

	; Final double argument might be put in s15 & [sp] if we're careless. It should			; Final double argument might be put in s15 & [sp] if we're careless. It should
	; go all on the stack.			; go all on the stack.
	define arm_aapcs_vfpcc void @test_1double_nosplit([4 x float], [4 x double], [3 x float], double %a) {			define arm_aapcs_vfpcc void @test_1double_nosplit([4 x float], [4 x double], [3 x float], double %a) {
	; CHECK-LABEL: test_1double_nosplit:			; CHECK-LABEL: test_1double_nosplit:
	; CHECK-DAG: mov [[ONELO:r[0-9]+]], #0			; CHECK-DAG: mov [[ONELO:r[0-9]+]], #0
	; CHECK-DAG: movw [[ONEHI:r[0-9]+]], #0			; CHECK-DAG: movw [[ONEHI:r[0-9]+]], #0
	; CHECK-DAG: movt [[ONEHI]], #16368			; CHECK-DAG: movt [[ONEHI]], #16368
	; CHECK: strd [[ONELO]], [[ONEHI]], [sp]			; CHECK: strd [[ONELO]], [[ONEHI]], [sp]
	; CHECK: bl test_1double_nosplit			; CHECK: bl test_1double_nosplit

	; CHECK-M4F-LABEL: test_1double_nosplit:			; CHECK-M4F-LABEL: test_1double_nosplit:
	; CHECK-M4F: movs [[ONELO:r[0-9]+]], #0
	; CHECK-M4F: movs [[ONEHI:r[0-9]+]], #0			; CHECK-M4F: movs [[ONEHI:r[0-9]+]], #0
				; CHECK-M4F: movs [[ONELO:r[0-9]+]], #0
				rengolin: A bit fragile this change, no? Shouldn't this also be a DAG check?
				olista01: I'm not sure how this can be done. If I was to make the movs lines DAG checks, they could match in either order, but ONEHI and ONELO would be swapped, so the movt and strd lines would fail to match. Making any more checks DAGs would cause this test to match even if the registers were in the wrong order (swapping ONEHI and ONELO in the strd is a plausible failure mode).
				rengolin: That's a good point. I wanted to add a CHECK-OR so that all sequential CHECKs could be matched and you'd be able to do:
				; CHECK-M4F: strd [[ONELO]], [[ONEHI]], [sp]
				; CHECK-M4F-OR: strd [[ONEHI]], [[ONELO]], [sp]
				But that's for another commit. :)
	; CHECK-M4F: movt [[ONEHI]], #16368			; CHECK-M4F: movt [[ONEHI]], #16368
	; CHECK-M4F-DAG: str [[ONELO]], [sp]			; CHECK-M4F: strd [[ONELO]], [[ONEHI]], [sp]
	; CHECK-M4F-DAG: str [[ONEHI]], [sp, #4]
	; CHECK-M4F: bl test_1double_nosplit			; CHECK-M4F: bl test_1double_nosplit
	call arm_aapcs_vfpcc void @test_1double_nosplit([4 x float] undef, [4 x double] undef, [3 x float] undef, double 1.0)			call arm_aapcs_vfpcc void @test_1double_nosplit([4 x float] undef, [4 x double] undef, [3 x float] undef, double 1.0)
	ret void			ret void
	}			}
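
The inline thread in this test comes down to which `movs` line each capture binds to. Below is a toy C++ model of that argument, assuming (as the author describes) that two identical order-free `movs` checks may bind `ONELO` and `ONEHI` in either order. `Binding`, `possibleBindings`, and `movtCheckMatches` are hypothetical names, and this is not how FileCheck itself is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Registers captured by the two "movs <reg>, #0" checks.
struct Binding {
  std::string oneLo;
  std::string oneHi;
};

// Both patterns are textually identical, so an order-free match could pair
// them with the two movs lines either way round (assumption of this model).
std::vector<Binding> possibleBindings(const std::string &firstMovsReg,
                                      const std::string &secondMovsReg) {
  assert(firstMovsReg != secondMovsReg && "the two movs write distinct registers");
  return {Binding{firstMovsReg, secondMovsReg},
          Binding{secondMovsReg, firstMovsReg}};
}

// "movt [[ONEHI]], #16368" only matches if ONEHI names the register that the
// movt in the compiler output really writes.
bool movtCheckMatches(const Binding &b, const std::string &movtReg) {
  return b.oneHi == movtReg;
}
```

For output of the shape `movs r1, #0`, `movs r0, #0`, `movt r1, #16368`, only the binding with `ONEHI` equal to `r1` lets the `movt` check match, which is the reason given for keeping the two `movs` checks ordered.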

	; Final double argument might go at [sp, #4] if we're careless. Should go at			; Final double argument might go at [sp, #4] if we're careless. Should go at
	; [sp, #8] to preserve alignment.			; [sp, #8] to preserve alignment.
	define arm_aapcs_vfpcc void @test_1double_misaligned([4 x double], [4 x double], float, double) {			define arm_aapcs_vfpcc void @test_1double_misaligned([4 x double], [4 x double], float, double) {
	call arm_aapcs_vfpcc void @test_1double_misaligned([4 x double] undef, [4 x double] undef, float undef, double 1.0)			call arm_aapcs_vfpcc void @test_1double_misaligned([4 x double] undef, [4 x double] undef, float undef, double 1.0)

	; CHECK-LABEL: test_1double_misaligned:			; CHECK-LABEL: test_1double_misaligned:
	; CHECK-DAG: movw [[ONEHI:r[0-9]+]], #0			; CHECK-DAG: movw [[ONEHI:r[0-9]+]], #0
	; CHECK-DAG: mov [[ONELO:r[0-9]+]], #0			; CHECK-DAG: mov [[ONELO:r[0-9]+]], #0
	; CHECK-DAG: movt [[ONEHI]], #16368			; CHECK-DAG: movt [[ONEHI]], #16368
	; CHECK-DAG: strd [[ONELO]], [[ONEHI]], [sp, #8]			; CHECK-DAG: strd [[ONELO]], [[ONEHI]], [sp, #8]

	; CHECK-M4F-LABEL: test_1double_misaligned:			; CHECK-M4F-LABEL: test_1double_misaligned:
	; CHECK-M4F: movs [[ONELO:r[0-9]+]], #0
	; CHECK-M4F: movs [[ONEHI:r[0-9]+]], #0			; CHECK-M4F: movs [[ONEHI:r[0-9]+]], #0
				; CHECK-M4F: movs [[ONELO:r[0-9]+]], #0
	; CHECK-M4F: movt [[ONEHI]], #16368			; CHECK-M4F: movt [[ONEHI]], #16368
	; CHECK-M4F-DAG: str [[ONELO]], [sp, #8]			; CHECK-M4F: strd [[ONELO]], [[ONEHI]], [sp, #8]
	; CHECK-M4F-DAG: str [[ONEHI]], [sp, #12]
	; CHECK-M4F: bl test_1double_misaligned			; CHECK-M4F: bl test_1double_misaligned

	ret void			ret void
	}			}
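
The comment on `test_1double_misaligned` is an alignment calculation: the `float` takes the first 4 bytes of the stack area, so the next free offset is 4, and the `double` must be rounded up to the next multiple of 8. A minimal sketch of that arithmetic in C++, with `alignUp` as a hypothetical local helper:

```cpp
#include <cstdint>

// Round offset up to the next multiple of align.
// align is assumed to be a power of two.
constexpr std::uint32_t alignUp(std::uint32_t offset, std::uint32_t align) {
  return (offset + align - 1) & ~(align - 1);
}
```

`alignUp(4, 8)` gives 8, matching the `[sp, #8]` in the checks, while a careless allocator that skipped the rounding would use offset 4.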

test/CodeGen/ARM/darwin-eabi.ll


	define double @double_op(double %lhs, double %rhs) {			define double @double_op(double %lhs, double %rhs) {
	%sum = fadd double %lhs, %rhs			%sum = fadd double %lhs, %rhs
	ret double %sum			ret double %sum
	; CHECK-M3-LABEL: double_op:			; CHECK-M3-LABEL: double_op:
	; CHECK-M3: bl ___adddf3			; CHECK-M3: bl ___adddf3

	; CHECK-M4-LABEL: double_op:			; CHECK-M4-LABEL: double_op:
	; CHECK-M4: bl ___adddf3			; CHECK-M4: {{(blx|b.w)}} ___adddf3
	}			}

test/CodeGen/Thumb2/aapcs.ll

This file was added.

				; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m4 -mattr=-vfp2 | FileCheck %s -check-prefix=CHECK -check-prefix=SOFT
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 -mattr=+vfp4,+fp-only-sp | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a8 -mattr=+vfp3 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP

				define float @float_in_reg(float %a, float %b) {
				entry:
				; CHECK-LABEL: float_in_reg:
				; SOFT: mov r0, r1
				; HARD: vmov.f32 s0, s1
				; CHECK-NEXT: bx lr
				ret float %b
				}

				define double @double_in_reg(double %a, double %b) {
				entry:
				; CHECK-LABEL: double_in_reg:
				; SOFT: mov r0, r2
				; SOFT: mov r1, r3
				; SP: vmov.f32 s0, s2
				; SP: vmov.f32 s1, s3
				; DP: vmov.f64 d0, d1
				; CHECK-NEXT: bx lr
				ret double %b
				}

				define float @float_on_stack(double %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, float %i) {
				; CHECK-LABEL: float_on_stack:
				; SOFT: ldr r0, [sp, #48]
				; HARD: vldr s0, [sp]
				; CHECK-NEXT: bx lr
				ret float %i
				}

				define double @double_on_stack(double %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, double %i) {
				; CHECK-LABEL: double_on_stack:
				; SOFT: ldr r0, [sp, #48]
				; SOFT: ldr r1, [sp, #52]
				; HARD: vldr d0, [sp]
				; CHECK-NEXT: bx lr
				ret double %i
				}

				define double @double_not_split(double %a, double %b, double %c, double %d, double %e, double %f, double %g, float %h, double %i) {
				; CHECK-LABEL: double_not_split:
				; SOFT: ldr r0, [sp, #48]
				; SOFT: ldr r1, [sp, #52]
				; HARD: vldr d0, [sp]
				; CHECK-NEXT: bx lr
				ret double %i
				}
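
The `double_not_split` and `double_on_stack` checks follow from how AAPCS-VFP hands out registers. The sketch below is a simplified C++ model covering only scalar `float` and `double` arguments. It assumes 16 argument registers `s0`-`s15`, with `d<n>` aliasing `s<2n>` and `s<2n+1>`, and that no VFP register is used once any argument has gone to the stack. `VfpArgAllocator` is a hypothetical name, not LLVM code.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

class VfpArgAllocator {
  std::bitset<16> used_;         // s-registers already assigned
  bool regsClosed_ = false;      // true once an argument has gone to the stack
  std::uint32_t nextStack_ = 0;  // next free byte in the stack argument area

  std::string onStack(std::uint32_t size) {
    regsClosed_ = true;
    // Align to the natural alignment of the argument (4 or 8 bytes).
    nextStack_ = (nextStack_ + size - 1) & ~(size - 1);
    std::string loc = "[sp, #" + std::to_string(nextStack_) + "]";
    nextStack_ += size;
    return loc;
  }

public:
  // A float takes the lowest free s-register (this allows back-filling).
  std::string allocFloat() {
    if (!regsClosed_)
      for (int s = 0; s < 16; ++s)
        if (!used_.test(s)) {
          used_.set(s);
          return "s" + std::to_string(s);
        }
    return onStack(4);
  }

  // A double needs a whole d-register, i.e. an even/odd pair of free s-registers.
  // It is never split between a register and the stack.
  std::string allocDouble() {
    if (!regsClosed_)
      for (int d = 0; d < 8; ++d)
        if (!used_.test(2 * d) && !used_.test(2 * d + 1)) {
          used_.set(2 * d);
          used_.set(2 * d + 1);
          return "d" + std::to_string(d);
        }
    return onStack(8);
  }
};
```

With seven doubles followed by a float, the float lands in `s14`, so `d7` is no longer free and the final double goes entirely to the stack at offset 0, which corresponds to the `vldr d0, [sp]` check.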

test/CodeGen/Thumb2/cortex-fp.ll

	; CORTEXA8: vmul.f32 d
ret float %0		ret float %0
}		}

define double @bar(double %a, double %b) {		define double @bar(double %a, double %b) {
entry:		entry:
; CHECK-LABEL: bar:		; CHECK-LABEL: bar:
%0 = fmul double %a, %b		%0 = fmul double %a, %b
; CORTEXM3: bl ___muldf3		; CORTEXM3: bl ___muldf3
; CORTEXM4: bl ___muldf3		; CORTEXM4: {{bl|b.w}} ___muldf3
; CORTEXA8: vmul.f64 d		; CORTEXA8: vmul.f64 d
ret double %0		ret double %0
}		}

test/CodeGen/Thumb2/float-cmp.ll

This file was added.

				; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=NONE
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a8 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP

				rengolin: You could have used the same HARD+SP/DP you used in the test above to simplify this test.


				define i1 @cmp_f_false(float %a, float %b) {
				; CHECK-LABEL: cmp_f_false:
				; NONE: movs r0, #0
				; HARD: movs r0, #0
				%1 = fcmp false float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_oeq(float %a, float %b) {
				; CHECK-LABEL: cmp_f_oeq:
				; NONE: bl __aeabi_fcmpeq
				; HARD: vcmpe.f32
				; HARD: moveq r0, #1
				%1 = fcmp oeq float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ogt(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ogt:
				; NONE: bl __aeabi_fcmpgt
				; HARD: vcmpe.f32
				; HARD: movgt r0, #1
				%1 = fcmp ogt float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_oge(float %a, float %b) {
				; CHECK-LABEL: cmp_f_oge:
				; NONE: bl __aeabi_fcmpge
				; HARD: vcmpe.f32
				; HARD: movge r0, #1
				%1 = fcmp oge float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_olt(float %a, float %b) {
				; CHECK-LABEL: cmp_f_olt:
				; NONE: bl __aeabi_fcmplt
				; HARD: vcmpe.f32
				; HARD: movmi r0, #1
				%1 = fcmp olt float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ole(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ole:
				; NONE: bl __aeabi_fcmple
				; HARD: vcmpe.f32
				; HARD: movls r0, #1
				%1 = fcmp ole float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_one(float %a, float %b) {
				; CHECK-LABEL: cmp_f_one:
				; NONE: bl __aeabi_fcmpgt
				; NONE: bl __aeabi_fcmplt
				; HARD: vcmpe.f32
				; HARD: movmi r0, #1
				; HARD: movgt r0, #1
				%1 = fcmp one float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ord(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ord:
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movvc r0, #1
				%1 = fcmp ord float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ueq(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ueq:
				; NONE: bl __aeabi_fcmpeq
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: moveq r0, #1
				; HARD: movvs r0, #1
				%1 = fcmp ueq float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ugt(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ugt:
				; NONE: bl __aeabi_fcmpgt
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movhi r0, #1
				%1 = fcmp ugt float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_uge(float %a, float %b) {
				; CHECK-LABEL: cmp_f_uge:
				; NONE: bl __aeabi_fcmpge
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movpl r0, #1
				%1 = fcmp uge float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ult(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ult:
				; NONE: bl __aeabi_fcmplt
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movlt r0, #1
				%1 = fcmp ult float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_ule(float %a, float %b) {
				; CHECK-LABEL: cmp_f_ule:
				; NONE: bl __aeabi_fcmple
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movle r0, #1
				%1 = fcmp ule float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_une(float %a, float %b) {
				; CHECK-LABEL: cmp_f_une:
				; NONE: bl __aeabi_fcmpeq
				; HARD: vcmpe.f32
				; HARD: movne r0, #1
				%1 = fcmp une float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_uno(float %a, float %b) {
				; CHECK-LABEL: cmp_f_uno:
				; NONE: bl __aeabi_fcmpun
				; HARD: vcmpe.f32
				; HARD: movvs r0, #1
				%1 = fcmp uno float %a, %b
				ret i1 %1
				}
				define i1 @cmp_f_true(float %a, float %b) {
				; CHECK-LABEL: cmp_f_true:
				; NONE: movs r0, #1
				; HARD: movs r0, #1
				%1 = fcmp true float %a, %b
				ret i1 %1
				}

				define i1 @cmp_d_false(double %a, double %b) {
				; CHECK-LABEL: cmp_d_false:
				; NONE: movs r0, #0
				; HARD: movs r0, #0
				%1 = fcmp false double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_oeq(double %a, double %b) {
				; CHECK-LABEL: cmp_d_oeq:
				; NONE: bl __aeabi_dcmpeq
				; SP: bl __aeabi_dcmpeq
				; DP: vcmpe.f64
				; DP: moveq r0, #1
				%1 = fcmp oeq double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_ogt(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ogt:
				; NONE: bl __aeabi_dcmpgt
				; SP: bl __aeabi_dcmpgt
				; DP: vcmpe.f64
				; DP: movgt r0, #1
				%1 = fcmp ogt double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_oge(double %a, double %b) {
				; CHECK-LABEL: cmp_d_oge:
				; NONE: bl __aeabi_dcmpge
				; SP: bl __aeabi_dcmpge
				; DP: vcmpe.f64
				; DP: movge r0, #1
				%1 = fcmp oge double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_olt(double %a, double %b) {
				; CHECK-LABEL: cmp_d_olt:
				; NONE: bl __aeabi_dcmplt
				; SP: bl __aeabi_dcmplt
				; DP: vcmpe.f64
				; DP: movmi r0, #1
				%1 = fcmp olt double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_ole(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ole:
				; NONE: bl __aeabi_dcmple
				; SP: bl __aeabi_dcmple
				; DP: vcmpe.f64
				; DP: movls r0, #1
				%1 = fcmp ole double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_one(double %a, double %b) {
				; CHECK-LABEL: cmp_d_one:
				; NONE: bl __aeabi_dcmpgt
				; NONE: bl __aeabi_dcmplt
				; SP: bl __aeabi_dcmpgt
				; SP: bl __aeabi_dcmplt
				; DP: vcmpe.f64
				; DP: movmi r0, #1
				; DP: movgt r0, #1
				%1 = fcmp one double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_ord(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ord:
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movvc r0, #1
				%1 = fcmp ord double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_ugt(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ugt:
				; NONE: bl __aeabi_dcmpgt
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmpgt
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movhi r0, #1
				%1 = fcmp ugt double %a, %b
				ret i1 %1
				}

				define i1 @cmp_d_ult(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ult:
				; NONE: bl __aeabi_dcmplt
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmplt
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movlt r0, #1
				%1 = fcmp ult double %a, %b
				ret i1 %1
				}


				define i1 @cmp_d_uno(double %a, double %b) {
				; CHECK-LABEL: cmp_d_uno:
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movvs r0, #1
				%1 = fcmp uno double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_true(double %a, double %b) {
				; CHECK-LABEL: cmp_d_true:
				; NONE: movs r0, #1
				; HARD: movs r0, #1
				%1 = fcmp true double %a, %b
				ret i1 %1
				}
				define i1 @cmp_d_ueq(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ueq:
				; NONE: bl __aeabi_dcmpeq
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmpeq
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: moveq r0, #1
				; DP: movvs r0, #1
				%1 = fcmp ueq double %a, %b
				ret i1 %1
				}

				define i1 @cmp_d_uge(double %a, double %b) {
				; CHECK-LABEL: cmp_d_uge:
				; NONE: bl __aeabi_dcmpge
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmpge
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movpl r0, #1
				%1 = fcmp uge double %a, %b
				ret i1 %1
				}

				define i1 @cmp_d_ule(double %a, double %b) {
				; CHECK-LABEL: cmp_d_ule:
				; NONE: bl __aeabi_dcmple
				; NONE: bl __aeabi_dcmpun
				; SP: bl __aeabi_dcmple
				; SP: bl __aeabi_dcmpun
				; DP: vcmpe.f64
				; DP: movle r0, #1
				%1 = fcmp ule double %a, %b
				ret i1 %1
				}

				define i1 @cmp_d_une(double %a, double %b) {
				; CHECK-LABEL: cmp_d_une:
				; NONE: bl __aeabi_dcmpeq
				; SP: bl __aeabi_dcmpeq
				; DP: vcmpe.f64
				; DP: movne r0, #1
				%1 = fcmp une double %a, %b
				ret i1 %1
				}
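
The condition codes in the `HARD` and `DP` checks (`mi` for `olt`, `ls` for `ole`, `hi` for `ugt`, `pl` for `uge`, and so on) look odd next to their integer meanings. They follow from the NZCV values that a VFP compare produces, in particular for the unordered case. The C++ sketch below models those flags and the standard ARM condition-code predicates. `Flags`, `compareFlags`, and the `cond*` helpers are hypothetical names for illustration only.

```cpp
#include <cmath>

// NZCV flags as seen by conditional instructions after a VFP compare has
// been transferred to the APSR.
struct Flags {
  bool n, z, c, v;
};

// Flag values for the four possible outcomes of a floating-point compare.
Flags compareFlags(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return {false, false, true, true};  // unordered
  if (a == b)
    return {false, true, true, false};  // equal
  if (a < b)
    return {true, false, false, false};  // less than
  return {false, false, true, false};    // greater than
}

// ARM condition codes, expressed on the flags.
bool condEQ(Flags f) { return f.z; }
bool condNE(Flags f) { return !f.z; }
bool condMI(Flags f) { return f.n; }
bool condPL(Flags f) { return !f.n; }
bool condVS(Flags f) { return f.v; }
bool condVC(Flags f) { return !f.v; }
bool condHI(Flags f) { return f.c && !f.z; }
bool condLS(Flags f) { return !f.c || f.z; }
bool condGE(Flags f) { return f.n == f.v; }
bool condLT(Flags f) { return f.n != f.v; }
bool condGT(Flags f) { return !f.z && f.n == f.v; }
bool condLE(Flags f) { return f.z || f.n != f.v; }
```

With a NaN operand, `condMI` and `condLS` are false while `condLT` and `condLE` are true, which is why the ordered predicates `olt`/`ole` use `mi`/`ls` and the unordered predicates `ult`/`ule` use `lt`/`le`.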

test/CodeGen/Thumb2/float-intrinsics-double.ll

This file was added.

				; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=SOFT -check-prefix=NONE
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=SOFT -check-prefix=SP
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a7 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP

				declare double @llvm.sqrt.f64(double %Val)
				define double @sqrt_d(double %a) {
				; CHECK-LABEL: sqrt_d:
				; SOFT: {{(bl|b)}} sqrt
				; HARD: vsqrt.f64 d0, d0
				%1 = call double @llvm.sqrt.f64(double %a)
				ret double %1
				}

				declare double @llvm.powi.f64(double %Val, i32 %power)
				define double @powi_d(double %a, i32 %b) {
				; CHECK-LABEL: powi_d:
				; SOFT: {{(bl|b)}} __powidf2
				; HARD: b __powidf2
				%1 = call double @llvm.powi.f64(double %a, i32 %b)
				ret double %1
				}

				declare double @llvm.sin.f64(double %Val)
				define double @sin_d(double %a) {
				; CHECK-LABEL: sin_d:
				; SOFT: {{(bl|b)}} sin
				; HARD: b sin
				%1 = call double @llvm.sin.f64(double %a)
				ret double %1
				}

				declare double @llvm.cos.f64(double %Val)
				define double @cos_d(double %a) {
				; CHECK-LABEL: cos_d:
				; SOFT: {{(bl|b)}} cos
				; HARD: b cos
				%1 = call double @llvm.cos.f64(double %a)
				ret double %1
				}

				declare double @llvm.pow.f64(double %Val, double %power)
				define double @pow_d(double %a, double %b) {
				; CHECK-LABEL: pow_d:
				; SOFT: {{(bl|b)}} pow
				; HARD: b pow
				%1 = call double @llvm.pow.f64(double %a, double %b)
				ret double %1
				}

				declare double @llvm.exp.f64(double %Val)
				define double @exp_d(double %a) {
				; CHECK-LABEL: exp_d:
				; SOFT: {{(bl|b)}} exp
				; HARD: b exp
				%1 = call double @llvm.exp.f64(double %a)
				ret double %1
				}

				declare double @llvm.exp2.f64(double %Val)
				define double @exp2_d(double %a) {
				; CHECK-LABEL: exp2_d:
				; SOFT: {{(bl|b)}} exp2
				; HARD: b exp2
				%1 = call double @llvm.exp2.f64(double %a)
				ret double %1
				}

				declare double @llvm.log.f64(double %Val)
				define double @log_d(double %a) {
				; CHECK-LABEL: log_d:
				; SOFT: {{(bl|b)}} log
				; HARD: b log
				%1 = call double @llvm.log.f64(double %a)
				ret double %1
				}

				declare double @llvm.log10.f64(double %Val)
				define double @log10_d(double %a) {
				; CHECK-LABEL: log10_d:
				; SOFT: {{(bl|b)}} log10
				; HARD: b log10
				%1 = call double @llvm.log10.f64(double %a)
				ret double %1
				}

				declare double @llvm.log2.f64(double %Val)
				define double @log2_d(double %a) {
				; CHECK-LABEL: log2_d:
				; SOFT: {{(bl|b)}} log2
				; HARD: b log2
				%1 = call double @llvm.log2.f64(double %a)
				ret double %1
				}

				declare double @llvm.fma.f64(double %a, double %b, double %c)
				define double @fma_d(double %a, double %b, double %c) {
				; CHECK-LABEL: fma_d:
				; SOFT: {{(bl|b)}} fma
				; HARD: vfma.f64
				%1 = call double @llvm.fma.f64(double %a, double %b, double %c)
				ret double %1
				}

				; FIXME: the FPv4-SP version is less efficient than the no-FPU version
				declare double @llvm.fabs.f64(double %Val)
				define double @abs_d(double %a) {
				; CHECK-LABEL: abs_d:
				; NONE: bic r1, r1, #-2147483648
				; SP: bl __aeabi_dcmpgt
				; SP: bl __aeabi_dcmpun
				; SP: bl __aeabi_dsub
				; DP: vabs.f64 d0, d0
				%1 = call double @llvm.fabs.f64(double %a)
				ret double %1
				}
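
The `NONE` check for `abs_d` is a single `bic` on the high word, which is what the FIXME compares the three-libcall FPv4-SP sequence against. Here is a minimal C++ sketch of that bit operation, assuming the double is held with its low word in `r0` and its high word in `r1`. `fabsViaBic` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// fabs on the raw bits of a double: clear bit 31 of the high word, as
// "bic r1, r1, #-2147483648" does.
double fabsViaBic(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  std::uint32_t lo = static_cast<std::uint32_t>(bits);        // r0
  std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);  // r1
  hi &= ~UINT32_C(0x80000000);  // -2147483648 is 0x80000000 as a 32-bit pattern
  bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```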

				declare double @llvm.copysign.f64(double %Mag, double %Sgn)
				define double @copysign_d(double %a, double %b) {
				; CHECK-LABEL: copysign_d:
				; SOFT: lsrs [[REG:r[0-9]+]], r3, #31
				; SOFT: bfi r1, [[REG]], #31, #1
				; HARD: vmov.i32 [[REG:d[0-9]+]], #0x80000000
				; HARD: vshl.i64 [[REG]], [[REG]], #32
				; HARD: vbsl [[REG]], d
				%1 = call double @llvm.copysign.f64(double %a, double %b)
				ret double %1
				}
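
The `SOFT` sequence for `copysign_d` works on the high words only: `lsrs` extracts the sign bit of the second operand and `bfi` inserts it into bit 31 of the first. A small C++ sketch of the same two steps, with `copysignViaBfi` as a hypothetical name:

```cpp
#include <cstdint>
#include <cstring>

// copysign on raw bits, mirroring "lsrs reg, r3, #31" followed by
// "bfi r1, reg, #31, #1" on the high words of the two doubles.
double copysignViaBfi(double mag, double sgn) {
  std::uint64_t magBits, sgnBits;
  std::memcpy(&magBits, &mag, sizeof magBits);
  std::memcpy(&sgnBits, &sgn, sizeof sgnBits);
  std::uint32_t magHi = static_cast<std::uint32_t>(magBits >> 32);  // r1
  std::uint32_t sgnHi = static_cast<std::uint32_t>(sgnBits >> 32);  // r3
  std::uint32_t sign = sgnHi >> 31;                  // lsrs: isolate the sign bit
  magHi = (magHi & UINT32_C(0x7FFFFFFF)) | (sign << 31);  // bfi: insert at bit 31
  magBits = (static_cast<std::uint64_t>(magHi) << 32) | (magBits & UINT64_C(0xFFFFFFFF));
  std::memcpy(&mag, &magBits, sizeof mag);
  return mag;
}
```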

				declare double @llvm.floor.f64(double %Val)
				define double @floor_d(double %a) {
				; CHECK-LABEL: floor_d:
				; SOFT: {{(bl|b)}} floor
				; HARD: b floor
				%1 = call double @llvm.floor.f64(double %a)
				ret double %1
				}

				declare double @llvm.ceil.f64(double %Val)
				define double @ceil_d(double %a) {
				; CHECK-LABEL: ceil_d:
				; SOFT: {{(bl|b)}} ceil
				; HARD: b ceil
				%1 = call double @llvm.ceil.f64(double %a)
				ret double %1
				}

				declare double @llvm.trunc.f64(double %Val)
				define double @trunc_d(double %a) {
				; CHECK-LABEL: trunc_d:
				; SOFT: {{(bl|b)}} trunc
				; HARD: b trunc
				%1 = call double @llvm.trunc.f64(double %a)
				ret double %1
				}

				declare double @llvm.rint.f64(double %Val)
				define double @rint_d(double %a) {
				; CHECK-LABEL: rint_d:
				; SOFT: {{(bl|b)}} rint
				; HARD: b rint
				%1 = call double @llvm.rint.f64(double %a)
				ret double %1
				}

				declare double @llvm.nearbyint.f64(double %Val)
				define double @nearbyint_d(double %a) {
				; CHECK-LABEL: nearbyint_d:
				; SOFT: {{(bl|b)}} nearbyint
				; HARD: b nearbyint
				%1 = call double @llvm.nearbyint.f64(double %a)
				ret double %1
				}

				declare double @llvm.round.f64(double %Val)
				define double @round_d(double %a) {
				; CHECK-LABEL: round_d:
				; SOFT: {{(bl|b)}} round
				; HARD: b round
				%1 = call double @llvm.round.f64(double %a)
				ret double %1
				}

				declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
				define double @fmuladd_d(double %a, double %b, double %c) {
				; CHECK-LABEL: fmuladd_d:
				; SOFT: bl __aeabi_dmul
				; SOFT: bl __aeabi_dadd
				; HARD: vmul.f64
				; HARD: vadd.f64
				%1 = call double @llvm.fmuladd.f64(double %a, double %b, double %c)
				ret double %1
				}

				declare i16 @llvm.convert.to.fp16.f64(double %a)
				define i16 @d_to_h(double %a) {
				; CHECK-LABEL: d_to_h:
				; SOFT: bl __aeabi_d2h
				; HARD: bl __aeabi_d2h
				%1 = call i16 @llvm.convert.to.fp16.f64(double %a)
				ret i16 %1
				}

				declare double @llvm.convert.from.fp16.f64(i16 %a)
				define double @h_to_d(i16 %a) {
				; CHECK-LABEL: h_to_d:
				; NONE: bl __gnu_h2f_ieee
				; NONE: bl __aeabi_f2d
				; SP: vcvtb.f32.f16
				; SP: bl __aeabi_f2d
				; DP: vcvtb.f32.f16
				; DP: vcvt.f64.f32
				%1 = call double @llvm.convert.from.fp16.f64(i16 %a)
				ret double %1
				}

test/CodeGen/Thumb2/float-intrinsics-float.ll

This file was added.

				; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=SOFT -check-prefix=NONE
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a7 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP

				declare float @llvm.sqrt.f32(float %Val)
				define float @sqrt_f(float %a) {
				; CHECK-LABEL: sqrt_f:
				; SOFT: bl sqrtf
				; HARD: vsqrt.f32 s0, s0
				%1 = call float @llvm.sqrt.f32(float %a)
				ret float %1
				}

				declare float @llvm.powi.f32(float %Val, i32 %power)
				define float @powi_f(float %a, i32 %b) {
				; CHECK-LABEL: powi_f:
				; SOFT: bl __powisf2
				; HARD: b __powisf2
				%1 = call float @llvm.powi.f32(float %a, i32 %b)
				ret float %1
				}

				declare float @llvm.sin.f32(float %Val)
				define float @sin_f(float %a) {
				; CHECK-LABEL: sin_f:
				; SOFT: bl sinf
				; HARD: b sinf
				%1 = call float @llvm.sin.f32(float %a)
				ret float %1
				}

				declare float @llvm.cos.f32(float %Val)
				define float @cos_f(float %a) {
				; CHECK-LABEL: cos_f:
				; SOFT: bl cosf
				; HARD: b cosf
				%1 = call float @llvm.cos.f32(float %a)
				ret float %1
				}

				declare float @llvm.pow.f32(float %Val, float %power)
				define float @pow_f(float %a, float %b) {
				; CHECK-LABEL: pow_f:
				; SOFT: bl powf
				; HARD: b powf
				%1 = call float @llvm.pow.f32(float %a, float %b)
				ret float %1
				}

				declare float @llvm.exp.f32(float %Val)
				define float @exp_f(float %a) {
				; CHECK-LABEL: exp_f:
				; SOFT: bl expf
				; HARD: b expf
				%1 = call float @llvm.exp.f32(float %a)
				ret float %1
				}

				declare float @llvm.exp2.f32(float %Val)
				define float @exp2_f(float %a) {
				; CHECK-LABEL: exp2_f:
				; SOFT: bl exp2f
				; HARD: b exp2f
				%1 = call float @llvm.exp2.f32(float %a)
				ret float %1
				}

				declare float @llvm.log.f32(float %Val)
				define float @log_f(float %a) {
				; CHECK-LABEL: log_f:
				; SOFT: bl logf
				; HARD: b logf
				%1 = call float @llvm.log.f32(float %a)
				ret float %1
				}

				declare float @llvm.log10.f32(float %Val)
				define float @log10_f(float %a) {
				; CHECK-LABEL: log10_f:
				; SOFT: bl log10f
				; HARD: b log10f
				%1 = call float @llvm.log10.f32(float %a)
				ret float %1
				}

				declare float @llvm.log2.f32(float %Val)
				define float @log2_f(float %a) {
				; CHECK-LABEL: log2_f:
				; SOFT: bl log2f
				; HARD: b log2f
				%1 = call float @llvm.log2.f32(float %a)
				ret float %1
				}

				declare float @llvm.fma.f32(float %a, float %b, float %c)
				define float @fma_f(float %a, float %b, float %c) {
				; CHECK-LABEL: fma_f:
				; SOFT: bl fmaf
				; HARD: vfma.f32
				%1 = call float @llvm.fma.f32(float %a, float %b, float %c)
				ret float %1
				}

				declare float @llvm.fabs.f32(float %Val)
				define float @abs_f(float %a) {
				; CHECK-LABEL: abs_f:
				; SOFT: bic r0, r0, #-2147483648
				; HARD: vabs.f32
				%1 = call float @llvm.fabs.f32(float %a)
				ret float %1
				}

				declare float @llvm.copysign.f32(float %Mag, float %Sgn)
				define float @copysign_f(float %a, float %b) {
				; CHECK-LABEL: copysign_f:
				; NONE: lsrs [[REG:r[0-9]+]], r{{[0-9]+}}, #31
				; NONE: bfi r{{[0-9]+}}, [[REG]], #31, #1
				; SP: lsrs [[REG:r[0-9]+]], r{{[0-9]+}}, #31
				; SP: bfi r{{[0-9]+}}, [[REG]], #31, #1
				; DP: vmov.i32 [[REG:d[0-9]+]], #0x80000000
				; DP: vbsl [[REG]], d
				%1 = call float @llvm.copysign.f32(float %a, float %b)
				ret float %1
				}

				declare float @llvm.floor.f32(float %Val)
				define float @floor_f(float %a) {
				; CHECK-LABEL: floor_f:
				; SOFT: bl floorf
				; HARD: b floorf
				%1 = call float @llvm.floor.f32(float %a)
				ret float %1
				}

				declare float @llvm.ceil.f32(float %Val)
				define float @ceil_f(float %a) {
				; CHECK-LABEL: ceil_f:
				; SOFT: bl ceilf
				; HARD: b ceilf
				%1 = call float @llvm.ceil.f32(float %a)
				ret float %1
				}

				declare float @llvm.trunc.f32(float %Val)
				define float @trunc_f(float %a) {
				; CHECK-LABEL: trunc_f:
				; SOFT: bl truncf
				; HARD: b truncf
				%1 = call float @llvm.trunc.f32(float %a)
				ret float %1
				}

				declare float @llvm.rint.f32(float %Val)
				define float @rint_f(float %a) {
				; CHECK-LABEL: rint_f:
				; SOFT: bl rintf
				; HARD: b rintf
				%1 = call float @llvm.rint.f32(float %a)
				ret float %1
				}

				declare float @llvm.nearbyint.f32(float %Val)
				define float @nearbyint_f(float %a) {
				; CHECK-LABEL: nearbyint_f:
				; SOFT: bl nearbyintf
				; HARD: b nearbyintf
				%1 = call float @llvm.nearbyint.f32(float %a)
				ret float %1
				}

				declare float @llvm.round.f32(float %Val)
				define float @round_f(float %a) {
				; CHECK-LABEL: round_f:
				; SOFT: bl roundf
				; HARD: b roundf
				%1 = call float @llvm.round.f32(float %a)
				ret float %1
				}

				; FIXME: why does cortex-m4 use vmla, while cortex-a7 uses vmul+vadd?
				; (these should be equivalent, even the rounding is the same)
				declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
				define float @fmuladd_f(float %a, float %b, float %c) {
				; CHECK-LABEL: fmuladd_f:
				; SOFT: bl __aeabi_fmul
				; SOFT: bl __aeabi_fadd
				; SP: vmla.f32
				; DP: vmul.f32
				; DP: vadd.f32
				%1 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
				ret float %1
				}
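
The FIXME on `fmuladd_f` relies on `vmla` rounding the product before the add, just as `vmul` followed by `vadd` does, in contrast to the single rounding of `vfma`. The C++ sketch below shows the numerical difference between the two behaviours. The function names and the sample values are illustrative assumptions, not taken from the patch.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded to float before the add.
float mulAddUnfused(float a, float b, float c) {
  volatile float product = a * b;  // volatile stops the compiler fusing the ops
  return product + c;
}

// Fused multiply-add: one rounding at the end.
float mulAddFused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`, which rounds to `1 + 2^-11` in single precision. The unfused form therefore returns 0, while the fused form returns `2^-24`.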

				declare i16 @llvm.convert.to.fp16.f32(float %a)
				define i16 @f_to_h(float %a) {
				; CHECK-LABEL: f_to_h:
				; SOFT: bl __gnu_f2h_ieee
				; HARD: vcvtb.f16.f32
				%1 = call i16 @llvm.convert.to.fp16.f32(float %a)
				ret i16 %1
				}

				declare float @llvm.convert.from.fp16.f32(i16 %a)
				define float @h_to_f(i16 %a) {
				; CHECK-LABEL: h_to_f:
				; SOFT: bl __gnu_h2f_ieee
				; HARD: vcvtb.f32.f16
				%1 = call float @llvm.convert.from.fp16.f32(i16 %a)
				ret float %1
				}

test/CodeGen/Thumb2/float-ops.ll

This file was added.

				; RUN: llc < %s -mtriple=thumbv7-none-eabi -mcpu=cortex-m3 | FileCheck %s -check-prefix=CHECK -check-prefix=NONE
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-m4 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=SP
				; RUN: llc < %s -mtriple=thumbv7-none-eabihf -mcpu=cortex-a8 | FileCheck %s -check-prefix=CHECK -check-prefix=HARD -check-prefix=DP

				rengolin: This one still has some check redundancy. Sorry for being picky, but I fear people will later change checks without properly making sure it makes sense for both DP and SP.
				define float @add_f(float %a, float %b) {
				entry:
				; CHECK-LABEL: add_f:
				; NONE: bl __aeabi_fadd
				; HARD: vadd.f32 s0, s0, s1
				%0 = fadd float %a, %b
				ret float %0
				}

				define double @add_d(double %a, double %b) {
				entry:
				; CHECK-LABEL: add_d:
				; NONE: bl __aeabi_dadd
				; SP: bl __aeabi_dadd
				; DP: vadd.f64 d0, d0, d1
				%0 = fadd double %a, %b
				ret double %0
				}

				define float @sub_f(float %a, float %b) {
				entry:
				; CHECK-LABEL: sub_f:
				; NONE: bl __aeabi_fsub
				; HARD: vsub.f32 s
				%0 = fsub float %a, %b
				ret float %0
				}

				define double @sub_d(double %a, double %b) {
				entry:
				; CHECK-LABEL: sub_d:
				; NONE: bl __aeabi_dsub
				; SP: bl __aeabi_dsub
				; DP: vsub.f64 d0, d0, d1
				%0 = fsub double %a, %b
				ret double %0
				}

				define float @mul_f(float %a, float %b) {
				entry:
				; CHECK-LABEL: mul_f:
				; NONE: bl __aeabi_fmul
				; HARD: vmul.f32 s
				%0 = fmul float %a, %b
				ret float %0
				}

				define double @mul_d(double %a, double %b) {
				entry:
				; CHECK-LABEL: mul_d:
				; NONE: bl __aeabi_dmul
				; SP: bl __aeabi_dmul
				; DP: vmul.f64 d0, d0, d1
				%0 = fmul double %a, %b
				ret double %0
				}

				define float @div_f(float %a, float %b) {
				entry:
				; CHECK-LABEL: div_f:
				; NONE: bl __aeabi_fdiv
				; HARD: vdiv.f32 s
				%0 = fdiv float %a, %b
				ret float %0
				}

				define double @div_d(double %a, double %b) {
				entry:
				; CHECK-LABEL: div_d:
				; NONE: bl __aeabi_ddiv
				; SP: bl __aeabi_ddiv
				; DP: vdiv.f64 d0, d0, d1
				%0 = fdiv double %a, %b
				ret double %0
				}

				define float @rem_f(float %a, float %b) {
				entry:
				; CHECK-LABEL: rem_f:
				; NONE: bl fmodf
				; HARD: b fmodf
				%0 = frem float %a, %b
				ret float %0
				}

				define double @rem_d(double %a, double %b) {
				entry:
				; CHECK-LABEL: rem_d:
				; NONE: bl fmod
				; HARD: b fmod
				%0 = frem double %a, %b
				ret double %0
				}

				define float @load_f(float* %a) {
				entry:
				; CHECK-LABEL: load_f:
				; NONE: ldr r0, [r0]
				; HARD: vldr s0, [r0]
				%0 = load float* %a, align 4
				ret float %0
				}

				define double @load_d(double* %a) {
				entry:
				; CHECK-LABEL: load_d:
				; NONE: ldm.w r0, {r0, r1}
				; HARD: vldr d0, [r0]
				%0 = load double* %a, align 8
				ret double %0
				}

				define void @store_f(float* %a, float %b) {
				entry:
				; CHECK-LABEL: store_f:
				; NONE: str r1, [r0]
				; HARD: vstr s0, [r0]
				store float %b, float* %a, align 4
				ret void
				}

				define void @store_d(double* %a, double %b) {
				entry:
				; CHECK-LABEL: store_d:
				; NONE: mov r1, r3
				; NONE: str r2, [r0]
				; NONE: str r1, [r0, #4]
				; HARD: vstr d0, [r0]
				store double %b, double* %a, align 8
				ret void
				}

				define double @f_to_d(float %a) {
				; CHECK-LABEL: f_to_d:
				; NONE: bl __aeabi_f2d
				; SP: bl __aeabi_f2d
				; DP: vcvt.f64.f32 d0, s0
				%1 = fpext float %a to double
				ret double %1
				}

				define float @d_to_f(double %a) {
				; CHECK-LABEL: d_to_f:
				; NONE: bl __aeabi_d2f
				; SP: bl __aeabi_d2f
				; DP: vcvt.f32.f64 s0, d0
				%1 = fptrunc double %a to float
				ret float %1
				}

				define i32 @f_to_si(float %a) {
				; CHECK-LABEL: f_to_si:
				; NONE: bl __aeabi_f2iz
				; HARD: vcvt.s32.f32 s0, s0
				; HARD: vmov r0, s0
				%1 = fptosi float %a to i32
				ret i32 %1
				}

				define i32 @d_to_si(double %a) {
				; CHECK-LABEL: d_to_si:
				; NONE: bl __aeabi_d2iz
				; SP: vmov r0, r1, d0
				; SP: bl __aeabi_d2iz
				; DP: vcvt.s32.f64 s0, d0
				; DP: vmov r0, s0
				%1 = fptosi double %a to i32
				ret i32 %1
				}

				define i32 @f_to_ui(float %a) {
				; CHECK-LABEL: f_to_ui:
				; NONE: bl __aeabi_f2uiz
				; HARD: vcvt.u32.f32 s0, s0
				; HARD: vmov r0, s0
				%1 = fptoui float %a to i32
				ret i32 %1
				}

				define i32 @d_to_ui(double %a) {
				; CHECK-LABEL: d_to_ui:
				; NONE: bl __aeabi_d2uiz
				; SP: vmov r0, r1, d0
				; SP: bl __aeabi_d2uiz
				; DP: vcvt.u32.f64 s0, d0
				; DP: vmov r0, s0
				%1 = fptoui double %a to i32
				ret i32 %1
				}

				define float @si_to_f(i32 %a) {
				; CHECK-LABEL: si_to_f:
				; NONE: bl __aeabi_i2f
				; HARD: vcvt.f32.s32 s0, s0
				%1 = sitofp i32 %a to float
				ret float %1
				}

				define double @si_to_d(i32 %a) {
				; CHECK-LABEL: si_to_d:
				; NONE: bl __aeabi_i2d
				; SP: bl __aeabi_i2d
				; DP: vcvt.f64.s32 d0, s0
				%1 = sitofp i32 %a to double
				ret double %1
				}

				define float @ui_to_f(i32 %a) {
				; CHECK-LABEL: ui_to_f:
				; NONE: bl __aeabi_ui2f
				; HARD: vcvt.f32.u32 s0, s0
				%1 = uitofp i32 %a to float
				ret float %1
				}

				define double @ui_to_d(i32 %a) {
				; CHECK-LABEL: ui_to_d:
				; NONE: bl __aeabi_ui2d
				; SP: bl __aeabi_ui2d
				; DP: vcvt.f64.u32 d0, s0
				%1 = uitofp i32 %a to double
				ret double %1
				}

				define float @bitcast_i_to_f(i32 %a) {
				; CHECK-LABEL: bitcast_i_to_f:
				; NONE-NOT: mov
				; HARD: vmov s0, r0
				%1 = bitcast i32 %a to float
				ret float %1
				}

				define double @bitcast_i_to_d(i64 %a) {
				; CHECK-LABEL: bitcast_i_to_d:
				; NONE-NOT: mov
				; HARD: vmov d0, r0, r1
				%1 = bitcast i64 %a to double
				ret double %1
				}

				define i32 @bitcast_f_to_i(float %a) {
				; CHECK-LABEL: bitcast_f_to_i:
				; NONE-NOT: mov
				; HARD: vmov r0, s0
				%1 = bitcast float %a to i32
				ret i32 %1
				}

				define i64 @bitcast_d_to_i(double %a) {
				; CHECK-LABEL: bitcast_d_to_i:
				; NONE-NOT: mov
				; HARD: vmov r0, r1, d0
				%1 = bitcast double %a to i64
				ret i64 %1
				}

				define float @select_f(float %a, float %b, i1 %c) {
				; CHECK-LABEL: select_f:
				; NONE: tst.w r2, #1
				; NONE: moveq r0, r1
				; HARD: tst.w r0, #1
				; HARD: vmovne.f32 s1, s0
				; HARD: vmov.f32 s0, s1
				%1 = select i1 %c, float %a, float %b
				ret float %1
				}

				define double @select_d(double %a, double %b, i1 %c) {
				; CHECK-LABEL: select_d:
				; NONE: ldr.w [[REG:r[0-9]+]], [sp]
				; NONE: ands [[REG]], [[REG]], #1
				; NONE: moveq r0, r2
				; NONE: moveq r1, r3
				; SP: ands r0, r0, #1
				; SP-DAG: vmov [[ALO:r[0-9]+]], [[AHI:r[0-9]+]], d0
				; SP-DAG: vmov [[BLO:r[0-9]+]], [[BHI:r[0-9]+]], d1
				; SP: itt ne
				; SP-DAG: movne [[BLO]], [[ALO]]
				; SP-DAG: movne [[BHI]], [[AHI]]
				; SP: vmov d0, [[BLO]], [[BHI]]
				; DP: tst.w r0, #1
				; DP: vmovne.f64 d1, d0
				; DP: vmov.f64 d0, d1
				%1 = select i1 %c, double %a, double %b
				ret double %1
				}

Revision Contents

Diff 12756

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

lib/Target/ARM/ARMBaseInstrInfo.cpp

lib/Target/ARM/ARMCallingConv.h

lib/Target/ARM/ARMISelLowering.h

lib/Target/ARM/ARMISelLowering.cpp

lib/Target/ARM/ARMInstrVFP.td

test/CodeGen/ARM/aapcs-hfa-code.ll

test/CodeGen/ARM/darwin-eabi.ll

test/CodeGen/Thumb2/aapcs.ll

test/CodeGen/Thumb2/cortex-fp.ll

test/CodeGen/Thumb2/float-cmp.ll

test/CodeGen/Thumb2/float-intrinsics-double.ll

test/CodeGen/Thumb2/float-intrinsics-float.ll

test/CodeGen/Thumb2/float-ops.ll
