This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Armv8.2-A FP16 code generation (part 1/3)
ClosedPublic

Authored by SjoerdMeijer on Sep 27 2017, 6:40 AM.

Details

Reviewers

t.p.northover
rengolin
samparker
olista01
eli.friedman

Commits

rG011de9c0ca77: [ARM] Armv8.2-A FP16 code generation (part 1/3)
rL323512: [ARM] Armv8.2-A FP16 code generation (part 1/3)

Summary

This is the groundwork for Armv8.2-A FP16 code generation.

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering
in the ARM backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
   return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}
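
As an illustration of the coercion shown in the IR above, here is a minimal C++ sketch of what the bitcast/trunc/bitcast sequence computes: the half-precision bit pattern is simply the low 16 bits of the 32-bit value that carries the argument. The function names are hypothetical and not part of the patch; the sketch assumes a 32-bit `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helper: build a float from a raw 32-bit register image.
float floatFromRegisterBits(std::uint32_t regBits) {
  float f;
  std::memcpy(&f, &regBits, sizeof f);
  return f;
}

// Mirrors the IR: bitcast float -> i32, then trunc i32 -> i16.
// The final "bitcast i16 -> half" only reinterprets these 16 bits,
// so the returned pattern is the half-precision value itself.
std::uint16_t halfBitsFromCoercedFloat(float coerced) {
  std::uint32_t bits;
  std::memcpy(&bits, &coerced, sizeof bits);
  return static_cast<std::uint16_t>(bits & 0xFFFFu);
}
```

For example, a register image of 0x12343C00 yields the half bit pattern 0x3C00 (1.0 in half precision), regardless of the upper 16 bits.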

When FullFP16 is *not* supported, we don't make f16 a
legal type, and we get legalization for "free", i.e. nothing changes
and everything works as before. And also f16 argument passing/returning
is handled.

When FullFP16 is supported, we do make f16 a legal type,
and have 2 places that we need to patch up: f16 argument passing and
returning, which involves minor tweaks to avoid unnecessary code generation
for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing that
we can codegen this instruction from IR, but more importantly, also to some
conversion instructions. These conversions were causing issues before in the FP16
and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed that
these loads and stores had the wrong addressing mode specified: AddrMode5 instead
of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added.
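
To make the difference between the two addressing modes concrete, here is a small C++ sketch of the offset encoding, assuming (as the frame-index rewriting code in this patch suggests) an 8-bit immediate magnitude with a separate add/sub flag, scaled by 4 for AddrMode5 and by 2 for AddrMode5FP16. The struct and function names are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical result type: valid is false when the offset cannot be encoded.
struct Am5Offset {
  bool valid;
  bool isSub;     // true when the offset is subtracted from the base
  unsigned imm8;  // scaled magnitude, 0..255
};

// Encode a byte offset for a given scale (4 for AddrMode5, 2 for AddrMode5FP16).
Am5Offset encodeAm5Offset(int byteOffset, int scale) {
  const Am5Offset invalid = {false, false, 0u};
  if (scale != 2 && scale != 4)
    return invalid;
  if (byteOffset % scale != 0)  // offset must be a multiple of the scale
    return invalid;
  const int scaled = byteOffset / scale;
  const bool isSub = scaled < 0;
  const int magnitude = isSub ? -scaled : scaled;
  if (magnitude > 255)          // must fit in the 8-bit immediate
    return invalid;
  const Am5Offset result = {true, isSub, static_cast<unsigned>(magnitude)};
  return result;
}
```

Under these assumptions the half-precision mode reaches offsets of at most 510 bytes in steps of 2, whereas the word-scaled mode reaches 1020 bytes in steps of 4, which is why a half-word load at an offset such as 6 cannot be described with the scale-4 mode.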

This is the minimal patch that shows all the different moving parts. In patch 2/3 I will
add some efficient lowering of bitcasts, and in 3/3 I will add the remaining Armv8.2-A
FP16 instruction descriptions.

Diff Detail

Repository: rL LLVM

Event Timeline

SjoerdMeijer created this revision. Sep 27 2017, 6:40 AM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Sep 27 2017, 6:40 AM

I realised that I am not testing the hard-float ABI without FullFP16 support. I now see that code generation is not optimal: it generates EABI calls for the fp16 <-> fp conversions, whereas it should use the normal convert instructions. Fixing this.
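
For context on what such a library call has to do in software, here is a rough C++ sketch of a half-to-float conversion on raw bit patterns. It is only an illustration of the IEEE 754 binary16 to binary32 widening, not the runtime library's actual implementation, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen an IEEE 754 binary16 bit pattern to a binary32 value.
float halfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exponent = (h >> 10) & 0x1Fu;
  std::uint32_t mantissa = h & 0x3FFu;
  std::uint32_t bits;

  if (exponent == 0x1Fu) {
    // Infinity or NaN: all-ones exponent, mantissa payload moved up.
    bits = sign | 0x7F800000u | (mantissa << 13);
  } else if (exponent != 0) {
    // Normal number: rebias the exponent from 15 to 127 (difference 112).
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  } else if (mantissa == 0) {
    // Signed zero.
    bits = sign;
  } else {
    // Subnormal half: normalise it, since it is a normal float.
    std::uint32_t shift = 0;
    while ((mantissa & 0x400u) == 0) {
      mantissa <<= 1;
      ++shift;
    }
    mantissa &= 0x3FFu;  // drop the now-implicit leading one
    bits = sign | ((113u - shift) << 23) | (mantissa << 13);
  }

  float result;
  static_assert(sizeof result == sizeof bits, "sketch assumes 32-bit float");
  std::memcpy(&result, &bits, sizeof result);
  return result;
}
```

A single hardware convert instruction replaces all of this branching, which is why falling back to a call is the less desirable code generation when the conversion instructions are available.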

Hi Sjoerd,

Just one inline comment from me, but this approach looks good to me.

cheers,
sam

lib/Target/ARM/ARMCallingConv.td
24 ↗	(On Diff #116803)	should this use CCBitConvertToType instead, as you've done for the return values?
127 ↗	(On Diff #116803)	Same as above.

SjoerdMeijer edited the summary of this revision. Oct 4 2017, 2:16 AM

SjoerdMeijer added a reviewer: olista01.

I have:

removed the f16 type from the calling conventions except for ARM_AAPCS_VFP. It's necessary to add them elsewhere, the default behaviour is what we want.
added a test case using the hard float abi, but without fullfp16 support.
added some comments about the HPR register class.

Just clarifying a typo in my previous comment:

It's *not* necessary to add f16 elsewhere...

olista01 added inline comments. Oct 4 2017, 2:51 AM

test/CodeGen/ARM/fp16-instructions.ll
14 ↗	(On Diff #117638)	This looks like it's returning the result as a 32-bit float, which is wrong. It should be a 16-bit float in the least-significant half of s0. Also, are there any conversion instructions before the vsub? If so, it would be better to include them in the test, and if not then the arguments are being passed incorrectly too.

SjoerdMeijer added inline comments. Oct 4 2017, 5:41 AM

test/CodeGen/ARM/fp16-instructions.ll
14 ↗	(On Diff #117638)	Thanks Oliver. You're right, and I do need to change the implementation of the calling conventions. For return values, the AAPCS says: "A Half-precision Floating Point Type is returned in the least significant 16 bits of r0", so yes, we want to see a vcvtb.f16.f32 s0, s0 for the return value here. There are converts for the arguments at the moment, in fact two for each argument: it thinks the args are passed in f32 regs, so there is a convert to f16, and then another convert to f32 because the operation is done in f32 (in this case). The AAPCS says: "If the argument is a Half-precision Floating Point Type its size is set to 4 bytes as if it had been copied to the least significant bits of a 32-bit register and the remaining bits filled with unspecified values", so yes, the argument passing needs changing too.
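
The AAPCS rule quoted above can be sketched on raw register images as follows. This is a minimal illustration with hypothetical function names, not code from the patch: the producer may leave anything in the upper 16 bits, so the consumer must look only at the lower 16.

```cpp
#include <cassert>
#include <cstdint>

// Place a half-precision bit pattern in the least significant 16 bits of a
// 32-bit register image. The upper bits are "unspecified"; the second
// parameter stands in for whatever happens to be there.
std::uint32_t packHalfIntoRegister(std::uint16_t halfBits,
                                   std::uint32_t unspecifiedUpper) {
  return (unspecifiedUpper & 0xFFFF0000u) | halfBits;
}

// Read the half-precision bit pattern back, ignoring the upper 16 bits.
std::uint16_t unpackHalfFromRegister(std::uint32_t reg) {
  return static_cast<std::uint16_t>(reg & 0xFFFFu);
}
```

Two register images that differ only in their upper halves therefore carry the same half-precision value, so a correct callee cannot rely on those bits being zero.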

SjoerdMeijer updated this revision to Diff 125330. Dec 4 2017, 6:38 AM

SjoerdMeijer retitled this revision from [ARM] Add f16 type support and code generation (part 1/2) to WIP: [ARM] Add f16 type support and code generation (part 1/2).

This fixes most issues; I am now working on the only remaining failure,
fp16-promote.ll, which is missing an upconvert.

I've written an RFC to the list explaining the general approach:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119467.html
And as I explained there, I want to add f16 as a legal type, and don't want any
regressions for the storage-only type behaviour. That is the case, with one
exception in fp16-v3.ll, where I accept one minor performance
regression (not correctness) for now. That's a bitcast being codegen'd,
which I should avoid but still need to look into.

I've added @eli.friedman as a reviewer here who commented on the RFC.
And as the patch is becoming big, and while I am fixing the last issue, I was wondering
if @eli.friedman and @olista01 can double-check the approach and confirm that this patch
is how we want to approach this. Any other comments or nitpicks are
welcome too.

And just as another reminder, the motivation to do all this groundwork and plumbing
was to enable instruction selection for Armv8.2-A FP16 instructions, i.e. let instruction selection work on the f16 types. This patch, which is just a demonstrator (and a check), includes
instruction selection for some native f16 add and sub instructions.

This looks like a lot of additional complexity to deal with the case where we only have the conversion instructions, but you have made f16 a legal type.

Have you looked into the option of instead adjusting the calling convention lowering to check the original (pre-legalization) type of the argument? This would allow you to leave f16 illegal when we don't have the arithmetic instructions. There is some discussion ongoing at the moment [1] about some backends that already do something like this, and options for making that target-independent.

[1] http://lists.llvm.org/pipermail/llvm-dev/2018-January/120098.html

test/CodeGen/ARM/fp16-args.ll
36 ↗	(On Diff #129957)	This looks like an ABI change: we previously returned an f16 value packed into the bottom half of s0, now we return an f32 value in the whole of s0.

Hi Oliver,
Thanks for pointing this out and this alternative approach to adjust the calling convention lowering looks more robust. What I mean by that is:

although that approach does not come for free and involves creating a custom CCState and implementation,
it avoids custom lowering of the loads/stores, and the few other corner cases that needed changing. As we don't need this custom lowering anymore, we are not risking having missed another corner case.

So I will start exploring the CCState approach.

This is a rewrite, implementing the new approach:

1. Clang now passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see also
https://reviews.llvm.org/D42318.
We will implement half-precision argument passing/returning lowering
in the ARM backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
   return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce)  {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

2. When FullFP16 is *not* supported, we don't make f16 a
legal type, and we get legalization for "free", i.e. nothing changes
and everything works as before. And also f16 argument passing/returning
is handled (by the Clang patch, see 1. above).

3.1. When FullFP16 is supported, we do make f16 a legal type,
and have 2 places that we need to patch up: f16 argument passing and
returning, which involves minor tweaks to avoid unnecessary code generation
for some bitcasts.

3.2. As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing that
we can codegen this instruction from IR, but more importantly, also to some
conversion instructions. These conversions were causing issues before in the FP16
and FullFP16 cases.

3.3. I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed that
these loads and stores had the wrong addressing mode specified: AddrMode5 instead
of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added.
Splitting this out into a separate patch doesn't make sense I think, because if it is not used, as
was the case, we're also not testing it.

Therefore, I think this is the minimal patch that shows all the different moving parts.
Once we are happy with this patch, I would like to commit it first, just to make sure
we are happy with this groundwork. And then in part 2/2, I will add the remaining FP16
instruction descriptions.
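
The "patch up" of f16 return values described in 3.1 amounts to recognising a short chain of casts and stepping back to the node that produces the f16 value. The following self-contained C++ sketch illustrates that idea on a toy node type; `Node`, `Op`, `VT` and `peelF16Return` are hypothetical stand-ins, not the SelectionDAG API, and the sketch checks the outer opcode slightly more strictly than the patch does.

```cpp
#include <cassert>

// Toy stand-ins for DAG opcodes and value types (not LLVM's enums).
enum class Op { FAdd, Bitcast, ZeroExtend };
enum class VT { f16, f32, i16, i32 };

struct Node {
  Op op;
  VT vt;
  const Node *operand;  // the single operand of interest; nullptr for leaves
};

// If arg has the shape
//   f32 = bitcast (i32 = zero_extend (i16 = bitcast (f16 value)))
// return the node producing the f16 value; otherwise return nullptr.
const Node *peelF16Return(const Node *arg) {
  if (!arg || arg->op != Op::Bitcast || arg->vt != VT::f32)
    return nullptr;
  const Node *ze = arg->operand;
  if (!ze || ze->op != Op::ZeroExtend || ze->vt != VT::i32)
    return nullptr;
  const Node *bc = ze->operand;
  if (!bc || bc->op != Op::Bitcast || bc->vt != VT::i16)
    return nullptr;
  const Node *src = bc->operand;
  if (!src || src->vt != VT::f16)
    return nullptr;
  return src;
}
```

When the match succeeds, the f16 producer can be copied to the return register directly, so no code needs to be generated for the intermediate casts.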

olista01 added inline comments. Jan 24 2018, 7:26 AM

lib/Target/ARM/ARMCallingConv.td
159 ↗	(On Diff #131252)	Are the changes in this file needed if we're still using the clang hack for the calling convention?
lib/Target/ARM/ARMISelLowering.cpp
529 ↗	(On Diff #131252)	If this custom lowering is correct and profitable, why not have it always enabled?
2485 ↗	(On Diff #131252)	I think it would be better to fix the code-generation for bitcasts generally, rather than putting in these special cases for arguments and returns. Also, this doesn't seem to be working in one of your test cases (we get store/load pairs), do you know why?
4949 ↗	(On Diff #131252)	Again, it would be better to fix the code-generation for bitcasts, and let the usual DAG optimisations remove the unnecessary bitcasts/truncates/extends, than to put in special cases for arguments and returns.
lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h
273 ↗	(On Diff #131252)	Why is this needed? The new value 17 still fits in 5 bits. If this is actually needed, then do the *Shift values below need to be increased?
test/CodeGen/ARM/fp16-instructions.ll
46 ↗	(On Diff #131252)	This looks like we're getting the default legalisation for bitcasts, so we get worse code-gen than the SOFTFP-FP16 case. Could this be fixed by adding some tablegen patterns for bitcast?

SjoerdMeijer added inline comments. Jan 24 2018, 8:02 AM

lib/Target/ARM/ARMCallingConv.td
159 ↗	(On Diff #131252)	No, you're right, we don't need it. I will remove it.
lib/Target/ARM/ARMISelLowering.cpp
2485 ↗	(On Diff #131252)	I wanted to do this fix-up in ExpandBITCAST as well, like I do for the f16 function arguments, but I simply was not able to get it working. Recognising the pattern is easy, but fixing up the subsequent CopyToReg and ARMISD::RET_FLAG nodes, i.e. replacing uses, chains and glue for these nodes, was such a pain, and while the rewrite looked okay, I kept running into very late segfaults because there was some funny state left. I think doing the rewrite here is equally fine: it is straightforward and we get generation of the CopyToReg and return nodes for free with this simple rewrite, whereas the DAG replacements we would otherwise have to do are a lot uglier, I think. This rewrite has to be done here, or in ExpandBITCAST, because otherwise it is too late: very early in DAG creation and in the legalizer, this gets legalized to stack stores and loads. Thus, tablegen rules to rewrite "bitcasts" are not going to help, because long before we do instruction selection, these bitconverts no longer exist. I will look into the loads/stores that you mentioned.
lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h
273 ↗	(On Diff #131252)	Thanks, will fix.
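
The point about the value 17 still fitting in 5 bits can be checked with a one-line helper. This is a minimal sketch with a hypothetical function name; the 5-bit field width is taken from the review comment above.

```cpp
#include <cassert>

// True when value is representable in an unsigned field of the given width.
constexpr bool fitsInBits(unsigned value, unsigned bits) {
  return bits >= 32u || value < (1u << bits);
}

// A 5-bit field holds 0..31, so an enumerator with value 17 needs no wider mask.
static_assert(fitsInBits(17u, 5u), "17 fits in a 5-bit field");
static_assert(!fitsInBits(32u, 5u), "32 would need a sixth bit");
```

Only once the enumeration grows past 31 would the mask, and with it the shift amounts of the fields packed above it, have to change.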

This addresses the straightforward comments:

the unnecessary calling conv change,
the mask for the imm8

I agree that the bitcast optimisation is generic, but for now I prefer to have
it working locally first, and we can address this in a follow-up patch.

I am now looking into the SOFTFP-FP16 case, which appears to be doing
default legalisation for bitcasts.

I've looked into it, and yes, you're right: default legalisation for bitcasts is happening in the SOFTFP-FULLFP16 case,
resulting in slightly worse codegen than SOFTFP-FP16.
In my previous approach and implementation, I had to custom lower Bitcasts (for a different reason though).
In a quick experiment, I copy-pasted the code, and that does what we want. However, I need to make a few
tweaks to it. For reviewing and testing purposes, i.e. not to change too many things at the same time, I am
suggesting this approach:

This groundwork is the 1st of 3 patches, let's see how this passes testing first.
Patch 2/3: improve lowering for Bitcasts. FP16 codegen is unaffected by patch 1/3, and the custom lowering addresses an inefficiency and not a correctness issue for the new FULLFP16 codegen cases.
Patch 3/3: fill in the remaining FP16 match rules.

Does that sound ok?

Ok, I agree with the idea of committing this as a starting point and developing it gradually. Just a few nits left.

lib/Target/ARM/ARMCallingConv.td
191 ↗	(On Diff #131289)	There are still some changes in this file, are they required for this patch?
lib/Target/ARM/ARMISelLowering.cpp
4939 ↗	(On Diff #131289)	The new argument is unused, should the change below be guarded using this?
lib/Target/ARM/ARMInstrVFP.td
712 ↗	(On Diff #131289)	Commented-out code

samparker added inline comments. Jan 25 2018, 4:15 AM

lib/Target/ARM/ARMISelDAGToDAG.cpp
935 ↗	(On Diff #131289)	Please refactor this with the above function.
lib/Target/ARM/ARMInstrVFP.td
712 ↗	(On Diff #131289)	Lines to remove.

All feedback addressed. Thanks for the reviews Sam and Oliver!

Thanks for the changes. I'm dubious about the bitcast handling, but let's get it in and then iterate upon it. LGTM.

cheers!

This revision is now accepted and ready to land. Jan 26 2018, 12:59 AM

SjoerdMeijer retitled this revision from [ARM] Armv8.2-A FP16 code generation (part 1/2) to [ARM] Armv8.2-A FP16 code generation (part 1/3). Jan 26 2018, 1:20 AM

SjoerdMeijer edited the summary of this revision.

Closed by commit rL323512: [ARM] Armv8.2-A FP16 code generation (part 1/3) (authored by SjoerdMeijer). Jan 26 2018, 1:28 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/ARM/ (seven files, names not captured)	8, 3, 49, 73, 3, 35 and 12 lines
llvm/trunk/lib/Target/ARM/Disassembler/ARMDisassembler.cpp	7 lines
llvm/trunk/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h	4 lines
llvm/trunk/test/CodeGen/ARM/GlobalISel/arm-unsupported.ll	2 lines
llvm/trunk/test/CodeGen/ARM/fp16-instructions.ll	72 lines

Diff 131552

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp

[...]	if (Opcode == ARM::ADDri) {
case ARMII::AddrMode5:		case ARMII::AddrMode5:
ImmIdx = FrameRegIdx+1;		ImmIdx = FrameRegIdx+1;
InstrOffs = ARM_AM::getAM5Offset(MI.getOperand(ImmIdx).getImm());		InstrOffs = ARM_AM::getAM5Offset(MI.getOperand(ImmIdx).getImm());
if (ARM_AM::getAM5Op(MI.getOperand(ImmIdx).getImm()) == ARM_AM::sub)		if (ARM_AM::getAM5Op(MI.getOperand(ImmIdx).getImm()) == ARM_AM::sub)
InstrOffs *= -1;		InstrOffs *= -1;
NumBits = 8;		NumBits = 8;
Scale = 4;		Scale = 4;
break;		break;
		case ARMII::AddrMode5FP16:
		ImmIdx = FrameRegIdx+1;
		InstrOffs = ARM_AM::getAM5Offset(MI.getOperand(ImmIdx).getImm());
		if (ARM_AM::getAM5Op(MI.getOperand(ImmIdx).getImm()) == ARM_AM::sub)
		InstrOffs *= -1;
		NumBits = 8;
		Scale = 2;
		break;
default:		default:
llvm_unreachable("Unsupported addressing mode!");		llvm_unreachable("Unsupported addressing mode!");
}		}

Offset += InstrOffs * Scale;		Offset += InstrOffs * Scale;
assert((Offset & (Scale-1)) == 0 && "Can't encode this offset!");		assert((Offset & (Scale-1)) == 0 && "Can't encode this offset!");
if (Offset < 0) {		if (Offset < 0) {
Offset = -Offset;		Offset = -Offset;

llvm/trunk/lib/Target/ARM/ARMCallingConv.td

[...]	def RetCC_ARM_AAPCS : CallingConv<[
// Pass SwiftSelf in a callee saved register.		// Pass SwiftSelf in a callee saved register.
CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[R10]>>>,		CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[R10]>>>,

// A SwiftError is returned in R8.		// A SwiftError is returned in R8.
CCIfSwiftError<CCIfType<[i32], CCAssignToReg<[R8]>>>,		CCIfSwiftError<CCIfType<[i32], CCAssignToReg<[R8]>>>,

CCIfType<[f64, v2f64], CCCustom<"RetCC_ARM_AAPCS_Custom_f64">>,		CCIfType<[f64, v2f64], CCCustom<"RetCC_ARM_AAPCS_Custom_f64">>,
CCIfType<[f32], CCBitConvertToType<i32>>,		CCIfType<[f32], CCBitConvertToType<i32>>,

CCDelegateTo<RetCC_ARM_AAPCS_Common>		CCDelegateTo<RetCC_ARM_AAPCS_Common>
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM AAPCS-VFP (EABI) Calling Convention		// ARM AAPCS-VFP (EABI) Calling Convention
// Also used for FastCC (when VFP2 or later is available)		// Also used for FastCC (when VFP2 or later is available)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

[...]	def RetCC_ARM_AAPCS_VFP : CallingConv<[
CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[R10]>>>,		CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[R10]>>>,

// A SwiftError is returned in R8.		// A SwiftError is returned in R8.
CCIfSwiftError<CCIfType<[i32], CCAssignToReg<[R8]>>>,		CCIfSwiftError<CCIfType<[i32], CCAssignToReg<[R8]>>>,

CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>,		CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>,
CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>,		CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>,
CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8,		CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8,
S9, S10, S11, S12, S13, S14, S15]>>,		S9, S10, S11, S12, S13, S14, S15]>>,
CCDelegateTo<RetCC_ARM_AAPCS_Common>		CCDelegateTo<RetCC_ARM_AAPCS_Common>
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Callee-saved register lists.		// Callee-saved register lists.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def CSR_NoRegs : CalleeSavedRegs<(add)>;		def CSR_NoRegs : CalleeSavedRegs<(add)>;

llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp

[...]	bool SelectAddrMode2OffsetImm(SDNode *Op, SDValue N,
SDValue &Offset, SDValue &Opc);		SDValue &Offset, SDValue &Opc);
bool SelectAddrMode2OffsetImmPre(SDNode *Op, SDValue N,		bool SelectAddrMode2OffsetImmPre(SDNode *Op, SDValue N,
SDValue &Offset, SDValue &Opc);		SDValue &Offset, SDValue &Opc);
bool SelectAddrOffsetNone(SDValue N, SDValue &Base);		bool SelectAddrOffsetNone(SDValue N, SDValue &Base);
bool SelectAddrMode3(SDValue N, SDValue &Base,		bool SelectAddrMode3(SDValue N, SDValue &Base,
SDValue &Offset, SDValue &Opc);		SDValue &Offset, SDValue &Opc);
bool SelectAddrMode3Offset(SDNode *Op, SDValue N,		bool SelectAddrMode3Offset(SDNode *Op, SDValue N,
SDValue &Offset, SDValue &Opc);		SDValue &Offset, SDValue &Opc);
bool SelectAddrMode5(SDValue N, SDValue &Base,		bool IsAddressingMode5(SDValue N, SDValue &Base, SDValue &Offset,
SDValue &Offset);		int Lwb, int Upb, bool FP16);
		bool SelectAddrMode5(SDValue N, SDValue &Base, SDValue &Offset);
		bool SelectAddrMode5FP16(SDValue N, SDValue &Base, SDValue &Offset);
bool SelectAddrMode6(SDNode *Parent, SDValue N, SDValue &Addr,SDValue &Align);		bool SelectAddrMode6(SDNode *Parent, SDValue N, SDValue &Addr,SDValue &Align);
bool SelectAddrMode6Offset(SDNode *Op, SDValue N, SDValue &Offset);		bool SelectAddrMode6Offset(SDNode *Op, SDValue N, SDValue &Offset);

bool SelectAddrModePC(SDValue N, SDValue &Offset, SDValue &Label);		bool SelectAddrModePC(SDValue N, SDValue &Offset, SDValue &Label);

// Thumb Addressing Modes:		// Thumb Addressing Modes:
bool SelectThumbAddrModeRR(SDValue N, SDValue &Base, SDValue &Offset);		bool SelectThumbAddrModeRR(SDValue N, SDValue &Base, SDValue &Offset);
bool SelectThumbAddrModeImm5S(SDValue N, unsigned Scale, SDValue &Base,		bool SelectThumbAddrModeImm5S(SDValue N, unsigned Scale, SDValue &Base,
[...]	bool ARMDAGToDAGISel::SelectAddrMode3Offset(SDNode *Op, SDValue N,
}		}

Offset = N;		Offset = N;
Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(AddSub, 0), SDLoc(Op),		Opc = CurDAG->getTargetConstant(ARM_AM::getAM3Opc(AddSub, 0), SDLoc(Op),
MVT::i32);		MVT::i32);
return true;		return true;
}		}

bool ARMDAGToDAGISel::SelectAddrMode5(SDValue N,		bool ARMDAGToDAGISel::IsAddressingMode5(SDValue N, SDValue &Base, SDValue &Offset,
SDValue &Base, SDValue &Offset) {		int Lwb, int Upb, bool FP16) {
if (!CurDAG->isBaseWithConstantOffset(N)) {		if (!CurDAG->isBaseWithConstantOffset(N)) {
Base = N;		Base = N;
if (N.getOpcode() == ISD::FrameIndex) {		if (N.getOpcode() == ISD::FrameIndex) {
int FI = cast<FrameIndexSDNode>(N)->getIndex();		int FI = cast<FrameIndexSDNode>(N)->getIndex();
Base = CurDAG->getTargetFrameIndex(		Base = CurDAG->getTargetFrameIndex(
FI, TLI->getPointerTy(CurDAG->getDataLayout()));		FI, TLI->getPointerTy(CurDAG->getDataLayout()));
} else if (N.getOpcode() == ARMISD::Wrapper &&		} else if (N.getOpcode() == ARMISD::Wrapper &&
N.getOperand(0).getOpcode() != ISD::TargetGlobalAddress &&		N.getOperand(0).getOpcode() != ISD::TargetGlobalAddress &&
N.getOperand(0).getOpcode() != ISD::TargetExternalSymbol &&		N.getOperand(0).getOpcode() != ISD::TargetExternalSymbol &&
N.getOperand(0).getOpcode() != ISD::TargetGlobalTLSAddress) {		N.getOperand(0).getOpcode() != ISD::TargetGlobalTLSAddress) {
Base = N.getOperand(0);		Base = N.getOperand(0);
}		}
Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),		Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),
SDLoc(N), MVT::i32);		SDLoc(N), MVT::i32);
return true;		return true;
}		}

// If the RHS is +/- imm8, fold into addr mode.		// If the RHS is +/- imm8, fold into addr mode.
int RHSC;		int RHSC;
if (isScaledConstantInRange(N.getOperand(1), /*Scale=*/4,		const int Scale = FP16 ? 2 : 4;
-256 + 1, 256, RHSC)) {
		if (isScaledConstantInRange(N.getOperand(1), Scale, Lwb, Upb, RHSC)) {
Base = N.getOperand(0);		Base = N.getOperand(0);
if (Base.getOpcode() == ISD::FrameIndex) {		if (Base.getOpcode() == ISD::FrameIndex) {
int FI = cast<FrameIndexSDNode>(Base)->getIndex();		int FI = cast<FrameIndexSDNode>(Base)->getIndex();
Base = CurDAG->getTargetFrameIndex(		Base = CurDAG->getTargetFrameIndex(
FI, TLI->getPointerTy(CurDAG->getDataLayout()));		FI, TLI->getPointerTy(CurDAG->getDataLayout()));
}		}

ARM_AM::AddrOpc AddSub = ARM_AM::add;		ARM_AM::AddrOpc AddSub = ARM_AM::add;
if (RHSC < 0) {		if (RHSC < 0) {
AddSub = ARM_AM::sub;		AddSub = ARM_AM::sub;
RHSC = -RHSC;		RHSC = -RHSC;
}		}

		if (FP16)
		Offset = CurDAG->getTargetConstant(ARM_AM::getAM5FP16Opc(AddSub, RHSC),
		SDLoc(N), MVT::i32);
		else
Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(AddSub, RHSC),		Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(AddSub, RHSC),
SDLoc(N), MVT::i32);		SDLoc(N), MVT::i32);

return true;		return true;
}		}

Base = N;		Base = N;

		if (FP16)
		Offset = CurDAG->getTargetConstant(ARM_AM::getAM5FP16Opc(ARM_AM::add, 0),
		SDLoc(N), MVT::i32);
		else
Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),		Offset = CurDAG->getTargetConstant(ARM_AM::getAM5Opc(ARM_AM::add, 0),
SDLoc(N), MVT::i32);		SDLoc(N), MVT::i32);

return true;		return true;
}		}

		bool ARMDAGToDAGISel::SelectAddrMode5(SDValue N,
		SDValue &Base, SDValue &Offset) {
		int Lwb = -256 + 1;
		int Upb = 256;
		return IsAddressingMode5(N, Base, Offset, Lwb, Upb, /*FP16=*/ false);
		}

		bool ARMDAGToDAGISel::SelectAddrMode5FP16(SDValue N,
		SDValue &Base, SDValue &Offset) {
		int Lwb = -512 + 1;
		int Upb = 512;
		return IsAddressingMode5(N, Base, Offset, Lwb, Upb, /*FP16=*/ true);
		}

bool ARMDAGToDAGISel::SelectAddrMode6(SDNode *Parent, SDValue N, SDValue &Addr,		bool ARMDAGToDAGISel::SelectAddrMode6(SDNode *Parent, SDValue N, SDValue &Addr,
SDValue &Align) {		SDValue &Align) {
Addr = N;		Addr = N;

unsigned Alignment = 0;		unsigned Alignment = 0;

MemSDNode *MemN = cast<MemSDNode>(Parent);		MemSDNode *MemN = cast<MemSDNode>(Parent);


llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

[...]	else
addRegisterClass(MVT::i32, &ARM::GPRRegClass);		addRegisterClass(MVT::i32, &ARM::GPRRegClass);

if (!Subtarget->useSoftFloat() && Subtarget->hasVFP2() &&		if (!Subtarget->useSoftFloat() && Subtarget->hasVFP2() &&
!Subtarget->isThumb1Only()) {		!Subtarget->isThumb1Only()) {
addRegisterClass(MVT::f32, &ARM::SPRRegClass);		addRegisterClass(MVT::f32, &ARM::SPRRegClass);
addRegisterClass(MVT::f64, &ARM::DPRRegClass);		addRegisterClass(MVT::f64, &ARM::DPRRegClass);
}		}

		if (Subtarget->hasFullFP16()) {
		addRegisterClass(MVT::f16, &ARM::HPRRegClass);
		// Clean up bitcast of incoming arguments if hard float abi is enabled.
		if (Subtarget->isTargetHardFloat())
		setOperationAction(ISD::BITCAST, MVT::i16, Custom);
		}

for (MVT VT : MVT::vector_valuetypes()) {		for (MVT VT : MVT::vector_valuetypes()) {
for (MVT InnerVT : MVT::vector_valuetypes()) {		for (MVT InnerVT : MVT::vector_valuetypes()) {
setTruncStoreAction(VT, InnerVT, Expand);		setTruncStoreAction(VT, InnerVT, Expand);
setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);
setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);
setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);
}		}

[...]	ARMTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
// Copy the result values into the output registers.		// Copy the result values into the output registers.
for (unsigned i = 0, realRVLocIdx = 0;		for (unsigned i = 0, realRVLocIdx = 0;
i != RVLocs.size();		i != RVLocs.size();
++i, ++realRVLocIdx) {		++i, ++realRVLocIdx) {
CCValAssign &VA = RVLocs[i];		CCValAssign &VA = RVLocs[i];
assert(VA.isRegLoc() && "Can only return in registers!");		assert(VA.isRegLoc() && "Can only return in registers!");

SDValue Arg = OutVals[realRVLocIdx];		SDValue Arg = OutVals[realRVLocIdx];
		bool ReturnF16 = false;

		if (Subtarget->hasFullFP16() && Subtarget->isTargetHardFloat()) {
		// Half-precision return values can be returned like this:
		//
		// t11 f16 = fadd ...
		// t12: i16 = bitcast t11
		// t13: i32 = zero_extend t12
		// t14: f32 = bitcast t13
		//
		// to avoid code generation for bitcasts, we simply set Arg to the node
		// that produces the f16 value, t11 in this case.
		//
		if (Arg.getValueType() == MVT::f32) {
		SDValue ZE = Arg.getOperand(0);
		if (ZE.getOpcode() == ISD::ZERO_EXTEND && ZE.getValueType() == MVT::i32) {
		SDValue BC = ZE.getOperand(0);
		if (BC.getOpcode() == ISD::BITCAST && BC.getValueType() == MVT::i16) {
		Arg = BC.getOperand(0);
		ReturnF16 = true;
		}
		}
		}
		}

switch (VA.getLocInfo()) {		switch (VA.getLocInfo()) {
default: llvm_unreachable("Unknown loc info!");		default: llvm_unreachable("Unknown loc info!");
case CCValAssign::Full: break;		case CCValAssign::Full: break;
case CCValAssign::BCvt:		case CCValAssign::BCvt:
		if (!ReturnF16)
Arg = DAG.getNode(ISD::BITCAST, dl, VA.getLocVT(), Arg);		Arg = DAG.getNode(ISD::BITCAST, dl, VA.getLocVT(), Arg);
break;		break;
}		}

if (VA.needsCustom()) {		if (VA.needsCustom()) {
if (VA.getLocVT() == MVT::v2f64) {		if (VA.getLocVT() == MVT::v2f64) {
// Extract the first half and return it in two registers.		// Extract the first half and return it in two registers.
SDValue Half = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg,		SDValue Half = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg,
DAG.getConstant(0, dl, MVT::i32));		DAG.getConstant(0, dl, MVT::i32));
[...]	if (VA.needsCustom()) {
fmrrd.getValue(isLittleEndian ? 1 : 0),		fmrrd.getValue(isLittleEndian ? 1 : 0),
Flag);		Flag);
} else		} else
Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);

// Guarantee that all emitted copies are		// Guarantee that all emitted copies are
// stuck together, avoiding something bad.		// stuck together, avoiding something bad.
Flag = Chain.getValue(1);		Flag = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));		RetOps.push_back(DAG.getRegister(VA.getLocReg(),
		ReturnF16 ? MVT::f16 : VA.getLocVT()));
}		}
const ARMBaseRegisterInfo *TRI = Subtarget->getRegisterInfo();		const ARMBaseRegisterInfo *TRI = Subtarget->getRegisterInfo();
const MCPhysReg *I =		const MCPhysReg *I =
TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());		TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());
if (I) {		if (I) {
for (; *I; ++I) {		for (; *I; ++I) {
if (ARM::GPRRegClass.contains(*I))		if (ARM::GPRRegClass.contains(*I))
RetOps.push_back(DAG.getRegister(*I, MVT::i32));		RetOps.push_back(DAG.getRegister(*I, MVT::i32));
[...]	if (VA.isRegLoc()) {
ArgValue = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64,		ArgValue = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64,
ArgValue, ArgValue2,		ArgValue, ArgValue2,
DAG.getIntPtrConstant(1, dl));		DAG.getIntPtrConstant(1, dl));
} else		} else
ArgValue = GetF64FormalArgument(VA, ArgLocs[++i], Chain, DAG, dl);		ArgValue = GetF64FormalArgument(VA, ArgLocs[++i], Chain, DAG, dl);
} else {		} else {
const TargetRegisterClass *RC;		const TargetRegisterClass *RC;

if (RegVT == MVT::f32)
		if (RegVT == MVT::f16)
		RC = &ARM::HPRRegClass;
		else if (RegVT == MVT::f32)
RC = &ARM::SPRRegClass;		RC = &ARM::SPRRegClass;
else if (RegVT == MVT::f64)		else if (RegVT == MVT::f64)
RC = &ARM::DPRRegClass;		RC = &ARM::DPRRegClass;
else if (RegVT == MVT::v2f64)		else if (RegVT == MVT::v2f64)
RC = &ARM::QPRRegClass;		RC = &ARM::QPRRegClass;
else if (RegVT == MVT::i32)		else if (RegVT == MVT::i32)
RC = AFI->isThumb1OnlyFunction() ? &ARM::tGPRRegClass		RC = AFI->isThumb1OnlyFunction() ? &ARM::tGPRRegClass
: &ARM::GPRRegClass;		: &ARM::GPRRegClass;
▲ Show 20 Lines • Show All 1,323 Lines • ▼ Show 20 Lines	static SDValue ExpandBITCAST(SDNode *N, SelectionDAG &DAG) {
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDLoc dl(N);		SDLoc dl(N);
SDValue Op = N->getOperand(0);		SDValue Op = N->getOperand(0);

// This function is only supposed to be called for i64 types, either as the		// This function is only supposed to be called for i64 types, either as the
// source or destination of the bit convert.		// source or destination of the bit convert.
EVT SrcVT = Op.getValueType();		EVT SrcVT = Op.getValueType();
EVT DstVT = N->getValueType(0);		EVT DstVT = N->getValueType(0);

		// Half-precision arguments can be passed in like this:
		//
		// t4: f32,ch = CopyFromReg t0, Register:f32 %1
		// t8: i32 = bitcast t4
		// t9: i16 = truncate t8
		// t10: f16 = bitcast t9 <~~~~ SDNode N
		//
		// but we want to avoid code generation for the bitcast, so transform this
		// into:
		//
		// t18: f16 = CopyFromReg t0, Register:f32 %0
		//
		if (SrcVT == MVT::i16 && DstVT == MVT::f16) {
		if (Op.getOpcode() != ISD::TRUNCATE)
		return SDValue();

		SDValue Bitcast = Op.getOperand(0);
		if (Bitcast.getOpcode() != ISD::BITCAST ||
		Bitcast.getValueType() != MVT::i32)
		return SDValue();

		SDValue Copy = Bitcast.getOperand(0);
		if (Copy.getOpcode() != ISD::CopyFromReg ||
		Copy.getValueType() != MVT::f32)
		return SDValue();

		SDValue Ops[] = { Copy->getOperand(0), Copy->getOperand(1) };
		return DAG.getNode(ISD::CopyFromReg, SDLoc(Copy), MVT::f16, Ops);
		}

assert((SrcVT == MVT::i64 || DstVT == MVT::i64) &&		assert((SrcVT == MVT::i64 || DstVT == MVT::i64) &&
"ExpandBITCAST called for non-i64 type");		"ExpandBITCAST called for non-i64 type");

// Turn i64->f64 into VMOVDRR.		// Turn i64->f64 into VMOVDRR.
if (SrcVT == MVT::i64 && TLI.isTypeLegal(DstVT)) {		if (SrcVT == MVT::i64 && TLI.isTypeLegal(DstVT)) {
// Do not force values to GPRs (this is what VMOVDRR does for the inputs)		// Do not force values to GPRs (this is what VMOVDRR does for the inputs)
// if we can combine the bitcast with its source.		// if we can combine the bitcast with its source.
if (SDValue Val = CombineVMOVDRRCandidateWithVecOp(N, DAG))		if (SDValue Val = CombineVMOVDRRCandidateWithVecOp(N, DAG))

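The f16 special case added to ExpandBITCAST above walks up a three-node chain (bitcast, truncate, bitcast) and, if every link matches, re-creates the CopyFromReg with an f16 result type. What follows is a minimal sketch of that matching logic on a toy node type. `Node`, `Opcode`, `VT`, `makeNode` and `foldHalfArgument` are hypothetical names made up for illustration; they are not LLVM's SelectionDAG API, and the real code additionally threads the chain and register operands through `DAG.getNode`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for SelectionDAG concepts (assumption: not the LLVM API).
enum class Opcode { CopyFromReg, Bitcast, Truncate, Other };
enum class VT { f16, f32, i16, i32 };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Opcode Op;
  VT Type;
  std::vector<NodePtr> Operands;
};

NodePtr makeNode(Opcode Op, VT Type, std::vector<NodePtr> Ops = {}) {
  return std::make_shared<Node>(Node{Op, Type, std::move(Ops)});
}

// If N is
//   f16 bitcast (i16 truncate (i32 bitcast (f32 CopyFromReg)))
// return a fresh f16 CopyFromReg that reuses the copy's operands.
// Otherwise return nullptr, i.e. "no change".
NodePtr foldHalfArgument(const NodePtr &N) {
  assert(N && "expected a node");
  if (N->Op != Opcode::Bitcast || N->Type != VT::f16 ||
      N->Operands.size() != 1)
    return nullptr;

  const NodePtr &Trunc = N->Operands[0];
  if (Trunc->Op != Opcode::Truncate || Trunc->Type != VT::i16 ||
      Trunc->Operands.size() != 1)
    return nullptr;

  const NodePtr &Cast = Trunc->Operands[0];
  if (Cast->Op != Opcode::Bitcast || Cast->Type != VT::i32 ||
      Cast->Operands.size() != 1)
    return nullptr;

  const NodePtr &Copy = Cast->Operands[0];
  if (Copy->Op != Opcode::CopyFromReg || Copy->Type != VT::f32)
    return nullptr;

  // Same operands as the original copy, but typed as f16.
  return makeNode(Opcode::CopyFromReg, VT::f16, Copy->Operands);
}
```

Each failed check bails out early, which mirrors the `return SDValue();` exits in the patch: anything that does not look exactly like a coerced half argument is left to the generic lowering.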
llvm/trunk/lib/Target/ARM/ARMInstrFormats.td

Show First 20 Lines • Show All 102 Lines • ▼ Show 20 Lines
def AddrModeT1_4 : AddrMode<9>;		def AddrModeT1_4 : AddrMode<9>;
def AddrModeT1_s : AddrMode<10>;		def AddrModeT1_s : AddrMode<10>;
def AddrModeT2_i12 : AddrMode<11>;		def AddrModeT2_i12 : AddrMode<11>;
def AddrModeT2_i8 : AddrMode<12>;		def AddrModeT2_i8 : AddrMode<12>;
def AddrModeT2_so : AddrMode<13>;		def AddrModeT2_so : AddrMode<13>;
def AddrModeT2_pc : AddrMode<14>;		def AddrModeT2_pc : AddrMode<14>;
def AddrModeT2_i8s4 : AddrMode<15>;		def AddrModeT2_i8s4 : AddrMode<15>;
def AddrMode_i12 : AddrMode<16>;		def AddrMode_i12 : AddrMode<16>;
		def AddrMode5FP16 : AddrMode<17>;

// Load / store index mode.		// Load / store index mode.
class IndexMode<bits<2> val> {		class IndexMode<bits<2> val> {
bits<2> Value = val;		bits<2> Value = val;
}		}
def IndexModeNone : IndexMode<0>;		def IndexModeNone : IndexMode<0>;
def IndexModePre : IndexMode<1>;		def IndexModePre : IndexMode<1>;
def IndexModePost : IndexMode<2>;		def IndexModePost : IndexMode<2>;
▲ Show 20 Lines • Show All 1,403 Lines • ▼ Show 20 Lines	class ASI5<bits<4> opcod1, bits<2> opcod2, dag oops, dag iops,

// Loads & stores operate on both NEON and VFP pipelines.		// Loads & stores operate on both NEON and VFP pipelines.
let D = VFPNeonDomain;		let D = VFPNeonDomain;
}		}

class AHI5<bits<4> opcod1, bits<2> opcod2, dag oops, dag iops,		class AHI5<bits<4> opcod1, bits<2> opcod2, dag oops, dag iops,
InstrItinClass itin,		InstrItinClass itin,
string opc, string asm, list<dag> pattern>		string opc, string asm, list<dag> pattern>
: VFPI<oops, iops, AddrMode5, 4, IndexModeNone,		: VFPI<oops, iops, AddrMode5FP16, 4, IndexModeNone,
VFPLdStFrm, itin, opc, asm, "", pattern> {		VFPLdStFrm, itin, opc, asm, "", pattern> {
list<Predicate> Predicates = [HasFullFP16];		list<Predicate> Predicates = [HasFullFP16];

// Instruction operands.		// Instruction operands.
bits<5> Sd;		bits<5> Sd;
bits<13> addr;		bits<13> addr;

// Encode instruction operands.		// Encode instruction operands.

llvm/trunk/lib/Target/ARM/ARMInstrVFP.td

Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	def vfp_f64imm : Operand<f64>,
APFloat InVal = N->getValueAPF();		APFloat InVal = N->getValueAPF();
uint32_t enc = ARM_AM::getFP64Imm(InVal);		uint32_t enc = ARM_AM::getFP64Imm(InVal);
return CurDAG->getTargetConstant(enc, SDLoc(N), MVT::i32);		return CurDAG->getTargetConstant(enc, SDLoc(N), MVT::i32);
}]>> {		}]>> {
let PrintMethod = "printFPImmOperand";		let PrintMethod = "printFPImmOperand";
let ParserMatchClass = FPImmOperand;		let ParserMatchClass = FPImmOperand;
}		}

		def alignedload16 : PatFrag<(ops node:$ptr), (load node:$ptr), [{
		return cast<LoadSDNode>(N)->getAlignment() >= 2;
		}]>;

def alignedload32 : PatFrag<(ops node:$ptr), (load node:$ptr), [{		def alignedload32 : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return cast<LoadSDNode>(N)->getAlignment() >= 4;		return cast<LoadSDNode>(N)->getAlignment() >= 4;
}]>;		}]>;

		def alignedstore16 : PatFrag<(ops node:$val, node:$ptr),
		(store node:$val, node:$ptr), [{
		return cast<StoreSDNode>(N)->getAlignment() >= 2;
		}]>;

def alignedstore32 : PatFrag<(ops node:$val, node:$ptr),		def alignedstore32 : PatFrag<(ops node:$val, node:$ptr),
(store node:$val, node:$ptr), [{		(store node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getAlignment() >= 4;		return cast<StoreSDNode>(N)->getAlignment() >= 4;
}]>;		}]>;

// The VCVT to/from fixed-point instructions encode the 'fbits' operand		// The VCVT to/from fixed-point instructions encode the 'fbits' operand
// (the number of fixed bits) differently than it appears in the assembly		// (the number of fixed bits) differently than it appears in the assembly
// source. It's encoded as "Size - fbits" where Size is the size of the		// source. It's encoded as "Size - fbits" where Size is the size of the
Show All 24 Lines
def VLDRS : ASI5<0b1101, 0b01, (outs SPR:$Sd), (ins addrmode5:$addr),		def VLDRS : ASI5<0b1101, 0b01, (outs SPR:$Sd), (ins addrmode5:$addr),
IIC_fpLoad32, "vldr", "\t$Sd, $addr",		IIC_fpLoad32, "vldr", "\t$Sd, $addr",
[(set SPR:$Sd, (alignedload32 addrmode5:$addr))]> {		[(set SPR:$Sd, (alignedload32 addrmode5:$addr))]> {
// Some single precision VFP instructions may be executed on both NEON and VFP		// Some single precision VFP instructions may be executed on both NEON and VFP
// pipelines.		// pipelines.
let D = VFPNeonDomain;		let D = VFPNeonDomain;
}		}

def VLDRH : AHI5<0b1101, 0b01, (outs SPR:$Sd), (ins addrmode5fp16:$addr),		def VLDRH : AHI5<0b1101, 0b01, (outs HPR:$Sd), (ins addrmode5fp16:$addr),
IIC_fpLoad16, "vldr", ".16\t$Sd, $addr",		IIC_fpLoad16, "vldr", ".16\t$Sd, $addr",
[]>,		[(set HPR:$Sd, (alignedload16 addrmode5fp16:$addr))]>,
Requires<[HasFullFP16]>;		Requires<[HasFullFP16]>;

} // End of 'let canFoldAsLoad = 1, isReMaterializable = 1 in'		} // End of 'let canFoldAsLoad = 1, isReMaterializable = 1 in'

def VSTRD : ADI5<0b1101, 0b00, (outs), (ins DPR:$Dd, addrmode5:$addr),		def VSTRD : ADI5<0b1101, 0b00, (outs), (ins DPR:$Dd, addrmode5:$addr),
IIC_fpStore64, "vstr", "\t$Dd, $addr",		IIC_fpStore64, "vstr", "\t$Dd, $addr",
[(alignedstore32 (f64 DPR:$Dd), addrmode5:$addr)]>;		[(alignedstore32 (f64 DPR:$Dd), addrmode5:$addr)]>;

def VSTRS : ASI5<0b1101, 0b00, (outs), (ins SPR:$Sd, addrmode5:$addr),		def VSTRS : ASI5<0b1101, 0b00, (outs), (ins SPR:$Sd, addrmode5:$addr),
IIC_fpStore32, "vstr", "\t$Sd, $addr",		IIC_fpStore32, "vstr", "\t$Sd, $addr",
[(alignedstore32 SPR:$Sd, addrmode5:$addr)]> {		[(alignedstore32 SPR:$Sd, addrmode5:$addr)]> {
// Some single precision VFP instructions may be executed on both NEON and VFP		// Some single precision VFP instructions may be executed on both NEON and VFP
// pipelines.		// pipelines.
let D = VFPNeonDomain;		let D = VFPNeonDomain;
}		}

def VSTRH : AHI5<0b1101, 0b00, (outs), (ins SPR:$Sd, addrmode5fp16:$addr),		def VSTRH : AHI5<0b1101, 0b00, (outs), (ins HPR:$Sd, addrmode5fp16:$addr),
IIC_fpStore16, "vstr", ".16\t$Sd, $addr",		IIC_fpStore16, "vstr", ".16\t$Sd, $addr",
[]>,		[(alignedstore16 HPR:$Sd, addrmode5fp16:$addr)]>,
Requires<[HasFullFP16]>;		Requires<[HasFullFP16]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Load / store multiple Instructions.		// Load / store multiple Instructions.
//		//

multiclass vfp_ldst_mult<string asm, bit L_bit,		multiclass vfp_ldst_mult<string asm, bit L_bit,
InstrItinClass itin, InstrItinClass itin_upd> {		InstrItinClass itin, InstrItinClass itin_upd> {
▲ Show 20 Lines • Show All 184 Lines • ▼ Show 20 Lines	def VADDS : ASbIn<0b11100, 0b11, 0, 0,
Sched<[WriteFPALU32]> {		Sched<[WriteFPALU32]> {
// Some single precision VFP instructions may be executed on both NEON and		// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.		// VFP pipelines on A8.
let D = VFPNeonA8Domain;		let D = VFPNeonA8Domain;
}		}

let TwoOperandAliasConstraint = "$Sn = $Sd" in		let TwoOperandAliasConstraint = "$Sn = $Sd" in
def VADDH : AHbI<0b11100, 0b11, 0, 0,		def VADDH : AHbI<0b11100, 0b11, 0, 0,
(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),		(outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
IIC_fpALU16, "vadd", ".f16\t$Sd, $Sn, $Sm",		IIC_fpALU16, "vadd", ".f16\t$Sd, $Sn, $Sm",
[]>,		[(set HPR:$Sd, (fadd HPR:$Sn, HPR:$Sm))]>,
Sched<[WriteFPALU32]>;		Sched<[WriteFPALU32]>;

let TwoOperandAliasConstraint = "$Dn = $Dd" in		let TwoOperandAliasConstraint = "$Dn = $Dd" in
def VSUBD : ADbI<0b11100, 0b11, 1, 0,		def VSUBD : ADbI<0b11100, 0b11, 1, 0,
(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),		(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
IIC_fpALU64, "vsub", ".f64\t$Dd, $Dn, $Dm",		IIC_fpALU64, "vsub", ".f64\t$Dd, $Dn, $Dm",
[(set DPR:$Dd, (fsub DPR:$Dn, (f64 DPR:$Dm)))]>,		[(set DPR:$Dd, (fsub DPR:$Dn, (f64 DPR:$Dm)))]>,
Sched<[WriteFPALU64]>;		Sched<[WriteFPALU64]>;

let TwoOperandAliasConstraint = "$Sn = $Sd" in		let TwoOperandAliasConstraint = "$Sn = $Sd" in
def VSUBS : ASbIn<0b11100, 0b11, 1, 0,		def VSUBS : ASbIn<0b11100, 0b11, 1, 0,
(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),		(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),
IIC_fpALU32, "vsub", ".f32\t$Sd, $Sn, $Sm",		IIC_fpALU32, "vsub", ".f32\t$Sd, $Sn, $Sm",
[(set SPR:$Sd, (fsub SPR:$Sn, SPR:$Sm))]>,		[(set SPR:$Sd, (fsub SPR:$Sn, SPR:$Sm))]>,
Sched<[WriteFPALU32]>{		Sched<[WriteFPALU32]>{
// Some single precision VFP instructions may be executed on both NEON and		// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.		// VFP pipelines on A8.
let D = VFPNeonA8Domain;		let D = VFPNeonA8Domain;
}		}

let TwoOperandAliasConstraint = "$Sn = $Sd" in		let TwoOperandAliasConstraint = "$Sn = $Sd" in
def VSUBH : AHbI<0b11100, 0b11, 1, 0,		def VSUBH : AHbI<0b11100, 0b11, 1, 0,
(outs SPR:$Sd), (ins SPR:$Sn, SPR:$Sm),		(outs HPR:$Sd), (ins HPR:$Sn, HPR:$Sm),
IIC_fpALU16, "vsub", ".f16\t$Sd, $Sn, $Sm",		IIC_fpALU16, "vsub", ".f16\t$Sd, $Sn, $Sm",
[]>,		[(set HPR:$Sd, (fsub HPR:$Sn, HPR:$Sm))]>,
Sched<[WriteFPALU32]>;		Sched<[WriteFPALU32]>;

let TwoOperandAliasConstraint = "$Dn = $Dd" in		let TwoOperandAliasConstraint = "$Dn = $Dd" in
def VDIVD : ADbI<0b11101, 0b00, 0, 0,		def VDIVD : ADbI<0b11101, 0b00, 0, 0,
(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),		(outs DPR:$Dd), (ins DPR:$Dn, DPR:$Dm),
IIC_fpDIV64, "vdiv", ".f64\t$Dd, $Dn, $Dm",		IIC_fpDIV64, "vdiv", ".f64\t$Dd, $Dn, $Dm",
[(set DPR:$Dd, (fdiv DPR:$Dn, (f64 DPR:$Dm)))]>,		[(set DPR:$Dd, (fdiv DPR:$Dn, (f64 DPR:$Dm)))]>,
Sched<[WriteFPDIV64]>;		Sched<[WriteFPDIV64]>;
▲ Show 20 Lines • Show All 279 Lines • ▼ Show 20 Lines	def VCVTSD : VFPAI<(outs SPR:$Sd), (ins DPR:$Dm), VFPUnaryFrm,
let Inst{21-16} = 0b110111;		let Inst{21-16} = 0b110111;
let Inst{11-8} = 0b1011;		let Inst{11-8} = 0b1011;
let Inst{7-6} = 0b11;		let Inst{7-6} = 0b11;
let Inst{4} = 0;		let Inst{4} = 0;

let Predicates = [HasVFP2, HasDPVFP];		let Predicates = [HasVFP2, HasDPVFP];
}		}

// Between half, single and double-precision. For disassembly only.		// Between half, single and double-precision.

def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),		def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
/* FIXME */ IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm",		/* FIXME */ IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm",
[/* For disassembly only; pattern left blank */]>,		[ /* intentionally left blank, see rule below */ ]>,
Requires<[HasFP16]>,		Requires<[HasFP16]>,
Sched<[WriteFPCVT]>;		Sched<[WriteFPCVT]>;

		def : Pat<(f32 (fpextend HPR:$Sm)),
		(VCVTBHS (COPY_TO_REGCLASS HPR:$Sm, SPR))>;

def VCVTBSH: ASuI<0b11101, 0b11, 0b0011, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),		def VCVTBSH: ASuI<0b11101, 0b11, 0b0011, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
/* FIXME */ IIC_fpCVTHS, "vcvtb", ".f16.f32\t$Sd, $Sm",		/* FIXME */ IIC_fpCVTHS, "vcvtb", ".f16.f32\t$Sd, $Sm",
[/* For disassembly only; pattern left blank */]>,		[]>,
Requires<[HasFP16]>,		Requires<[HasFP16]>,
Sched<[WriteFPCVT]>;		Sched<[WriteFPCVT]>;

def VCVTTHS: ASuI<0b11101, 0b11, 0b0010, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sm),		def VCVTTHS: ASuI<0b11101, 0b11, 0b0010, 0b11, 0, (outs SPR:$Sd), (ins SPR:$Sm),
/* FIXME */ IIC_fpCVTSH, "vcvtt", ".f32.f16\t$Sd, $Sm",		/* FIXME */ IIC_fpCVTSH, "vcvtt", ".f32.f16\t$Sd, $Sm",
[/* For disassembly only; pattern left blank */]>,		[/* For disassembly only; pattern left blank */]>,
Requires<[HasFP16]>,		Requires<[HasFP16]>,
Sched<[WriteFPCVT]>;		Sched<[WriteFPCVT]>;

llvm/trunk/lib/Target/ARM/ARMRegisterInfo.td

Show First 20 Lines • Show All 301 Lines • ▼ Show 20 Lines	let AltOrders = [(add (decimate SPR, 2), SPR),
(decimate (rotl SPR, 1), 4),		(decimate (rotl SPR, 1), 4),
(decimate (rotl SPR, 1), 2))];		(decimate (rotl SPR, 1), 2))];
let AltOrderSelect = [{		let AltOrderSelect = [{
return 1 + MF.getSubtarget<ARMSubtarget>().useStride4VFPs(MF);		return 1 + MF.getSubtarget<ARMSubtarget>().useStride4VFPs(MF);
}];		}];
let DiagnosticString = "operand must be a register in range [s0, s31]";		let DiagnosticString = "operand must be a register in range [s0, s31]";
}		}

		def HPR : RegisterClass<"ARM", [f16], 32, (sequence "S%u", 0, 31)> {
		let AltOrders = [(add (decimate HPR, 2), SPR),
		(add (decimate HPR, 4),
		(decimate HPR, 2),
		(decimate (rotl HPR, 1), 4),
		(decimate (rotl HPR, 1), 2))];
		let AltOrderSelect = [{
		return 1 + MF.getSubtarget<ARMSubtarget>().useStride4VFPs(MF);
		}];
		let DiagnosticString = "operand must be a register in range [s0, s31]";
		}

// Subset of SPR which can be used as a source of NEON scalars for 16-bit		// Subset of SPR which can be used as a source of NEON scalars for 16-bit
// operations		// operations
def SPR_8 : RegisterClass<"ARM", [f32], 32, (sequence "S%u", 0, 15)> {		def SPR_8 : RegisterClass<"ARM", [f32], 32, (sequence "S%u", 0, 15)> {
let DiagnosticString = "operand must be a register in range [s0, s15]";		let DiagnosticString = "operand must be a register in range [s0, s15]";
}		}

// Scalar double precision floating point / generic 64-bit vector register		// Scalar double precision floating point / generic 64-bit vector register
// class.		// class.

llvm/trunk/lib/Target/ARM/Disassembler/ARMDisassembler.cpp

Show First 20 Lines • Show All 152 Lines • ▼ Show 20 Lines
static DecodeStatus DecodetGPRRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodetGPRRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecodetcGPRRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodetcGPRRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecoderGPRRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecoderGPRRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
		static DecodeStatus DecodeHPRRegisterClass(MCInst &Inst, unsigned RegNo,
		uint64_t Address, const void *Decoder);
static DecodeStatus DecodeSPRRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodeSPRRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecodeDPRRegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodeDPRRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecodeDPR_8RegisterClass(MCInst &Inst, unsigned RegNo,		static DecodeStatus DecodeDPR_8RegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder);		uint64_t Address, const void *Decoder);
static DecodeStatus DecodeDPR_VFP2RegisterClass(MCInst &Inst,		static DecodeStatus DecodeDPR_VFP2RegisterClass(MCInst &Inst,
unsigned RegNo,		unsigned RegNo,
▲ Show 20 Lines • Show All 822 Lines • ▼ Show 20 Lines	static DecodeStatus DecodeSPRRegisterClass(MCInst &Inst, unsigned RegNo,
if (RegNo > 31)		if (RegNo > 31)
return MCDisassembler::Fail;		return MCDisassembler::Fail;

unsigned Register = SPRDecoderTable[RegNo];		unsigned Register = SPRDecoderTable[RegNo];
Inst.addOperand(MCOperand::createReg(Register));		Inst.addOperand(MCOperand::createReg(Register));
return MCDisassembler::Success;		return MCDisassembler::Success;
}		}

		static DecodeStatus DecodeHPRRegisterClass(MCInst &Inst, unsigned RegNo,
		uint64_t Address, const void *Decoder) {
		return DecodeSPRRegisterClass(Inst, RegNo, Address, Decoder);
		}

static const uint16_t DPRDecoderTable[] = {		static const uint16_t DPRDecoderTable[] = {
ARM::D0, ARM::D1, ARM::D2, ARM::D3,		ARM::D0, ARM::D1, ARM::D2, ARM::D3,
ARM::D4, ARM::D5, ARM::D6, ARM::D7,		ARM::D4, ARM::D5, ARM::D6, ARM::D7,
ARM::D8, ARM::D9, ARM::D10, ARM::D11,		ARM::D8, ARM::D9, ARM::D10, ARM::D11,
ARM::D12, ARM::D13, ARM::D14, ARM::D15,		ARM::D12, ARM::D13, ARM::D14, ARM::D15,
ARM::D16, ARM::D17, ARM::D18, ARM::D19,		ARM::D16, ARM::D17, ARM::D18, ARM::D19,
ARM::D20, ARM::D21, ARM::D22, ARM::D23,		ARM::D20, ARM::D21, ARM::D22, ARM::D23,
ARM::D24, ARM::D25, ARM::D26, ARM::D27,		ARM::D24, ARM::D25, ARM::D26, ARM::D27,

llvm/trunk/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h

Show First 20 Lines • Show All 180 Lines • ▼ Show 20 Lines	enum AddrMode {
AddrModeT1_2 = 8,		AddrModeT1_2 = 8,
AddrModeT1_4 = 9,		AddrModeT1_4 = 9,
AddrModeT1_s = 10, // i8 * 4 for pc and sp relative data		AddrModeT1_s = 10, // i8 * 4 for pc and sp relative data
AddrModeT2_i12 = 11,		AddrModeT2_i12 = 11,
AddrModeT2_i8 = 12,		AddrModeT2_i8 = 12,
AddrModeT2_so = 13,		AddrModeT2_so = 13,
AddrModeT2_pc = 14, // +/- i12 for pc relative data		AddrModeT2_pc = 14, // +/- i12 for pc relative data
AddrModeT2_i8s4 = 15, // i8 * 4		AddrModeT2_i8s4 = 15, // i8 * 4
AddrMode_i12 = 16		AddrMode_i12 = 16,
		AddrMode5FP16 = 17 // i8 * 2
};		};

inline static const char *AddrModeToString(AddrMode addrmode) {		inline static const char *AddrModeToString(AddrMode addrmode) {
switch (addrmode) {		switch (addrmode) {
case AddrModeNone: return "AddrModeNone";		case AddrModeNone: return "AddrModeNone";
case AddrMode1: return "AddrMode1";		case AddrMode1: return "AddrMode1";
case AddrMode2: return "AddrMode2";		case AddrMode2: return "AddrMode2";
case AddrMode3: return "AddrMode3";		case AddrMode3: return "AddrMode3";
case AddrMode4: return "AddrMode4";		case AddrMode4: return "AddrMode4";
case AddrMode5: return "AddrMode5";		case AddrMode5: return "AddrMode5";
		case AddrMode5FP16: return "AddrMode5FP16";
case AddrMode6: return "AddrMode6";		case AddrMode6: return "AddrMode6";
case AddrModeT1_1: return "AddrModeT1_1";		case AddrModeT1_1: return "AddrModeT1_1";
case AddrModeT1_2: return "AddrModeT1_2";		case AddrModeT1_2: return "AddrModeT1_2";
case AddrModeT1_4: return "AddrModeT1_4";		case AddrModeT1_4: return "AddrModeT1_4";
case AddrModeT1_s: return "AddrModeT1_s";		case AddrModeT1_s: return "AddrModeT1_s";
case AddrModeT2_i12: return "AddrModeT2_i12";		case AddrModeT2_i12: return "AddrModeT2_i12";
case AddrModeT2_i8: return "AddrModeT2_i8";		case AddrModeT2_i8: return "AddrModeT2_i8";
case AddrModeT2_so: return "AddrModeT2_so";		case AddrModeT2_so: return "AddrModeT2_so";

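The `// i8 * 2` comment on AddrMode5FP16 above says that the offset is an 8-bit immediate scaled by the 2-byte element size. A minimal sketch of what that implies for legal byte offsets, assuming the sign is carried separately from the magnitude: `isLegalFP16Offset` and `scaledFP16Imm` are hypothetical helper names and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdlib>

// True if ByteOffset can be written as +/- (imm8 * 2), i.e. it is even
// and its magnitude is at most 255 * 2 = 510 bytes.
bool isLegalFP16Offset(int ByteOffset) {
  int Magnitude = std::abs(ByteOffset);
  return Magnitude % 2 == 0 && Magnitude / 2 <= 255;
}

// The unsigned 8-bit immediate for a legal offset. The add/subtract
// direction would be recorded separately (assumption of this sketch).
unsigned scaledFP16Imm(int ByteOffset) {
  assert(isLegalFP16Offset(ByteOffset) && "offset not encodable");
  return static_cast<unsigned>(std::abs(ByteOffset) / 2);
}
```

By comparison, the existing AddrModeT2_i8s4 entry in the same enum is commented `i8 * 4`, so the same kind of check there would use a scale of 4.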
llvm/trunk/test/CodeGen/ARM/GlobalISel/arm-unsupported.ll

	Show All 37 Lines
	define i17 @test_funny_ints(i17 %a, i17 %b) {			define i17 @test_funny_ints(i17 %a, i17 %b) {
	; CHECK: remark: {{.*}} unable to lower arguments: i17 (i17, i17)			; CHECK: remark: {{.*}} unable to lower arguments: i17 (i17, i17)
	; CHECK-LABEL: warning: Instruction selection used fallback path for test_funny_ints			; CHECK-LABEL: warning: Instruction selection used fallback path for test_funny_ints
	%res = add i17 %a, %b			%res = add i17 %a, %b
	ret i17 %res			ret i17 %res
	}			}

	define half @test_half(half %a, half %b) {			define half @test_half(half %a, half %b) {
	; CHECK: remark: {{.*}} unable to lower arguments: half (half, half)			; CHECK: remark: {{.*}} unable to lower arguments: half (half, half) (in function: test_half)
	; CHECK-LABEL: warning: Instruction selection used fallback path for test_half			; CHECK-LABEL: warning: Instruction selection used fallback path for test_half
	%res = fadd half %a, %b			%res = fadd half %a, %b
	ret half %res			ret half %res
	}			}

	declare [16 x i32] @ret_demotion_target()			declare [16 x i32] @ret_demotion_target()

	define [16 x i32] @test_ret_demotion() {			define [16 x i32] @test_ret_demotion() {

llvm/trunk/test/CodeGen/ARM/fp16-instructions.ll

				; SOFT:
				; RUN: llc < %s -mtriple=arm-none-eabi -float-abi=soft | FileCheck %s --check-prefix=CHECK-SOFT

				; SOFTFP:
				; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp3 | FileCheck %s --check-prefix=CHECK-SOFTFP-VFP3
				; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+vfp4 | FileCheck %s --check-prefix=CHECK-SOFTFP-FP16
				; RUN: llc < %s -mtriple=arm-none-eabi -mattr=+fullfp16 | FileCheck %s --check-prefix=CHECK-SOFTFP-FULLFP16

				; HARD:
				; RUN: llc < %s -mtriple=arm-none-eabihf -mattr=+vfp3 | FileCheck %s --check-prefix=CHECK-HARDFP-VFP3
				; RUN: llc < %s -mtriple=arm-none-eabihf -mattr=+vfp4 | FileCheck %s --check-prefix=CHECK-HARDFP-FP16
				; RUN: llc < %s -mtriple=arm-none-eabihf -mattr=+fullfp16 | FileCheck %s --check-prefix=CHECK-HARDFP-FULLFP16

				define float @Add(float %a.coerce, float %b.coerce) local_unnamed_addr {
				entry:
				%0 = bitcast float %a.coerce to i32
				%tmp.0.extract.trunc = trunc i32 %0 to i16
				%1 = bitcast i16 %tmp.0.extract.trunc to half
				%2 = bitcast float %b.coerce to i32
				%tmp1.0.extract.trunc = trunc i32 %2 to i16
				%3 = bitcast i16 %tmp1.0.extract.trunc to half
				%add = fadd half %1, %3
				%4 = bitcast half %add to i16
				%tmp4.0.insert.ext = zext i16 %4 to i32
				%5 = bitcast i32 %tmp4.0.insert.ext to float
				ret float %5

				; CHECK-SOFT: bl __aeabi_h2f
				; CHECK-SOFT: bl __aeabi_h2f
				; CHECK-SOFT: bl __aeabi_fadd
				; CHECK-SOFT: bl __aeabi_f2h

				; CHECK-SOFTFP-VFP3: bl __aeabi_h2f
				; CHECK-SOFTFP-VFP3: bl __aeabi_h2f
				; CHECK-SOFTFP-VFP3: vadd.f32
				; CHECK-SOFTFP-VFP3: bl __aeabi_f2h

				; CHECK-SOFTFP-FP16: vmov [[S2:s[0-9]]], r1
				; CHECK-SOFTFP-FP16: vmov [[S0:s[0-9]]], r0
				; CHECK-SOFTFP-FP16: vcvtb.f32.f16 [[S2]], [[S2]]
				; CHECK-SOFTFP-FP16: vcvtb.f32.f16 [[S0]], [[S0]]
				; CHECK-SOFTFP-FP16: vadd.f32 [[S0]], [[S0]], [[S2]]
				; CHECK-SOFTFP-FP16: vcvtb.f16.f32 [[S0]], [[S0]]
				; CHECK-SOFTFP-FP16: vmov r0, s0

				; CHECK-SOFTFP-FULLFP16: strh r1, {{.*}}
				; CHECK-SOFTFP-FULLFP16: strh r0, {{.*}}
				; CHECK-SOFTFP-FULLFP16: vldr.16 [[S0:s[0-9]]], {{.*}}
				; CHECK-SOFTFP-FULLFP16: vldr.16 [[S2:s[0-9]]], {{.*}}
				; CHECK-SOFTFP-FULLFP16: vadd.f16 [[S0]], [[S2]], [[S0]]
				; CHECK-SOFTFP-FULLFP16: vstr.16 [[S2:s[0-9]]], {{.*}}
				; CHECK-SOFTFP-FULLFP16: ldrh r0, {{.*}}
				; CHECK-SOFTFP-FULLFP16: mov pc, lr

				; CHECK-HARDFP-VFP3: vmov r{{.}}, s0
				; CHECK-HARDFP-VFP3: vmov{{.*}}, s1
				; CHECK-HARDFP-VFP3: bl __aeabi_h2f
				; CHECK-HARDFP-VFP3: bl __aeabi_h2f
				; CHECK-HARDFP-VFP3: vadd.f32
				; CHECK-HARDFP-VFP3: bl __aeabi_f2h
				; CHECK-HARDFP-VFP3: vmov s0, r0

				; CHECK-HARDFP-FP16: vcvtb.f32.f16 [[S2:s[0-9]]], s1
				; CHECK-HARDFP-FP16: vcvtb.f32.f16 [[S0:s[0-9]]], s0
				; CHECK-HARDFP-FP16: vadd.f32 [[S0]], [[S0]], [[S2]]
				; CHECK-HARDFP-FP16: vcvtb.f16.f32 [[S0]], [[S0]]

				; CHECK-HARDFP-FULLFP16: vadd.f16 s0, s0, s1
				; CHECK-HARDFP-FULLFP16-NEXT: mov pc, lr

				}
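
The IR in the Add test above moves each half value through a float: the argument side does bitcast float to i32, trunc to i16, bitcast to half, and the return side does bitcast half to i16, zext to i32, bitcast to float. A minimal sketch of the same bit manipulation in C++, assuming a 32-bit IEEE float: `halfBitsFromCoercedFloat` and `coercedFloatFromHalfBits` are hypothetical names, and the half value is modelled only as its raw 16-bit pattern.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Argument side: bitcast float -> i32, then trunc i32 -> i16.
// The result is the raw bit pattern of the half value.
std::uint16_t halfBitsFromCoercedFloat(float Coerced) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &Coerced, sizeof(Bits));
  return static_cast<std::uint16_t>(Bits & 0xFFFFu);
}

// Return side: zext i16 -> i32, then bitcast i32 -> float.
// The upper 16 bits of the float are zero.
float coercedFloatFromHalfBits(std::uint16_t HalfBits) {
  std::uint32_t Bits = HalfBits;
  float Coerced;
  std::memcpy(&Coerced, &Bits, sizeof(Coerced));
  return Coerced;
}
```

The float here is only a 32-bit container; its numeric value is unrelated to the half it carries. For instance, the half pattern 0x3C00 (1.0 in half precision) travels as a float whose bits are 0x00003C00.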