This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Armv8.2-A FP16 code generation (part 2/3)
Closed, Public

Authored by SjoerdMeijer on Jan 26 2018, 9:05 AM.


Details

Reviewers

samparker
olista01
eli.friedman
t.p.northover

Commits

rG98d5359ea2a0: [ARM] Armv8.2-A FP16 code generation (part 2/3)
rL323861: [ARM] Armv8.2-A FP16 code generation (part 2/3)

Summary

Half-precision arguments and return values are passed as if they
were an int or float on ARM. This results in truncates and
bitcasts to/from i16 and f16 values, which are legalized very
early to stack stores/loads. When FullFP16 is enabled, we want
to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Diff Detail

Repository: rL LLVM

Event Timeline

SjoerdMeijer created this revision. Jan 26 2018, 9:05 AM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Jan 26 2018, 9:05 AM

Have you tried adding tablegen patterns for bitconvert nodes between i16 and f16? That's how it currently works for f32<->i32, see the VMOVRS and VMOVSR instructions in ARMInstrVFP.td.

That should give us a better lowering of bitcasts (not using the default store/load lowering), but we might need some additional optimisations to remove the integer truncations where they are not needed.

I'm also concerned that this code might not be correct if it triggers on code other than that generated by the calling convention lowering. I'm thinking of the case where the source contains bitcasts i32->f16->i32. Would this change optimise away the truncation, changing the behaviour of that code?
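To make the concern concrete, here is a minimal sketch in plain C++ (not LLVM code; the function names are hypothetical and an f16 value is modelled only by its raw 16-bit pattern). It compares an i32 value that goes through the truncate/bitcast chain with the same value when the truncation has been optimised away. The two agree only when the top 16 bits are already zero.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;
using std::uint32_t;

// Hypothetical model of:
//   i32 -> truncate -> i16 -> bitcast -> f16 -> bitcast -> i16 -> zero_extend -> i32
// The bitcasts do not change any bits, so only the truncate/extend matter.
constexpr uint32_t viaHalf(uint32_t X) {
  return static_cast<uint32_t>(static_cast<uint16_t>(X));
}

// The same chain if the truncate/extend pair were simply removed.
constexpr uint32_t truncateRemoved(uint32_t X) { return X; }

// Top 16 bits already zero: removing the truncate is invisible.
static_assert(viaHalf(0x00003C00u) == truncateRemoved(0x00003C00u),
              "results agree when the top half is zero");

// Top 16 bits set: removing the truncate changes the observable result.
static_assert(viaHalf(0xABCD3C00u) != truncateRemoved(0xABCD3C00u),
              "results differ when the top half is non-zero");
```

So the rewrite is only safe when nothing can observe the top 16 bits, which is the property the reviewer is asking about.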

lib/Target/ARM/ARMInstrVFP.td
754 ↗	(On Diff #131599)	This doesn't look right - f16_to_fp is a conversion from f16 to f32, but COPY_TO_REGCLASS doesn't do that.

Have you tried adding tablegen patterns for bitconvert nodes between i16 and f16?
That's how it currently works for f32<->i32, see the VMOVRS and VMOVSR instructions in ARMInstrVFP.td.

Yes, I've tried that. I think this case is different than VMOVRS because there are no illegal types
involved. We are trying to match something like this:

    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t7: i16 = truncate t2
t8: f16 = bitcast t7

and the truncate (as an operand of the bitcast) is legalized to a stack store/load very early because i16
is illegal. Thus, instruction selection never gets to see the i16 <-> f16 bitcast patterns.
And also, because i16 is illegal, any match rule or rewrite pattern involving i16 is not going to work.
But let me know if I'm missing something here.

I'm also concerned that this code might not be correct if it triggers on code other than
that generated by the calling convention lowering.

I share(d) this concern a bit. But this should be a very local, peephole optimisation: just
for function arguments and return values, and I rely on the CopyToReg and CopyFromReg
nodes for that (only those patterns are matched). But for this reason it is probably better to move
this logic to the functions that lower arguments and return values.

It might be intended to only apply to function arguments and returns, but those patterns for f16_to_fp and fp_to_f16 could match anywhere.

It might be intended to only apply to function arguments and returns, but those patterns for f16_to_fp and fp_to_f16 could match anywhere.

Just checking to see if I understand this correctly.

I am matching this:

    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t7: i16 = truncate t2
t8: f16 = bitcast t7

and with the custom lowering in this patch and using node fp16_to_fp,
I am generating this:

  t2: i32,ch = CopyFromReg t0, Register:i32 %0
t18: f32 = fp16_to_fp t2

and using this rewrite pattern:

def : Pat<(f16_to_fp GPR:$a), 
          (f32 (COPY_TO_REGCLASS GPR:$a, HPR))>;

results in moves from int to fp registers:

vmov  s0, r1
vmov  s2, r0
vadd.f16  s0, s2, s0
...

That's what I meant with the comment:

// We use FP16_TO_FP just to model a GPR -> HPR move

I got inspiration for this approach from e.g. existing test case:

test/CodeGen/ARM/fp16-args.ll

which generates exactly the same DAG for its incoming half arguments:

  t2: i32,ch = CopyFromReg t0, Register:i32 %0
t18: f32 = fp16_to_fp t2

Thus, I am (re)using the same approach, except that I am not doing the convert when
FullFP16 is enabled. Is your concern that I am changing the semantics of these nodes because
I am omitting this convert? The "definition" of these nodes reads:

/// FP16_TO_FP, FP_TO_FP16 - These operators are used to perform promotions
/// and truncation for half-precision (16 bit) floating numbers. These nodes
/// form a semi-softened interface for dealing with f16 (as an i16), which
/// is often a storage-only type but has native conversions.

I liked the "semi-softened interface" part here, because that's how I am using
it in a new context; I was/am reluctant to introduce a new node here.

Is your concern that I am changing the semantics of these nodes because I am omitting this convert?

Yes, it looks like you are using the same nodes, but giving them the semantics of a bitcast between i16 and f16. That's a problem, because there is existing code that treats them as floating-point extends/truncates between f32 and f16 (represented as i16 for legalisation reasons). Sooner or later, one of these nodes is going to be created by your new code and consumed by existing code, or vice-versa, and we will emit the wrong code.
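The difference between the two meanings can be sketched in plain C++. This is not the LLVM implementation; the function names are hypothetical, and the conversion below is a straightforward IEEE half-to-single widening written only for illustration. A value-converting extend and a bit-preserving move produce different 32-bit results for the same 16-bit input, which is why one node cannot safely stand for both.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;
using std::uint32_t;

// Value-converting extend (the existing meaning of an f16 -> f32 conversion):
// takes IEEE half bits and returns the IEEE single bits of the same value.
uint32_t convertHalfToSingleBits(uint16_t H) {
  uint32_t Sign = static_cast<uint32_t>(H & 0x8000u) << 16;
  uint32_t Exp = (H >> 10) & 0x1Fu;
  uint32_t Mant = H & 0x3FFu;

  if (Exp == 0x1Fu) // Infinity or NaN: all-ones exponent, keep the payload.
    return Sign | 0x7F800000u | (Mant << 13);

  if (Exp == 0) {
    if (Mant == 0) // Signed zero.
      return Sign;
    // Subnormal half: shift until the implicit leading one appears.
    int Shift = 0;
    while ((Mant & 0x400u) == 0) {
      Mant <<= 1;
      ++Shift;
    }
    Mant &= 0x3FFu;
    return Sign | (static_cast<uint32_t>(113 - Shift) << 23) | (Mant << 13);
  }

  // Normal number: rebias the exponent from 15 to 127.
  return Sign | ((Exp + 112u) << 23) | (Mant << 13);
}

// Bit-preserving move (what the patch wanted the node to mean under
// FullFP16): the 16 bits are copied unchanged into a wider container.
uint32_t moveHalfBits(uint16_t H) { return H; }
```

For the half value 1.0 (bits 0x3C00), the conversion yields 0x3F800000 while the move yields 0x00003C00.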

Okay, got it. Yes, I wanted to stretch the definition a little bit (for the FullFP16 case), but do
see that it might be confusing and not so nice. A new ARM specific node modeling this move
should be easy here, and much cleaner and clearer.

Addressed comments:

New ARM ISD nodes VMOVhr and VMOVrh have been introduced to model moving half arguments from int to fp registers and vice versa.
For example, for reading half arguments, the DAG looks like this:

  t2: i32,ch = CopyFromReg t0, Register:i32 %0
t18: f16 = ARMISD::VMOVhr t2
...

and for writing the return value:

     ...
   t20: i32 = ARMISD::VMOVrh t11
t16: ch,glue = CopyToReg t0, Register:i32 %r0, t20

Restricted the rewrite of bitcasts further to avoid it triggering where it shouldn't, by checking the EntryNode and RET_FLAG nodes.

I'm still not convinced about the correctness of this transformation: you are turning code which contains truncates and extends into code that doesn't, without checking whether the top 16 bits could be relevant. This happens to be OK if the value is coming from/going to an fp16 arithmetic instruction, which ignores/clears the top 16 bits, but I don't think it's correct in all cases.

I think a better way to do this would be:

1. Define the new DAG nodes as clearing/ignoring the top 16 bits on the i32 side, and lower them to the vmov.i16 instructions which do this.
2. Lower bitcasts involving f16 to these DAG nodes, without checking what instructions are around them.
3. Add DAG combines to fold zexts and truncates into the new nodes where that is legal.
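A small plain-C++ model of the proposed node semantics may help here. It is a sketch under the assumption stated in the review (the i32 side has its top 16 bits ignored on the way in and cleared on the way out); the function names mirror the node names but are otherwise made up for illustration. It shows why a truncate or zero-extend next to the new nodes can be folded away, while a sign-extend cannot.

```cpp
#include <cassert>
#include <cstdint>

using std::uint16_t;
using std::uint32_t;

// Assumed semantics, R -> H: only the low 16 bits of the integer register
// reach the half register; the top 16 bits are ignored.
constexpr uint16_t vmovhr(uint32_t R) {
  return static_cast<uint16_t>(R & 0xFFFFu);
}

// Assumed semantics, H -> R: the half bits land in the low 16 bits of the
// integer register and the top 16 bits are cleared.
constexpr uint32_t vmovrh(uint16_t H) { return static_cast<uint32_t>(H); }

// Generic integer operations that appear next to the bitcasts.
constexpr uint16_t truncate16(uint32_t X) { return static_cast<uint16_t>(X); }
constexpr uint32_t zeroExtend16(uint16_t X) { return X; }
constexpr uint32_t signExtend16(uint16_t X) {
  return (X & 0x8000u) ? (0xFFFF0000u | X) : static_cast<uint32_t>(X);
}

// bitcast(truncate(x)) behaves like vmovhr(x), whatever the top bits are.
static_assert(truncate16(0xABCD3C00u) == vmovhr(0xABCD3C00u),
              "truncate folds into the R -> H move");

// zero_extend(bitcast(h)) behaves like vmovrh(h).
static_assert(zeroExtend16(0xBC00) == vmovrh(0xBC00),
              "zero-extend folds into the H -> R move");

// sign_extend(bitcast(h)) does not: the fold would be illegal here.
static_assert(signExtend16(0xBC00) != vmovrh(0xBC00),
              "sign-extend must not be folded");
```

Because the nodes themselves carry the "top 16 bits" rule, the bitcast lowering no longer has to inspect neighbouring instructions to stay correct.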

*) New nodes ARMISD::VMOVhr and ARMISD::VMOVrh are now defined to be clearing
the top 16 bits.

*) The match rules to instruction select vmov.f16 use the ARMISD::VMOVhr and
ARMISD::VMOVrh nodes; thus they map directly onto vmov.f16.

*) The logic in ExpandBITCAST has been simplified. Node ARMISD::VMOVhr is
created for i32->i16->f16 truncate/bitcast patterns, so this:

    t2: i32 = ..
  t7: i16 = truncate t2
t8: f16 = bitcast t7

now becomes:

  t2: i32 = ..
t18: f16 = ARMISD::VMOVhr t2

This is what we want for soft-float ABI when half args are passed as ints. And
it is simpler than before: we no longer have to look at the CopyFromReg node. Thus,
we now generate this for an f16 add that works on 2 half operands that are
passed as integer args:

vmov.f16  s0, r1
vmov.f16  s2, r0
vadd.f16  s0, s2, s0
vmov.f16  r0, s0

*) For hard-float ABI and FullFP16, the initial pattern is a bit different:

      t2: f32,ch = CopyFromReg t0, Register:f32 %0
    t5: i32 = bitcast t2
  t6: i16 = truncate t5
t7: f16 = bitcast t6

This is now a 2-step approach. First, the i32->i16->f16 truncate/bitcast
pattern matches:

    t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t5: i32 = bitcast t2
t18: f16 = ARMISD::VMOVhr t5

And then, in the 2nd step, the f32->i32 bitcast and the move are rewritten to just
this:

f16 = CopyFromReg t0, Register:f32 %1

which is what we need to avoid generating unnecessary moves for hard-float
FullFP16, and just generate the data processing instruction:

vadd.f16  s0, s0, s1

I've kept the logic for this last rewrite also in ExpandBITCAST (as opposed to
moving it to a DAG combine), because I am matching the bitcast and this looks like the
right place to do this.

*) Any missed bitcast rewrite opportunities will get the default legalization
treatment and result in stack stores/loads, thus these missed opportunities are
not correctness issues.
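The first rewrite in the list above (i32 -> i16 -> f16 truncate/bitcast becoming a single VMOVhr node) can be sketched on a toy DAG. This is not the SelectionDAG API: `Node`, `ToyDAG` and `lowerBitcast` are hypothetical names invented for this illustration, and the check is a simplified version of what the patch does in ExpandBITCAST. Returning `nullptr` stands for "no custom lowering", in which case the default legalization applies, as the last bullet describes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-ins for DAG opcodes and value types.
enum class Opcode { Other, Truncate, Bitcast, VMOVhr };
enum class Type { i16, i32, f16, f32 };

struct Node {
  Opcode Op;
  Type Ty;
  const Node *Operand; // Single operand; nullptr for leaf nodes.
};

// Owns the nodes so that raw pointers between them stay valid.
struct ToyDAG {
  std::vector<std::unique_ptr<Node>> Nodes;

  const Node *make(Opcode Op, Type Ty, const Node *Operand = nullptr) {
    Nodes.push_back(std::unique_ptr<Node>(new Node{Op, Ty, Operand}));
    return Nodes.back().get();
  }
};

// Rewrites
//     t2: i32 = ..
//     t7: i16 = truncate t2
//     t8: f16 = bitcast t7
// into
//     t18: f16 = VMOVhr t2
// Returns nullptr when the pattern does not match, meaning "fall back to the
// default legalization".
const Node *lowerBitcast(ToyDAG &DAG, const Node *N, bool HasFullFP16) {
  if (!HasFullFP16 || N->Op != Opcode::Bitcast || !N->Operand)
    return nullptr;

  const Node *Src = N->Operand;
  bool IsTruncFromI32 = Src->Op == Opcode::Truncate && Src->Ty == Type::i16 &&
                        Src->Operand && Src->Operand->Ty == Type::i32;
  if (N->Ty == Type::f16 && IsTruncFromI32)
    return DAG.make(Opcode::VMOVhr, Type::f16, Src->Operand);

  return nullptr;
}
```

The new node takes the i32 value directly as its operand, so the illegal i16 type no longer appears between the argument and the f16 value.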

Thanks for making these changes, LGTM.

This revision is now accepted and ready to land. Jan 31 2018, 2:01 AM

Many thanks for the reviews and comments!

Closed by commit rL323861: [ARM] Armv8.2-A FP16 code generation (part 2/3) (authored by SjoerdMeijer). Jan 31 2018, 2:20 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                              Size
llvm/trunk/lib/Target/ARM/ARMISelLowering.h       4 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp     104 lines
llvm/trunk/lib/Target/ARM/ARMInstrVFP.td          13 lines
llvm/trunk/test/CodeGen/ARM/fp16-instructions.ll  14 lines

Diff 132120

llvm/trunk/lib/Target/ARM/ARMISelLowering.h

@@ enum NodeType : unsigned {

   // Vector move immediate and move negated immediate:
   VMOVIMM,
   VMVNIMM,

   // Vector move f32 immediate:
   VMOVFPIMM,

+  // Move H <-> R, clearing top 16 bits
+  VMOVrh,
+  VMOVhr,
+
   // Vector duplicate:
   VDUP,
   VDUPLANE,

   // Vector shuffles:
   VEXT, // extract
   VREV64, // reverse elements within 64-bit doublewords
   VREV32, // reverse elements within 32-bit words

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp


@@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,
   if (!Subtarget->useSoftFloat() && Subtarget->hasVFP2() &&
       !Subtarget->isThumb1Only()) {
     addRegisterClass(MVT::f32, &ARM::SPRRegClass);
     addRegisterClass(MVT::f64, &ARM::DPRRegClass);
   }

   if (Subtarget->hasFullFP16()) {
     addRegisterClass(MVT::f16, &ARM::HPRRegClass);
-    // Clean up bitcast of incoming arguments if hard float abi is enabled.
-    if (Subtarget->isTargetHardFloat())
-      setOperationAction(ISD::BITCAST, MVT::i16, Custom);
+    setOperationAction(ISD::BITCAST, MVT::i16, Custom);
+    setOperationAction(ISD::BITCAST, MVT::i32, Custom);
+    setOperationAction(ISD::BITCAST, MVT::f16, Custom);
   }

   for (MVT VT : MVT::vector_valuetypes()) {
     for (MVT InnerVT : MVT::vector_valuetypes()) {
       setTruncStoreAction(VT, InnerVT, Expand);
       setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);
       setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);
       setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);
@@ const char *ARMTargetLowering::getTargetNodeName(unsigned Opcode) const {

   case ARMISD::ADDC: return "ARMISD::ADDC";
   case ARMISD::ADDE: return "ARMISD::ADDE";
   case ARMISD::SUBC: return "ARMISD::SUBC";
   case ARMISD::SUBE: return "ARMISD::SUBE";

   case ARMISD::VMOVRRD: return "ARMISD::VMOVRRD";
   case ARMISD::VMOVDRR: return "ARMISD::VMOVDRR";
+  case ARMISD::VMOVhr: return "ARMISD::VMOVhr";
+  case ARMISD::VMOVrh: return "ARMISD::VMOVrh";

   case ARMISD::EH_SJLJ_SETJMP: return "ARMISD::EH_SJLJ_SETJMP";
   case ARMISD::EH_SJLJ_LONGJMP: return "ARMISD::EH_SJLJ_LONGJMP";
   case ARMISD::EH_SJLJ_SETUP_DISPATCH: return "ARMISD::EH_SJLJ_SETUP_DISPATCH";

   case ARMISD::TC_RETURN: return "ARMISD::TC_RETURN";

   case ARMISD::THREAD_POINTER:return "ARMISD::THREAD_POINTER";
@@ return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, DstVT, BitCast,
                      DAG.getConstant(NewIndex.getZExtValue(), dl, MVT::i32));
 }

 /// ExpandBITCAST - If the target supports VFP, this function is called to
 /// expand a bit convert where either the source or destination type is i64 to
 /// use a VMOVDRR or VMOVRRD node. This should not be done when the non-i64
 /// operand type is illegal (e.g., v2f32 for a target that doesn't support
 /// vectors), since the legalizer won't know what to do with that.
-static SDValue ExpandBITCAST(SDNode *N, SelectionDAG &DAG) {
+static SDValue ExpandBITCAST(SDNode *N, SelectionDAG &DAG,
+                             const ARMSubtarget *Subtarget) {
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   SDLoc dl(N);
   SDValue Op = N->getOperand(0);

   // This function is only supposed to be called for i64 types, either as the
   // source or destination of the bit convert.
   EVT SrcVT = Op.getValueType();
   EVT DstVT = N->getValueType(0);
+  const bool HasFullFP16 = Subtarget->hasFullFP16();

-  // Half-precision arguments can be passed in like this:
-  //
-  // t4: f32,ch = CopyFromReg t0, Register:f32 %1
-  // t8: i32 = bitcast t4
-  // t9: i16 = truncate t8
-  // t10: f16 = bitcast t9 <~~~~ SDNode N
-  //
-  // but we want to avoid code generation for the bitcast, so transform this
-  // into:
-  //
-  // t18: f16 = CopyFromReg t0, Register:f32 %0
-  //
+  if (SrcVT == MVT::f32 && DstVT == MVT::i32) {
+    // FullFP16: half values are passed in S-registers, and we don't
+    // need any of the bitcast and moves:
+    //
+    // t2: f32,ch = CopyFromReg t0, Register:f32 %0
+    // t5: i32 = bitcast t2
+    // t18: f16 = ARMISD::VMOVhr t5
+    if (Op.getOpcode() != ISD::CopyFromReg ||
+        Op.getValueType() != MVT::f32)
+      return SDValue();
+
+    auto Move = N->use_begin();
+    if (Move->getOpcode() != ARMISD::VMOVhr)
+      return SDValue();
+
+    SDValue Ops[] = { Op.getOperand(0), Op.getOperand(1) };
+    SDValue Copy = DAG.getNode(ISD::CopyFromReg, SDLoc(Op), MVT::f16, Ops);
+    DAG.ReplaceAllUsesWith(*Move, &Copy);
+    return Copy;
+  }
+
   if (SrcVT == MVT::i16 && DstVT == MVT::f16) {
-    if (Op.getOpcode() != ISD::TRUNCATE)
+    if (!HasFullFP16)
       return SDValue();
+    // SoftFP: read half-precision arguments:
+    //
+    // t2: i32,ch = ...
+    // t7: i16 = truncate t2 <~~~~ Op
+    // t8: f16 = bitcast t7 <~~~~ N
+    //
+    if (Op.getOperand(0).getValueType() == MVT::i32)
+      return DAG.getNode(ARMISD::VMOVhr, SDLoc(Op),
+                         MVT::f16, Op.getOperand(0));

-    SDValue Bitcast = Op.getOperand(0);
-    if (Bitcast.getOpcode() != ISD::BITCAST ||
-        Bitcast.getValueType() != MVT::i32)
     return SDValue();
+  }

-    SDValue Copy = Bitcast.getOperand(0);
-    if (Copy.getOpcode() != ISD::CopyFromReg ||
-        Copy.getValueType() != MVT::f32)
+  // Half-precision return values
+  if (SrcVT == MVT::f16 && DstVT == MVT::i16) {
+    if (!HasFullFP16)
+      return SDValue();
+    //
+    // t11: f16 = fadd t8, t10
+    // t12: i16 = bitcast t11 <~~~ SDNode N
+    // t13: i32 = zero_extend t12
+    // t16: ch,glue = CopyToReg t0, Register:i32 %r0, t13
+    // t17: ch = ARMISD::RET_FLAG t16, Register:i32 %r0, t16:1
+    //
+    // transform this into:
+    //
+    // t20: i32 = ARMISD::VMOVrh t11
+    // t16: ch,glue = CopyToReg t0, Register:i32 %r0, t20
+    //
+    auto ZeroExtend = N->use_begin();
+    if (N->use_size() != 1 || ZeroExtend->getOpcode() != ISD::ZERO_EXTEND ||
+        ZeroExtend->getValueType(0) != MVT::i32)
       return SDValue();

-    SDValue Ops[] = { Copy->getOperand(0), Copy->getOperand(1) };
-    return DAG.getNode(ISD::CopyFromReg, SDLoc(Copy), MVT::f16, Ops);
+    auto Copy = ZeroExtend->use_begin();
+    if (Copy->getOpcode() == ISD::CopyToReg &&
+        Copy->use_begin()->getOpcode() == ARMISD::RET_FLAG) {
+      SDValue Cvt = DAG.getNode(ARMISD::VMOVrh, SDLoc(Op), MVT::i32, Op);
+      DAG.ReplaceAllUsesWith(*ZeroExtend, &Cvt);
+      return Cvt;
+    }
+    return SDValue();
   }

-  assert((SrcVT == MVT::i64 || DstVT == MVT::i64) &&
-         "ExpandBITCAST called for non-i64 type");
+  if (!(SrcVT == MVT::i64 || DstVT == MVT::i64))
+    return SDValue();

   // Turn i64->f64 into VMOVDRR.
   if (SrcVT == MVT::i64 && TLI.isTypeLegal(DstVT)) {
     // Do not force values to GPRs (this is what VMOVDRR does for the inputs)
     // if we can combine the bitcast with its source.
     if (SDValue Val = CombineVMOVDRRCandidateWithVecOp(N, DAG))
       return Val;

@@ SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::FCOPYSIGN: return LowerFCOPYSIGN(Op, DAG);
   case ISD::RETURNADDR: return LowerRETURNADDR(Op, DAG);
   case ISD::FRAMEADDR: return LowerFRAMEADDR(Op, DAG);
   case ISD::EH_SJLJ_SETJMP: return LowerEH_SJLJ_SETJMP(Op, DAG);
   case ISD::EH_SJLJ_LONGJMP: return LowerEH_SJLJ_LONGJMP(Op, DAG);
   case ISD::EH_SJLJ_SETUP_DISPATCH: return LowerEH_SJLJ_SETUP_DISPATCH(Op, DAG);
   case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG,
                                                                Subtarget);
-  case ISD::BITCAST: return ExpandBITCAST(Op.getNode(), DAG);
+  case ISD::BITCAST: return ExpandBITCAST(Op.getNode(), DAG, Subtarget);
   case ISD::SHL:
   case ISD::SRL:
   case ISD::SRA: return LowerShift(Op.getNode(), DAG, Subtarget);
   case ISD::SREM: return LowerREM(Op.getNode(), DAG);
   case ISD::UREM: return LowerREM(Op.getNode(), DAG);
   case ISD::SHL_PARTS: return LowerShiftLeftParts(Op, DAG);
   case ISD::SRL_PARTS:
   case ISD::SRA_PARTS: return LowerShiftRightParts(Op, DAG);
@@ void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
   SDValue Res;
   switch (N->getOpcode()) {
   default:
     llvm_unreachable("Don't know how to custom expand this!");
   case ISD::READ_REGISTER:
     ExpandREAD_REGISTER(N, Results, DAG);
     break;
   case ISD::BITCAST:
-    Res = ExpandBITCAST(N, DAG);
+    Res = ExpandBITCAST(N, DAG, Subtarget);
     break;
   case ISD::SRL:
   case ISD::SRA:
     Res = Expand64BitShift(N, DAG, Subtarget);
     break;
   case ISD::SREM:
   case ISD::UREM:
     Res = LowerREM(N, DAG);

llvm/trunk/lib/Target/ARM/ARMInstrVFP.td

@@ def SDT_VMOVRRD : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>,
                                        SDTCisVT<2, f64>]>;

 def arm_fmstat : SDNode<"ARMISD::FMSTAT", SDTNone, [SDNPInGlue, SDNPOutGlue]>;
 def arm_cmpfp : SDNode<"ARMISD::CMPFP", SDT_ARMFCmp, [SDNPOutGlue]>;
 def arm_cmpfp0 : SDNode<"ARMISD::CMPFPw0", SDT_CMPFP0, [SDNPOutGlue]>;
 def arm_fmdrr : SDNode<"ARMISD::VMOVDRR", SDT_VMOVDRR>;
 def arm_fmrrd : SDNode<"ARMISD::VMOVRRD", SDT_VMOVRRD>;

+def SDT_VMOVhr : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisVT<1, i32>] >;
+def SDT_VMOVrh : SDTypeProfile<1, 1, [SDTCisVT<0, i32>, SDTCisFP<1>] >;
+def arm_vmovhr : SDNode<"ARMISD::VMOVhr", SDT_VMOVhr>;
+def arm_vmovrh : SDNode<"ARMISD::VMOVrh", SDT_VMOVrh>;
+
 //===----------------------------------------------------------------------===//
 // Operand Definitions.
 //

 // 8-bit floating-point immediate encodings.
 def FPImmOperand : AsmOperandClass {
   let Name = "FPImm";
   let ParserMethod = "parseFPImm";
@@ def VMOVSRR : AVConv5I<0b11000100, 0b1010,
   // pipelines.
   let D = VFPNeonDomain;

   let DecoderMethod = "DecodeVMOVSRR";
 }

 // Move H->R, clearing top 16 bits
 def VMOVRH : AVConv2I<0b11100001, 0b1001,
-                      (outs GPR:$Rt), (ins SPR:$Sn),
+                      (outs GPR:$Rt), (ins HPR:$Sn),
                       IIC_fpMOVSI, "vmov", ".f16\t$Rt, $Sn",
-                      []>,
+                      [(set GPR:$Rt, (arm_vmovrh HPR:$Sn))]>,
              Requires<[HasFullFP16]>,
              Sched<[WriteFPMOV]> {
   // Instruction operands.
   bits<4> Rt;
   bits<5> Sn;

   // Encode instruction operands.
   let Inst{19-16} = Sn{4-1};
   let Inst{7} = Sn{0};
   let Inst{15-12} = Rt;

   let Inst{6-5} = 0b00;
   let Inst{3-0} = 0b0000;
 }

 // Move R->H, clearing top 16 bits
 def VMOVHR : AVConv4I<0b11100000, 0b1001,
-                      (outs SPR:$Sn), (ins GPR:$Rt),
+                      (outs HPR:$Sn), (ins GPR:$Rt),
                       IIC_fpMOVIS, "vmov", ".f16\t$Sn, $Rt",
-                      []>,
+                      [(set HPR:$Sn, (arm_vmovhr GPR:$Rt))]>,
              Requires<[HasFullFP16]>,
              Sched<[WriteFPMOV]> {
   // Instruction operands.
   bits<5> Sn;
   bits<4> Rt;

   // Encode instruction operands.
   let Inst{19-16} = Sn{4-1};

llvm/trunk/test/CodeGen/ARM/fp16-instructions.ll

@@
 ; CHECK-SOFTFP-FP16: vmov [[S2:s[0-9]]], r1
 ; CHECK-SOFTFP-FP16: vmov [[S0:s[0-9]]], r0
 ; CHECK-SOFTFP-FP16: vcvtb.f32.f16 [[S2]], [[S2]]
 ; CHECK-SOFTFP-FP16: vcvtb.f32.f16 [[S0]], [[S0]]
 ; CHECK-SOFTFP-FP16: vadd.f32 [[S0]], [[S0]], [[S2]]
 ; CHECK-SOFTFP-FP16: vcvtb.f16.f32 [[S0]], [[S0]]
 ; CHECK-SOFTFP-FP16: vmov r0, s0

-; CHECK-SOFTFP-FULLFP16: strh r1, {{.*}}
-; CHECK-SOFTFP-FULLFP16: strh r0, {{.*}}
-; CHECK-SOFTFP-FULLFP16: vldr.16 [[S0:s[0-9]]], {{.*}}
-; CHECK-SOFTFP-FULLFP16: vldr.16 [[S2:s[0-9]]], {{.*}}
+; CHECK-SOFTFP-FULLFP16: vmov.f16 [[S0:s[0-9]]], r1
+; CHECK-SOFTFP-FULLFP16: vmov.f16 [[S2:s[0-9]]], r0
 ; CHECK-SOFTFP-FULLFP16: vadd.f16 [[S0]], [[S2]], [[S0]]
-; CHECK-SOFTFP-FULLFP16: vstr.16 [[S2:s[0-9]]], {{.*}}
-; CHECK-SOFTFP-FULLFP16: ldrh r0, {{.*}}
-; CHECK-SOFTFP-FULLFP16: mov pc, lr
+; CHECK-SOFTFP-FULLFP16-NEXT: vmov.f16 r0, s0
+; CHECK-SOFTFP-FULLFP16-NEXT: mov pc, lr

 ; CHECK-HARDFP-VFP3: vmov r{{.}}, s0
 ; CHECK-HARDFP-VFP3: vmov{{.*}}, s1
 ; CHECK-HARDFP-VFP3: bl __aeabi_h2f
 ; CHECK-HARDFP-VFP3: bl __aeabi_h2f
 ; CHECK-HARDFP-VFP3: vadd.f32
 ; CHECK-HARDFP-VFP3: bl __aeabi_f2h
 ; CHECK-HARDFP-VFP3: vmov s0, r0

 ; CHECK-HARDFP-FP16: vcvtb.f32.f16 [[S2:s[0-9]]], s1
 ; CHECK-HARDFP-FP16: vcvtb.f32.f16 [[S0:s[0-9]]], s0
 ; CHECK-HARDFP-FP16: vadd.f32 [[S0]], [[S0]], [[S2]]
 ; CHECK-HARDFP-FP16: vcvtb.f16.f32 [[S0]], [[S0]]

 ; CHECK-HARDFP-FULLFP16: vadd.f16 s0, s0, s1
 ; CHECK-HARDFP-FULLFP16-NEXT: mov pc, lr

 }