This is an archive of the discontinued LLVM Phabricator instance.

[X86] Teach how to custom lower double-to-half conversions under fast-math.
ClosedPublic

Authored by andreadb on Feb 23 2015, 8:27 AM.


Details

Reviewers

qcolombet
grosbach
ab

Commits

rGaf3f397b10da: [X86] Teach how to custom lower double-to-half conversions under fast-math.
rL230276: [X86] Teach how to custom lower double-to-half conversions under fast-math.

Summary

This patch teaches the backend how to custom lower a 'fp_to_fp16' node that performs a double-to-half conversion.

Under fast-math, if the target has F16C, the backend can expand a double-to-half conversion into a double-to-float conversion immediately followed by a float-to-half conversion. Before this patch, a double-to-half conversion was always expanded into a library call even under fast-math.

Example:
\code
define zeroext i16 @func(double %d) #0 {
entry:
  %0 = tail call i16 @llvm.convert.to.fp16.f64(double %d)
  ret i16 %0
}

declare i16 @llvm.convert.to.fp16.f64(double)

attributes #0 = { "unsafe-fp-math"="true" "use-soft-float"="false" }
\code end

Before this patch (with -mattr=+f16c), the conversion from double to fp16 was expanded into a library call to function '__truncdfhf2'.

With this patch, the double-to-half conversion is now expanded into the sequence:

vcvtsd2ss %xmm0, %xmm0, %xmm0
vcvtps2ph $0, %xmm0, %xmm0
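The two-instruction sequence rounds twice (double to float, then float to half), and that can give a different result from a single correctly rounded double-to-half conversion. This is presumably why the expansion is restricted to fast-math. The following C++ sketch models both paths in software under stated assumptions: the helper names (roundToHalf, directDoubleToHalf, twoStepDoubleToHalf) are hypothetical and not part of LLVM, half values are represented as doubles rather than as i16 bit patterns, and the default round-to-nearest-even floating-point environment is assumed.

```cpp
#include <algorithm>
#include <cmath>

// Round x to the nearest IEEE binary16 value (ties to even) and return it
// as a double. Hypothetical helper; assumes the default rounding mode.
double roundToHalf(double x) {
  if (x == 0.0 || !std::isfinite(x))
    return x;
  int e;
  std::frexp(x, &e); // |x| = m * 2^e with 0.5 <= m < 1
  // binary16 has 11 significant bits; subnormals are spaced 2^-24 apart.
  int ulpExp = std::max(e - 11, -24);
  double r = std::ldexp(std::nearbyint(std::ldexp(x, -ulpExp)), ulpExp);
  if (std::fabs(r) > 65504.0) // largest finite binary16 value
    r = std::copysign(INFINITY, x);
  return r;
}

// Single rounding step, modelling a correctly rounded double-to-half conversion.
double directDoubleToHalf(double d) { return roundToHalf(d); }

// Two rounding steps, modelling the vcvtsd2ss + vcvtps2ph sequence.
double twoStepDoubleToHalf(double d) {
  float f = static_cast<float>(d); // first rounding: double -> float
  return roundToHalf(static_cast<double>(f)); // second rounding: float -> half
}
```

For d = 1 + 2^-11 + 2^-40, the direct path rounds up to 1 + 2^-10, while the two-step path first rounds d to the float 1 + 2^-11 (an exact tie between two half values) and then rounds to even, giving 1.0.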

Note that this patch also handles 'long double'-to-half conversions.

This patch doesn't add custom lowering rules for 'fp16_to_fp' DAG nodes. Those rules are not needed because LegalizeDAG (see around lines 3532:3546) already knows how to expand a half-to-double conversion into an 'FP16_TO_FP' followed by an 'FP_EXTEND'.
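The widening direction needs no fast-math gating because every half value is exactly representable as a float, and every float as a double, so extending in two steps does not change the result. A minimal C++ sketch of that half-to-float-to-double path follows; halfBitsToFloat and halfBitsToDouble are hypothetical helper names used only for illustration, and the decoding is a plain software model rather than what the backend emits.

```cpp
#include <cmath>
#include <cstdint>

// Decode an IEEE binary16 bit pattern into a float (models FP16_TO_FP).
// Hypothetical helper, not an LLVM API.
float halfBitsToFloat(std::uint16_t h) {
  const int sign = h >> 15;
  const int exp = (h >> 10) & 0x1F;
  const int man = h & 0x3FF;
  float v;
  if (exp == 0)
    v = std::ldexp(static_cast<float>(man), -24); // zero or subnormal
  else if (exp == 31)
    v = man ? NAN : INFINITY; // NaN or infinity
  else
    v = std::ldexp(static_cast<float>(man | 0x400), exp - 25); // normal
  return sign ? -v : v;
}

// Widen the float to double (models FP_EXTEND); this step is always exact.
double halfBitsToDouble(std::uint16_t h) {
  return static_cast<double>(halfBitsToFloat(h));
}
```

For example, halfBitsToDouble(0x3C00) yields 1.0 and halfBitsToDouble(0x7BFF) yields 65504.0, the largest finite half value.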

Please let me know if ok to submit.

Thanks!
Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb updated this revision to Diff 20515. Feb 23 2015, 8:27 AM

andreadb retitled this revision from to [X86] Teach how to custom lower double-to-half conversions under fast-math..

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: qcolombet, ab, grosbach.

andreadb added a subscriber: Unknown Object (MLST).

Should this be in the target-independent legalizer instead, like FP16_TO_FP? Only with UnsafeFPMath, of course (I'm a bit uncomfortable with that, but I don't find it shocking, and I see there's precedent.)

Otherwise, the change seems reasonable, thanks!

-Ahmed

test/CodeGen/X86/fastmath-float-half-conversion.ll
Line 17 (on Diff #20515): Not very important, but add a check for fp80->fp32, perhaps?

Hi Ahmed,

Thanks for the review!
Here is an updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target-independent DAG legalizer.
I also updated the test, adding extra CHECK lines for the 'long double-to-float' conversion.

The 'long double-to-float' conversion is currently implemented by the sequence: fldt+fstps+movss.
Basically, the long double passed to the function is first pushed onto the top of the x87 FPU register stack, and then popped from the FPU stack and stored back to the stack in memory as a float. Finally, it is loaded as a float using a zero-extending movss. The code sequence looks a bit redundant but it works :-)

Please let me know what you think.

Thanks!
-Andrea

In D7832#128399, @andreadb wrote:

> Hi Ahmed,
>
> Thanks for the review!
> Here is an updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target-independent DAG legalizer.

If f16->f32 isn't legal as well, we just turned one libcall into two, no?

> I also updated the test, adding extra CHECK lines for the 'long double-to-float' conversion.
>
> The 'long double-to-float' conversion is currently implemented by the sequence: fldt+fstps+movss.
> Basically, the long double passed to the function is first pushed onto the top of the x87 FPU register stack, and then popped from the FPU stack and stored back to the stack in memory as a float. Finally, it is loaded as a float using a zero-extending movss. The code sequence looks a bit redundant but it works :-)

I expected something like that, thanks for adding it!

-Ahmed

> Please let me know what you think.
>
> Thanks!
> -Andrea

In D7832#128423, @ab wrote:

> In D7832#128399, @andreadb wrote:
>
>> Hi Ahmed,
>>
>> Thanks for the review!
>> Here is an updated patch. As you suggested, I moved the expansion of FP_TO_FP16 into the target-independent DAG legalizer.
>
> If f16->f32 isn't legal as well, we just turned one libcall into two, no?

Right, it doesn't make sense to expand that node if f16->f32 is not legal.
We would still generate a single libcall (for the f16->f32 conversion). However, we would get an extra (v)cvtsd2ss for the double-to-half conversion.
I will add a check to make sure that the f16->f32 conversion is 'legalOrCustom' before attempting to expand the double-to-half conversion.

I will upload a new version of the patch.

>> I also updated the test, adding extra CHECK lines for the 'long double-to-float' conversion.
>>
>> The 'long double-to-float' conversion is currently implemented by the sequence: fldt+fstps+movss.
>> Basically, the long double passed to the function is first pushed onto the top of the x87 FPU register stack, and then popped from the FPU stack and stored back to the stack in memory as a float. Finally, it is loaded as a float using a zero-extending movss. The code sequence looks a bit redundant but it works :-)
>
> I expected something like that, thanks for adding it!
>
> -Ahmed
>
>> Please let me know what you think.
>>
>> Thanks!
>> -Andrea

Uploaded a new version of the patch.
This time, we avoid expanding an 'fp_to_fp16' node if the float-to-half conversion is not legal (or custom) for the target.
I also added an extra RUN line to my new test to verify the codegen when the target doesn't have F16C.
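The resulting decision can be summarized as a small standalone C++ model. This is only a sketch of the logic described above, not LLVM code: SrcType, TargetState and lowerFpToFp16 are hypothetical names, and the returned strings merely label the chosen strategy (the libcall names are the ones checked by the test).

```cpp
#include <string>

// Hypothetical stand-ins for the state the legalizer consults.
enum class SrcType { F64, F80 };

struct TargetState {
  bool useSoftFloat;          // models TM.Options.UseSoftFloat
  bool unsafeFPMath;          // models TM.Options.UnsafeFPMath
  bool f32ToF16LegalOrCustom; // models isOperationLegalOrCustom(FP_TO_FP16, f32)
};

// Return a label for how an FP_TO_FP16 node with the given source type
// would be expanded under the rules discussed in this review.
std::string lowerFpToFp16(SrcType src, const TargetState &t) {
  if (!t.useSoftFloat && t.unsafeFPMath && t.f32ToF16LegalOrCustom)
    return "FP_ROUND to f32, then FP_TO_FP16";
  // Otherwise fall back to the runtime library call.
  return src == SrcType::F64 ? "__truncdfhf2" : "__truncxfhf2";
}
```

With fast-math enabled but no legal float-to-half conversion (for example, the -mattr=+avx RUN line), the model falls back to the libcall, which is what the AVX check lines in the test expect.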

Please let me know what you think.

Thanks again for your time.
Andrea

LGTM now, thanks!

-Ahmed

This revision is now accepted and ready to land. Feb 23 2015, 1:11 PM

Closed by commit rL230276: [X86] Teach how to custom lower double-to-half conversions under fast-math. (authored by adibiagio). Feb 23 2015, 3:01 PM

This revision was automatically updated to reflect the committed changes.

Thanks Ahmed.
Committed revision 230276.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	15 lines
llvm/trunk/test/CodeGen/X86/fastmath-float-half-conversion.ll	52 lines

Diff 20549

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

   case ISD::FP16_TO_FP: {
     [...]
     // the option of emitting that before resorting to a libcall.
     SDValue Res =
         DAG.getNode(ISD::FP16_TO_FP, dl, MVT::f32, Node->getOperand(0));
     Results.push_back(
         DAG.getNode(ISD::FP_EXTEND, dl, Node->getValueType(0), Res));
     break;
   }
   case ISD::FP_TO_FP16: {
+    if (!TM.Options.UseSoftFloat && TM.Options.UnsafeFPMath) {
+      SDValue Op = Node->getOperand(0);
+      MVT SVT = Op.getSimpleValueType();
+      if ((SVT == MVT::f64 || SVT == MVT::f80) &&
+          TLI.isOperationLegalOrCustom(ISD::FP_TO_FP16, MVT::f32)) {
+        // Under fastmath, we can expand this node into a fround followed by
+        // a float-half conversion.
+        SDValue FloatVal = DAG.getNode(ISD::FP_ROUND, dl, MVT::f32, Op,
+                                       DAG.getIntPtrConstant(0));
+        Results.push_back(
+            DAG.getNode(ISD::FP_TO_FP16, dl, MVT::i16, FloatVal));
+        break;
+      }
+    }
+
     RTLIB::Libcall LC =
         RTLIB::getFPROUND(Node->getOperand(0).getValueType(), MVT::f16);
     assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unable to expand fp_to_fp16");
     Results.push_back(ExpandLibCall(LC, Node, false));
     break;
   }
   case ISD::ConstantFP: {
     ConstantFPSDNode *CFP = cast<ConstantFPSDNode>(Node);
     [...]

llvm/trunk/test/CodeGen/X86/fastmath-float-half-conversion.ll

; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s --check-prefix=ALL --check-prefix=F16C
; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx < %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX

define zeroext i16 @test1_fast(double %d) #0 {
; ALL-LABEL: test1_fast:
; F16C-NOT: callq {{_+}}truncdfhf2
; F16C: vcvtsd2ss %xmm0, %xmm0, %xmm0
; F16C-NEXT: vcvtps2ph $0, %xmm0, %xmm0
; AVX: callq {{_+}}truncdfhf2
; ALL: ret
entry:
  %0 = tail call i16 @llvm.convert.to.fp16.f64(double %d)
  ret i16 %0
}

define zeroext i16 @test2_fast(x86_fp80 %d) #0 {
; ALL-LABEL: test2_fast:
; F16C-NOT: callq {{_+}}truncxfhf2
; F16C: fldt
; F16C-NEXT: fstps
; F16C-NEXT: vmovss
; F16C-NEXT: vcvtps2ph $0, %xmm0, %xmm0
; AVX: callq {{_+}}truncxfhf2
; ALL: ret
entry:
  %0 = tail call i16 @llvm.convert.to.fp16.f80(x86_fp80 %d)
  ret i16 %0
}

define zeroext i16 @test1(double %d) #1 {
; ALL-LABEL: test1:
; ALL: callq {{_+}}truncdfhf2
; ALL: ret
entry:
  %0 = tail call i16 @llvm.convert.to.fp16.f64(double %d)
  ret i16 %0
}

define zeroext i16 @test2(x86_fp80 %d) #1 {
; ALL-LABEL: test2:
; ALL: callq {{_+}}truncxfhf2
; ALL: ret
entry:
  %0 = tail call i16 @llvm.convert.to.fp16.f80(x86_fp80 %d)
  ret i16 %0
}

declare i16 @llvm.convert.to.fp16.f64(double)
declare i16 @llvm.convert.to.fp16.f80(x86_fp80)

attributes #0 = { nounwind readnone uwtable "unsafe-fp-math"="true" "use-soft-float"="false" }
attributes #1 = { nounwind readnone uwtable "unsafe-fp-math"="false" "use-soft-float"="false" }
